chris . makes . stuff

Researcher

Researcher is a search engine designed to be used in intelligence research situations, such as by law enforcement.

There are many pieces of software designed for this sphere, but most of them share the same fundamental flaw: they prioritise "cool" features with limited real-world application, while demanding an enormous amount of menial enabling work, such as data entry or data cleansing, from human beings.

Basically, they think that xkcd #208 ("Regular Expressions") is real life.

Unfortunately, the problem is not usually writing the regex that finds the killer, but getting to the point where you can actually run a regex across your intelligence.

Researcher is designed for real-world situations where information is not necessarily structured nicely or in a predictable format. It tries to be as fast, simple and unobtrusive as possible, in the spirit of a Unix command-line utility.

It doesn't use a dedicated database; instead, it works with arbitrary files organised on a normal shared network drive. This allows non-technical users to organise their data as they see fit, without being constrained to any particular schema.
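As a rough illustration of working directly off a shared drive, here is a minimal Python sketch that walks a directory tree and picks up only new or changed files by comparing modification times. It rests on several assumptions: `scan_share` is a hypothetical name, plain `.txt` files stand in for whatever Apache Tika would extract from real documents, and deleted files are not handled.

```python
from pathlib import Path


def scan_share(root, known=None):
    """Return (changed, known) for .txt files under `root`.

    `changed` maps path -> text for files that are new or modified since the
    previous scan; `known` maps path -> mtime and should be passed back in on
    the next call. Deleted files are not detected."""
    known = {} if known is None else known
    changed = {}
    for path in Path(root).rglob("*"):
        # Assumption: plain-text files stand in for whatever Tika would extract.
        if not path.is_file() or path.suffix.lower() != ".txt":
            continue
        key = str(path)
        mtime = path.stat().st_mtime
        if known.get(key) == mtime:
            continue  # unchanged since the last scan, so skip re-reading it
        known[key] = mtime
        changed[key] = path.read_text(encoding="utf-8", errors="replace")
    return changed, known
```

Calling something like this on a schedule and passing `changed` to the indexer is one simple way to keep an index current without dictating how users lay out the share.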

It doesn't make any assumptions about the contents of the data it works with, but at the same time, files of a particular kind (such as internal forms and reports) can be parsed into structured data by defining custom parsers.
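A custom parser could be as small as a check for "is this one of our forms?" plus a function that pulls out named fields. The sketch below is hypothetical throughout: the `CustomParser` type, the "STOP REPORT" form and its `Field: value` layout are invented for illustration. It only shows the general idea, where recognised files become structured records and everything else stays plain searchable text.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class CustomParser:
    """Hypothetical plug-in: decides whether a file is its kind, then parses it."""
    name: str
    matches: Callable[[str], bool]
    parse: Callable[[str], dict]


# One "Field: value" pair per line; [ \t]* avoids swallowing line breaks.
FIELD_RE = re.compile(r"^([A-Za-z ]+):[ \t]*(.+)$", re.MULTILINE)


def parse_key_values(text):
    """Turn 'Vehicle Reg: AB12 CDE' into {'vehicle_reg': 'AB12 CDE'}."""
    return {key.strip().lower().replace(" ", "_"): value.strip()
            for key, value in FIELD_RE.findall(text)}


# Invented example form: any text starting with this header is a stop report.
STOP_REPORT = CustomParser(
    name="stop_report",
    matches=lambda text: text.startswith("STOP REPORT"),
    parse=parse_key_values,
)


def to_structured(text, parsers):
    """Return a structured record from the first matching parser, else None.

    Files that no parser recognises are still indexed as free text."""
    for parser in parsers:
        if parser.matches(text):
            return {"type": parser.name, "fields": parser.parse(text)}
    return None
```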

Researcher is built on Apache Tika and Apache Lucene and performs well, even on commodity hardware. Millions of files and hundreds of GB can be updated live, with searches taking around 100 ms.

General-purpose search engines focus much of their attention on ranking results by relevance, as their users generally make a small number of searches and get too many results to read. In intelligence research this is not the case. The majority of searches are for specific identifiers, such as names, phone numbers and number plates, and very often they return no results at all.

Because of this, Researcher is designed to let users rapidly run a large number of queries (such as a long list of phone numbers) and quickly understand the results.
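The batch workflow can be sketched as follows: normalise each identifier so that formatting differences don't hide a match, run every query, and split the list into hits and misses, since in this domain the misses are often most of the answer. This is a brute-force Python stand-in for a Lucene index, not Researcher's actual implementation; `normalise` and `batch_search` are hypothetical names, and squashing out spaces and punctuation can produce false positives across word boundaries.

```python
import re


def normalise(s):
    """Uppercase and drop everything except ASCII letters and digits, so that
    '07700 900-123' and '07700900123' compare equal."""
    return re.sub(r"[^0-9A-Za-z]", "", s).upper()


def batch_search(queries, docs):
    """Run every query against every document and return (hits, misses).

    `docs` maps document id -> text. `hits` maps each matching query to the
    sorted ids of documents containing it; `misses` lists the queries that
    matched nothing, in their original order. A linear scan stands in for a
    real inverted index here."""
    squashed = {doc_id: normalise(text) for doc_id, text in docs.items()}
    hits, misses = {}, []
    for q in queries:
        nq = normalise(q)
        found = sorted(d for d, t in squashed.items() if nq and nq in t)
        if found:
            hits[q] = found
        else:
            misses.append(q)
    return hits, misses
```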

Researcher deploys as a web server with a RESTful API that can easily be integrated with other tools such as Maltego.
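To give a feel for what such an API might look like, here is a minimal search endpoint that uses only Python's standard library. The `/search` path, the repeatable `q` parameter and the JSON shape (each query mapped to a list of matching file names) are all assumptions for illustration, not Researcher's real API. The in-memory `DOCS` dict stands in for the index.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory corpus standing in for the real index.
DOCS = {"r1.txt": "Called 07700 900123 twice", "r2.txt": "Vehicle AB12 CDE seen"}


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        # Repeated ?q=...&q=... parameters allow a whole batch per request.
        queries = parse_qs(url.query).get("q", [])
        # Case-insensitive substring match; an empty list means no hits.
        results = {q: sorted(d for d, t in DOCS.items() if q.lower() in t.lower())
                   for q in queries}
        body = json.dumps(results).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=8080):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SearchHandler)
```

Any HTTP client could consume an endpoint of this shape. For example, `GET /search?q=AB12+CDE&q=07700+900999` returns one entry per query, with an empty list for the query that matched nothing.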