Welcome to Datashare - a self-hosted documents search software. It is a free and open-source software developed by the International Consortium of Investigative Journalists (ICIJ). Initially created to combine multiple named-entity recognition pipelines, this tool is now a fully-featured search interface to dig into your documents. With the help of several open source tools (Extract, Apache Tika, Apache Tesseract, CoreNLP, OpenNLP, Elasticsearch, etc), Datashare can be used on one single personal computer as well as on 100 interconnected servers.
Who uses it?
Datashare is developed by the ICIJ, a collective of investigative journalists. Datashare is built at the top of technologies and methods already tested with investigations like the Panama Papers or the Luanda Leaks. Seeing the growing interest for ICIJ's technology, we decided to open source this key component of our investigations so a single journalist as well as big media organizations could use it on their own documents.
We setup a Demo instance of Datashare with a small set of documents from the Luxleaks investigation (2014). When using this instance, you will be assigned a temporary user which can star, tag, recommend and explore documents.
Can I run it on my server?
Datashare was also built to run on a server. This is how we use it for our collaborative projects. Please refer to the server documentation to know how it works.
Can I customize Datashare?
When building Datashare, one of our first decisions was to use Elasticsearch to create an index of documents. It would be fair to describe Datashare as a nice looking web interface for Elasticsearch. We want our search platform to be user-friendly while keeping all the powerful Elasticsearch features available for advanced users. This way we ensure that Datashare is usable by non tech-savvy reporters, but still robust enough to satisfy data analysts and developers who want to query the index directly with our API.
We implemented the possibility to create plugins, to make this process more accessible. Instead of modifying Datashare directly, you could isolate your code with a specific set of features and then configure Datashare to use it. Each Datashare user could pick the plugins they need or want, and have a fully customized installation of our search platform. Please have a look at the documentation.
This project is currently available in English, French, Spanish and Japanese. You can help us to improve and complete translations on Crowdin.