Add documents

This page explains how to add your documents to Datashare and index them. This step is required before you can explore your documents.

Add documents

1. To add your documents to Datashare, click 'Tasks' in the left menu.

2. Click 'Analyze your documents'.

3. Click 'Add documents' so Datashare can extract the text from your files.

Options when adding documents

You can:

  • Select the specific folder or sub-folder containing the documents you want to add.

  • Also extract text from images and PDFs (OCR). Be aware that indexing can then take up to 10 times longer.

  • Select the language of your documents if you don't want Datashare to guess it automatically. Note: if you also choose to extract text from images (previous option), you might need to install the appropriate language package on your system. Datashare will tell you if the language package is missing. Refer to the documentation to learn how to install language packages.

  • Skip already indexed files.

Two extraction tasks are now running. The first (ScanTask) scans your Datashare folder for new documents to analyze. The second (IndexTask) indexes these files.
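To make the two-step process concrete, here is a minimal conceptual sketch in Python of a scan step followed by an index step. It is not Datashare's implementation: the function names `scan` and `index` and the in-memory `already_indexed` set are hypothetical, chosen only to illustrate how the scan, the indexing and the 'Skip already indexed files' option relate to each other.

```python
from pathlib import Path


def scan(folder: Path) -> list[Path]:
    """Scan step: list every file under the folder, including sub-folders."""
    return sorted(p for p in folder.rglob("*") if p.is_file())


def index(paths: list[Path], already_indexed: set[Path]) -> list[Path]:
    """Index step: process files not indexed yet and return the newly indexed ones."""
    newly_indexed = []
    for path in paths:
        if path in already_indexed:
            # Mirrors the idea behind the 'Skip already indexed files' option.
            continue
        # A real indexer would extract the text here (and run OCR if enabled).
        already_indexed.add(path)
        newly_indexed.append(path)
    return newly_indexed
```

In this sketch, running `index` a second time on the same scan result returns an empty list, because every file is already recorded as indexed.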

It is not possible to 'Find people, organizations and locations' while either of these two tasks is still running.

When the tasks are done, you can start exploring your documents by clicking 'Search' in the left menu. The named entities (names of people, organizations and locations) will not be available yet. To extract them, follow the steps below.

Extract names of people, organizations and locations

1. After the text is extracted, you can launch named entity recognition by clicking the button 'Find people, organizations and locations'.

2. In the window that opens, choose between finding named entities and finding email addresses. You cannot do both at the same time: run one after the other, in either order.

You can now see running tasks and their progress. After they are done, you can click 'Clear done tasks' to stop displaying tasks that are completed.

3. You can search your indexed documents without having to wait for all tasks to be done. To access your documents, click 'Search':

Extract email addresses

To extract email addresses in your documents:

  • Click 'Find people, organizations, locations and email addresses' again (in Tasks (left menu) > Analyze your documents).

  • Select the second radio button, 'Find email addresses'.

You can now search documents.
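As a rough illustration of what extracting email addresses from text involves, the following Python sketch finds email-like strings with a regular expression. It is an assumption-laden example, not the method Datashare uses: the pattern is deliberately simple and the function name `find_email_addresses` is hypothetical.

```python
import re

# Deliberately simple pattern; real-world address syntax is more permissive.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(text: str) -> list[str]:
    """Return the unique email-like strings in text, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

For example, calling `find_email_addresses("Contact alice@example.org or bob@example.com")` returns both addresses in the order they appear.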

Datashare is an open source project by the International Consortium of Investigative Journalists