Add documents to Datashare

Datashare provides a folder on your computer where you collect the documents you want to index.

1

Add documents to the 'Datashare Data' folder

On your Windows desktop, you will see a folder called 'Datashare Data'.

Move, or copy and paste, the documents you want to add to Datashare into this folder:

Screenshot of Windows' homepage with the Datashare folder icon highlighted
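
If you have many documents spread across sub-folders, you may prefer to copy them with a short script instead of by hand. The following is a minimal sketch, not part of Datashare itself: the function name copy_documents is hypothetical, and the script does nothing more than what copy and paste would do.

```python
import shutil
from pathlib import Path
from typing import List


def copy_documents(source_dir: Path, datashare_dir: Path) -> List[Path]:
    """Copy every file under source_dir into datashare_dir, keeping sub-folders.

    Returns the list of destination paths that were written.
    """
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        dest = datashare_dir / src.relative_to(source_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied


# Example call (both paths are assumptions, adjust them to your computer):
#   copy_documents(
#       Path.home() / "Documents" / "to_index",      # hypothetical source folder
#       Path.home() / "Desktop" / "Datashare Data",  # assumed folder location
#   )
```

The originals stay where they are, since the script copies rather than moves. Files that already exist in the destination under the same name are overwritten.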
2

Launch Datashare

You will find it in your main menu:

Screenshot of Windows' homepage with the menu open with the entry 'ICIJ' > 'Datashare 1.3' highlighted
3

In the menu, in 'Tasks', open 'Documents'

Expand the menu on the left:

Screenshot of Datashare's homepage highlighting the icon at the top of the left menu, which expands it
Expand the menu

In 'Tasks', open 'Documents':

Screenshot of Datashare's homepage with the left menu open highlighting the 'Documents' entry in the 'Tasks' category
Open the "Documents" page

On the top right, click the "Plus" button:

Screenshot of Datashare's Documents page highlighting the 'Plus' button at the top right corner
Click the "Plus" button

4

Choose your options

  • Select the project in Datashare where you want to add your documents. The Default project, which is automatically created, is selected by default.

  • Select the folder or sub-folder in your 'Datashare' directory that contains the documents you want to add. By default, the entire 'Datashare' directory is added.

  • Choose the language of your documents if you don't want Datashare to guess it automatically. Note: If you also choose to extract text from images (the next option), you might need to install the appropriate language package on your system. Datashare will tell you if the language package is missing. Refer to the documentation to learn how to install language packages.

  • Extract text from images/PDFs with Optical Character Recognition (OCR). Be aware that indexing can take up to 10 times longer with this option.

  • Skip already indexed documents if you'd like.

  • Click 'Add'

Screenshot of Datashare's 'Add Documents' page with the form showing 5 options, a 'Reset' button and an 'Add' button
Form for adding documents

5

Watch the progress of your document addition

Two extraction tasks are now running:

  • The first scans your Datashare folder to check whether there are documents to analyze. It is called 'ScanTask'.

  • The second indexes these files. It is called 'IndexTask'.

Screenshot of Datashare's Documents page highlighting two lines in a table, one for 'Scan folders' and another one for 'Index documents'

Note: It is not possible to 'Find entities' while these two tasks are still running. You won't have the entities (names of people, organizations, locations and e-mail addresses) yet. To get these, once your document addition is finished, please follow the steps to 'Find entities'.

However, you can start searching your documents without waiting for all tasks to finish.

You can now search documents in Datashare.
