Add documents to Datashare

Datashare provides a folder on your computer where you collect the documents you want to index.

1

Add documents to your 'Datashare' folder

You can find a folder called 'Datashare' in your home directory.

Move the documents you want to add to Datashare into this folder.
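This step can also be scripted. The following is a minimal sketch, assuming the folder is '~/Datashare' as described above. The function name add_to_datashare and the DATASHARE_DIR override are hypothetical names introduced for illustration and are not part of Datashare. It copies rather than moves, so the originals stay where they are.

```shell
# Hypothetical helper, not part of Datashare: copy documents into the
# 'Datashare' folder so they can be indexed later from the interface.
# DATASHARE_DIR is an assumed override; the default is ~/Datashare.
add_to_datashare() {
  dest="${DATASHARE_DIR:-$HOME/Datashare}"
  mkdir -p "$dest" || return 1
  # -R copies directories recursively; -- protects unusual file names.
  cp -R -- "$@" "$dest"/
}

# Example (paths are placeholders):
#   add_to_datashare ~/Downloads/report.pdf ~/Documents/case-files
```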

2

Launch Datashare

Launch Datashare. The interface opens in your default browser.

3

In the menu, in 'Tasks', open 'Documents'

Expand the menu on the left:

Expand the menu

In 'Tasks', open 'Documents':

Open the "Documents" page

On the top right, click the 'Plus' button:

Click the "Plus" button

4

Choose your options

  • Select the project in Datashare where you want to add your documents. The Default project, which is created automatically, is selected by default.

  • Select the folder or sub-folder in your 'Datashare' directory that contains the documents you want to add. The entire 'Datashare' directory is added by default.

  • Choose the language of your documents if you don't want Datashare to guess it automatically. Note: If you also choose to extract text from images (the next option), you might need to install the appropriate language package on your system. Datashare will tell you if the language package is missing. Refer to the documentation to learn how to install language packages.

  • Extract text from images/PDFs with Optical Character Recognition (OCR). Be aware that indexing can take up to 10 times longer.

  • Skip already indexed documents if you'd like.

  • Click 'Add'.

Form for adding documents

5

Watch the progress of your document addition

Two extraction tasks are now running:

  • The first is the scanning of your Datashare folder: it checks whether there are documents to analyze. It is called 'ScanTask'.

  • The second is the indexing of these files. It is called 'IndexTask'.

Note: It is not possible to 'Find entities' while these two tasks are still running. You won't have the entities (names of people, organizations, locations and e-mail addresses) yet. To get these, once your document addition is finished, please follow the steps to 'Find entities'.

But you can start searching in your documents without having to wait for all tasks to be done.

You can now search documents in Datashare.

Install on Linux

These pages will help you set up and install Datashare on your computer.

Start Datashare

Find the application on your computer and run it locally in your browser.

Start Datashare by launching it from the command line:

    datashare

Datashare should now open automatically in your default internet browser. If it doesn't, type 'localhost:8080' in your browser.
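If the browser does not open, it can help to check from a terminal whether anything is answering at that address. This is a minimal sketch, assuming curl is installed and that Datashare serves plain HTTP on localhost port 8080 as stated above. The helper names datashare_url and datashare_is_up are hypothetical and not part of Datashare.

```shell
# Hypothetical helper: build the local Datashare address.
# Defaults follow the text above (localhost, port 8080).
datashare_url() {
  printf 'http://%s:%s\n' "${1:-localhost}" "${2:-8080}"
}

# Hypothetical helper: succeed if an HTTP server answers at that address.
# --fail makes curl return non-zero on HTTP errors; --max-time avoids hanging.
datashare_is_up() {
  curl --silent --fail --output /dev/null --max-time 5 "$(datashare_url "$@")"
}

# Example:
#   datashare_is_up && echo "Open $(datashare_url) in your browser"
```

A failing check only means nothing answered within five seconds. Datashare may still be starting up.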

Datashare must be accessed from your internet browser (Firefox, Chrome, etc.), even though it works offline without an Internet connection (see: Can I use Datashare with no internet connection?).

Datashare's homepage

It's now time to add documents to Datashare.

Install Datashare

Currently, only a .deb package for Debian/Ubuntu is provided.

If you want to run it on another Linux distribution, you can download the latest version of the Datashare jar here: https://github.com/ICIJ/datashare/releases/latest

And adapt the following launch script to your environment: https://github.com/ICIJ/datashare/blob/master/datashare-dist/src/main/deb/bin/datashare
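As a starting point for adapting the script, the sketch below locates a downloaded jar in a directory. It is an illustration under assumptions, not the project's launch script: the helper name find_datashare_jar and the 'datashare*.jar' file pattern are hypothetical, and this page does not confirm that the jar can be started with a plain 'java -jar'. The linked launch script remains the reference for the actual options.

```shell
# Hypothetical helper: print the first Datashare jar found in a directory.
# The 'datashare*.jar' pattern is an assumption about the release file name.
find_datashare_jar() {
  for jar in "${1:-.}"/datashare*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Assumed usage (requires Java; see the linked launch script for real options):
#   jar="$(find_datashare_jar "$HOME/Downloads")" && java -jar "$jar"
```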

1

Download Datashare

Go to datashare.icij.org and click 'Download for Linux':

Save the Debian package as a file:

Save as file

2

Install the package

$ sudo apt install /dir/to/debian/package/datashare-dist_7.2.0_all.deb

3

Run Datashare

$ datashare

You can now start Datashare.
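After installing the package, a quick way to confirm that the 'datashare' command is available is to look it up on the PATH. This is a minimal sketch; the helper name has_command is hypothetical and not part of Datashare.

```shell
# Hypothetical helper: succeed if a command is available on the PATH.
# Defaults to 'datashare', the command used above to start the application.
has_command() {
  command -v "${1:-datashare}" >/dev/null 2>&1
}

if has_command; then
  echo "datashare is installed; run 'datashare' to start it"
else
  echo "datashare not found on PATH; check that the package installed correctly"
fi
```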