Analyze documents
This step indexes your documents so that they are available in Datashare. You must complete it before you can explore your documents.

Extract text

1. To add your documents to Datashare, click 'Tasks' in the left menu.
2. Click 'Analyze your documents'.
3. Click 'Extract text' so that Datashare can extract the text from your files.
If you also want to extract text from images and PDFs, tick the first toggle button. Be aware that this can take up to 10 times longer. You can always do it later.
Two tasks are now running. The first, ScanTask, scans your Datashare folder to check for new documents to analyze. The second, IndexTask, indexes these files.
You cannot launch 'Find people, organizations and locations' while either of these tasks is still running.
When the tasks are done, you can start exploring your documents by clicking 'Search'. However, the named entities (names of people, organizations and locations) will not be available yet. To extract them, follow the steps below.

Extract names of people, organizations and locations

1. After the text is extracted, you can launch named entity recognition by clicking 'Find people, organizations and locations'.
2. In the window that opens, choose between finding named entities and finding email addresses. You cannot do both at the same time. Run one after the other, in either order.
You can now see the running tasks and their progress. When they are done, click 'Clear done tasks' to hide the completed tasks.
3. You can search your indexed documents without waiting for all tasks to finish. To access your documents, click 'Search'.

Extract email addresses

To extract email addresses from your documents:
  • Click 'Find people, organizations, locations and email addresses' again (in Tasks in the left menu > Analyze your documents).
  • Select the second radio button, 'Find email addresses'.
You can now search your documents.
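Datashare runs this extraction for you. The sketch below is only a general illustration of pattern-based email address detection. It assumes a simplified regular expression of our own, `EMAIL_PATTERN`, wrapped in a hypothetical `find_email_addresses` function. It is not Datashare's actual implementation, and it will miss or over-match some valid addresses.

```python
import re

# Hypothetical, simplified pattern: a local part, "@", then a domain containing at least one dot.
# This is NOT Datashare's pattern; the full address syntax (RFC 5322) is much broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_email_addresses(text: str) -> list[str]:
    """Return unique email-like strings in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for match in EMAIL_PATTERN.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.org or press@example.co.uk; cc jane.doe@example.org."
    print(find_email_addresses(sample))
```

Running it prints `['jane.doe@example.org', 'press@example.co.uk']`. The repeated address is reported once, and the sentence-final period is not included in the match.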