Add entities from the CLI
Last updated
Last updated
This document assumes you have installed Datashare and already .
In server , it's important to understand that Datashare does not provide an interface to add documents. As there is no build-in roles and permission in Datashare's data model, we have no way to differentiate user to offer admin additional tools.
This is likely to be changed in the near future, but in the meantime, you can extract named entities using the command-line interface.
Datashare as the ability to detect email addresses, name of people, organizations and locations. This process use a Natural Language Processing pipeline called CORENLP. Once your documents have been indexed in Datashare, you can perform the named entities extraction in the same fashion as the previous CLI's :
What's happening here:
Datashare starts in "CLI"
We ask to process the NLP
We tell Datashare to use the elasticsearch
service
Datashare will pull documents from ElasticSearch directly
Up to 2 documents will be analyzed in parallel
Datashare will use the CORENLP pipeline
Datashare will use the output queue from the previous INDEX
stage (by default extract:queue:nlp
in Redis) that contains all the document ids to be analyzed.
The first time you run this command you will have to wait a little bit because Datashare need to download CORENLP's models which can be big.
You can also use chain the 3 stages altogether:
The added ENQUEUEIDX
stage will read Elasticsearch index, find all documents that have not already been analyzed by the CORENLP NER pipeline, and put the ids of those documents into the extract:queue:nlp
queue.
As for the previous you may want to restore the output queue from the INDEX
stage. You can do: