API
Last updated
Datashare is an open source project by the International Consortium of Investigative Journalists
Deletes batch searches and results for the current user.
no content: idempotent
Preflight request
returns 200 with DELETE
Deletes the batch search with the given id, together with its results. It won't delete running batch searches, because their results would be orphaned.
Returns 204 (No Content) : idempotent
Preflight request
returns 200 with DELETE
Retrieves the results of a batch search as an attached CSV file.
returns the results of the batch search as CSV attached file.
Preflight options request
returns 200 with OPTIONS and GET
Preflight request
returns OPTIONS and PUT
Preflight request
returns OPTIONS and DELETE
Preflight request for hide endpoint
returns PUT
Preflight request
returns OPTIONS and PUT
Preflight request
returns 200 with OPTIONS and DELETE
Uninstall plugin specified by its id.
returns 204 if the plugin is uninstalled (idempotent)
Deletes all of the user's projects from the database and the Elasticsearch index.
if projects are deleted
Preflight option request
returns 200 with OPTIONS, POST, GET and DELETE
Preflight option request
returns 200 with OPTIONS and DELETE
Preflight request for batch download.
returns 200 with OPTIONS and POST
Preflight request for task cleaning.
returns OPTIONS and DELETE
Preflight request to stop tasks.
returns 200 with OPTIONS and PUT
Preflight request to stop all tasks.
returns 200 with OPTIONS and PUT
Preflight request for history
returns 200 with OPTIONS, GET, PUT and DELETE
Preflight request for history
returns OPTIONS and DELETE
Preflight for index creation.
returns 200 with PUT
Head request useful for JavaScript API (for example to test if an index exists)
returns 200
Deletes the project from database and elasticsearch index.
if project is deleted
Returns 200 if the project is allowed with this network route. In the Datashare database, the project table can specify an IP mask that is allowed per project. If the client IP is not in the range, the file download is forbidden. The project table has a field called allow_from_mask that can hold a mask made of an IP and star wildcards.
Ex: 192.168.*.* will match all IPs of the 192.168.0.0 subnetwork, and only users with an IP in that range are allowed.
if project download is allowed for this project and IP
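The wildcard mask described above can be illustrated with a minimal sketch. It assumes the star wildcard applies per octet; the function name `ip_matches_mask` is hypothetical and this is not Datashare's own implementation.

```python
def ip_matches_mask(ip: str, mask: str) -> bool:
    """Return True when each octet of `ip` equals the matching mask octet or the mask octet is '*'.

    Assumption: the mask is compared octet by octet (IPv4 only).
    """
    ip_parts = ip.split(".")
    mask_parts = mask.split(".")
    if len(ip_parts) != 4 or len(mask_parts) != 4:
        return False
    return all(m == "*" or m == i for i, m in zip(ip_parts, mask_parts))


print(ip_matches_mask("192.168.12.7", "192.168.*.*"))  # True
print(ip_matches_mask("10.0.0.1", "192.168.*.*"))      # False
```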
Gets the public (i.e. without user's information) datashare settings parameters.
These parameters are used for the client app for the init process.
The endpoint removes all fields that contain Address, Secret, Url or Key.
returns the list of public settings
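As a rough illustration of that filtering, the sketch below drops every entry whose key contains one of the four markers. It assumes the match is a case-sensitive substring test on the field name; the sample settings and the function name `public_settings` are hypothetical.

```python
# Markers taken from the description above.
SENSITIVE_MARKERS = ("Address", "Secret", "Url", "Key")


def public_settings(settings: dict) -> dict:
    """Keep only the entries whose key contains none of the sensitive markers."""
    return {
        key: value
        for key, value in settings.items()
        if not any(marker in key for marker in SENSITIVE_MARKERS)
    }


# Hypothetical settings map, for illustration only.
sample = {
    "mode": "SERVER",
    "someServiceAddress": "http://example.invalid:9200",
    "someClientSecret": "s3cr3t",
    "someApiUrl": "http://example.invalid/api",
    "someApiKey": "abc",
}
print(public_settings(sample))  # {'mode': 'SERVER'}
```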
Gets the user's session information.
returns the user map
Gets the versions (front/back/docker) of datashare.
returns the list of versions of datashare
Cancels the running tasks. It returns a map with task name/stop statuses.
If the status is false, it means that some threads have not been stopped.
returns 200 and the tasks stop result map
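A client might inspect the returned map as follows. This is only a sketch: the task names are hypothetical, and the map is assumed to be a flat JSON object of task name to boolean.

```python
def tasks_not_stopped(stop_result: dict) -> list:
    """Return the names of tasks whose stop status is false, sorted for stable output."""
    return sorted(name for name, stopped in stop_result.items() if not stopped)


# Hypothetical response body, already decoded from JSON.
result = {"task_a": True, "task_b": False}
print(tasks_not_stopped(result))  # ['task_b']
```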
Gets the project information for the given id
if the project is not found in database
Deletes an apikey for current user. Only available in SERVER mode.
user identifier
when key has been deleted
Uninstall extension specified by its id.
returns 204 if the extension is uninstalled (idempotent)
Gets the list of registered pipelines.
returns the pipeline set
Get the JSON or YAML OpenAPI v3 contract specification
returns the JSON or YAML file
Delete user event by id.
Returns 204 (No Content) : idempotent
Preflight for key management
user identifier
returns OPTIONS, GET, PUT and DELETE
Retrieves the batch search queries with the given batch id and returns a list of strings UTF-8 encoded
identifier of the batch search
the batch search queries map [(query, nbResults), ...]
Preflight request for document tagging
the project id
document id
returns 200 with PUT
Preflight request for document untagging
the project id
document id
returns 200 with PUT
Hide all named entities with the given normalized mention
current project
normalized mention
returns 200 OK
Gets the list of notes for a project.
the project id
if the user is not granted for the project
Preflight request
returns POST
Get the private key for an existing user. Only available in SERVER mode.
user identifier
returns the hashed key JSON
Creates a new private key and saves its SHA384 hash into database for current user. Only available in SERVER mode.
user identifier
returns the api key JSON
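The principle of keeping only a SHA384 hash of the key can be sketched like this. How Datashare generates the key and encodes the hash is not stated here, so the random token and the hexadecimal encoding below are assumptions made for illustration.

```python
import hashlib
import secrets


def new_api_key() -> tuple:
    """Generate a random key and return (key, sha384_hex_digest).

    Assumptions: the key is a random URL-safe token and the hash is hex encoded.
    Only the digest would be stored; the key itself is shown to the user once.
    """
    key = secrets.token_urlsafe(32)
    digest = hashlib.sha384(key.encode("utf-8")).hexdigest()
    return key, digest


key, digest = new_api_key()
print(len(digest))  # 96 hexadecimal characters for a 384-bit hash
```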
Create the index for the current user if it doesn't exist.
index to create
returns 200 if the index already exists
Search GET request to Elasticsearch. As it is a GET method, all paths are accepted.
If a body is provided, the body will be sent to ES as source=urlencoded(body)&source_content_type=application%2Fjson. In that case, request parameters are not taken into account.
elasticsearch path
returns 200
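To make the source=urlencoded(body) form concrete, here is a small sketch that builds such a query string with the Python standard library. The sample query is hypothetical, and the exact encoding performed by the server may differ in detail.

```python
import json
from urllib.parse import parse_qs, urlencode


def es_source_query_string(body: dict) -> str:
    """Encode a JSON body as source=...&source_content_type=application%2Fjson."""
    return urlencode(
        {"source": json.dumps(body), "source_content_type": "application/json"}
    )


query = {"query": {"match_all": {}}}  # hypothetical Elasticsearch query
qs = es_source_query_string(query)
print(qs)

# Decoding the query string gives back the original body.
assert json.loads(parse_qs(qs)["source"][0]) == query
```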
Creates a project
if project and index have been created
Cleans a specific task.
name of the task to delete
returns 200 if the task is removed
Cancels the task with the given name.
name of the task to cancel
returns 200 with the cancellation status (true/false)
Retrieves the set of recommended documents for the given project id and a list of users
default response
Retrieves the list of starred documents for a given project.
the project id
default response
Retrieves the list of tagged documents for a given project id, filtered by a comma-separated list of tags.
the project id
comma separated tags
default response
Downloads (if necessary) and installs the extension specified by its id or url. The request parameter id or url must be present.
returns 200 if the extension is installed
Downloads (if necessary) and installs the plugin specified by its id or url. The request parameter id or url must be present.
returns 200 if the plugin is installed
Delete user history by type.
Returns 204 (No Content) : idempotent
Gets all user's document recommendations.
returns the user's document recommendations
Gets task result with its id
task id
returns 200 and the result
Gets all users who recommended a document with the count of all recommended documents for project and documents ids.
default response
Indexes files in a directory (with docker, it is the mounted directory that is scanned).
wrapper for options json
returns 200 and the list of tasks created
Download files from a search query.
Expected parameters are:
If the query is a string it is taken as an ES query string, else it is a raw JSON query (without the query part), see org.elasticsearch.index.query.WrapperQueryBuilder that is used to wrap the query.
the json used to wrap the query
returns 200 and the json task id
Creates a new batch search based on a previous one, given its id, and enqueues it for running.
source batch id
batch parameters
returns the id of the created batch search
Retrieves the list of users who recommended a document with the total count of recommended documents for the given project id
default response
Add or update an event to user's history. The event's type, the project ids and the uri are passed in the request body.
To update the event's name, the eventId is required to retrieve the corresponding event. The project list related to the event is stored in database but is never queried (no filters on project).
returns 200 when event is added or updated.
Gets tags by document id
the project id
document id
default response
Creates a new batch search. This is a multipart form with 9 fields:
name, description, csvFile, published, fileTypes, paths, fuzziness, phrase_matches, query_template.
Queries with fewer than two characters are filtered out.
To make a request manually, you can create a file like:
--BOUNDARY
Content-Disposition: form-data; name="name"

my batch search
--BOUNDARY
Content-Disposition: form-data; name="description"

search description
--BOUNDARY
Content-Disposition: form-data; name="csvFile"; filename="search.csv"
Content-Type: text/csv

Obama
skype
test
query three
--BOUNDARY
Content-Disposition: form-data; name="published"

true
--BOUNDARY--
Then curl with
curl -i -XPOST localhost:8080/api/batch/search/prj1,prj2 -H 'Content-Type: multipart/form-data; boundary=BOUNDARY' --data-binary @/home/dev/multipart.txt
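You may have to replace \n with \r\n in that file first, for example with sed -i 's/$/^M/g' /home/dev/multipart.txt (where ^M is a literal carriage return, typed with Ctrl-V Ctrl-M).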
Comma-separated list of projects
multipart form
if either name or CSV file is missing
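Instead of editing line endings by hand, the multipart body can be generated with CRLF separators directly. The sketch below does this with plain string formatting; the helper name `build_multipart` is hypothetical, and only the field names listed above are taken from this page.

```python
def build_multipart(fields: dict, csv_name: str, csv_text: str,
                    boundary: str = "BOUNDARY") -> bytes:
    """Build a multipart/form-data body using CRLF line endings.

    Every part starts with --boundary; only the final line is --boundary--.
    """
    crlf = "\r\n"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}{crlf}"
            f'Content-Disposition: form-data; name="{name}"{crlf}{crlf}'
            f"{value}{crlf}"
        )
    parts.append(
        f"--{boundary}{crlf}"
        f'Content-Disposition: form-data; name="csvFile"; filename="{csv_name}"{crlf}'
        f"Content-Type: text/csv{crlf}{crlf}"
        f"{csv_text}{crlf}"
    )
    parts.append(f"--{boundary}--{crlf}")
    return "".join(parts).encode("utf-8")


body = build_multipart(
    {"name": "my batch search", "description": "search description", "published": "true"},
    "search.csv",
    "Obama\nskype\ntest\nquery three",
)
print(body.decode("utf-8"))
```

The resulting bytes can be written to a file and sent with the curl command shown earlier, keeping the same boundary value in the Content-Type header.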
Gets the extension set in JSON. If a request parameter "filter" is provided, the regular expression will be applied to the list.
See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for pattern syntax.
returns the extensions set
Gets the plugins set in JSON.
If a request parameter "filter" is provided, the regular expression will be applied to the list.
See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for pattern syntax.
returns the plugins set
Retrieve the status of databus connection, database connection and index.
returns the status of datashare elements
Gets all the user tasks.
Filters can be added with name=value. For example, if name=foo is given in the request URL query, the tasks containing the term "foo" are returned. Filters can also use dotted keys to match nested properties. For example, if args.dataDir=bar is provided, tasks with an argument "dataDir" containing "bar" are selected.
Pagination/order parameters can be added:
returns the list of tasks
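The dotted-key filtering can be pictured with a client-side sketch. It assumes a filter matches when the term is a substring of the property's string value; the task objects and helper names are hypothetical, and the server-side logic may differ.

```python
def lookup(task: dict, dotted_key: str):
    """Follow a dotted key such as 'args.dataDir' through nested dicts; None if absent."""
    value = task
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(task: dict, filters: dict) -> bool:
    """True when every filter term is contained in the corresponding property."""
    for key, term in filters.items():
        value = lookup(task, key)
        if value is None or term not in str(value):
            return False
    return True


# Hypothetical task objects.
tasks = [
    {"name": "foo-index", "args": {"dataDir": "/data/bar"}},
    {"name": "other", "args": {"dataDir": "/data/baz"}},
]
print([t["name"] for t in tasks if matches(t, {"name": "foo"})])          # ['foo-index']
print([t["name"] for t in tasks if matches(t, {"args.dataDir": "bar"})])  # ['foo-index']
```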
Indexes files from the queue.
wrapper for options json
returns 200 and the json task id
Lists all files and directories for the given path. This endpoint returns JSON using the same specification as the tree command on UNIX. It is roughly the equivalent of:
tree -L 1 -spJ --noreport /home/datashare/data
directory path in the tree
returns the list of files and directories
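A response in this shape could be walked as sketched below. The sample JSON is hypothetical and assumes the type, name and contents keys that tree -J emits; the actual payload may carry extra fields such as sizes or permissions.

```python
import json

# Hypothetical response, shaped like the output of `tree -L 1 -J`.
sample = """
[{"type": "directory", "name": "/home/datashare/data", "contents": [
    {"type": "directory", "name": "docs"},
    {"type": "file", "name": "report.pdf"}
]}]
"""


def list_entries(tree_json: str) -> list:
    """Return (type, name) pairs for the first-level children of each root entry."""
    entries = []
    for root in json.loads(tree_json):
        for child in root.get("contents", []):
            entries.append((child["type"], child["name"]))
    return entries


print(list_entries(sample))
```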
Gets one task with its id.
task id
returns the task from its id
When datashare is launched in NER mode (without index) it exposes a name finding HTTP API.
The text is sent with the HTTP body.
pipeline to use
returns the list of NamedEntities annotations
Cleans all DONE tasks.
returns 200 and the list of removed tasks
Retrieves the batch search list for the user issuing the request, filtered with the given criteria, together with the total number of batch searches matching the criteria.
If from/size are not given, their default values are 0, meaning that all the results are returned. BatchDate must be a list of 2 items (the first one for the starting date and the second one for the ending date). If defined, publishState is a string equal to "0" or "1".
the json webQuery request body
the list of batch searches with the total batch searches for the query
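The constraints on the request body can be checked before sending it. The sketch below is an assumption-laden illustration: the JSON key `batchDate` (lower camel case), the date values and the helper name are hypothetical, while from, size and publishState come from the description above.

```python
import json


def batch_search_query(from_: int = 0, size: int = 0,
                       batch_date=None, publish_state=None) -> str:
    """Build a JSON body, enforcing the documented constraints."""
    body = {"from": from_, "size": size}  # 0/0 means all results are returned
    if batch_date is not None:
        if len(batch_date) != 2:
            raise ValueError("batchDate needs exactly two items: start and end")
        body["batchDate"] = list(batch_date)
    if publish_state is not None:
        if publish_state not in ("0", "1"):
            raise ValueError('publishState must be the string "0" or "1"')
        body["publishState"] = publish_state
    return json.dumps(body)


# Date values are placeholders; the expected date format is not specified here.
print(batch_search_query(batch_date=["<start>", "<end>"], publish_state="1"))
```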
Retrieves the list of batch searches
default response
Retrieves the batch search with the given id. The query param "withQueries" accepts a boolean value. When "withQueries" is set to false, the list of queries is empty and nbQueries contains the number of queries.
Create a task with JSON body
task id
the task creation body
the task was already existing
Get all user's projects
default response
Get the FtM document from its project and id (content hash)
project identifier
document identifier
returns the JSON document
Retrieves the list of starred documents for all projects for the current user.
default response