CLI with Tarentula
Datashare Tarentula is a powerful command-line toolbelt designed to streamline bulk operations against any Datashare instance.
Whether you need to count indexed files, download large datasets, batch-tag records, or run complex Elasticsearch aggregations, Tarentula provides a consistent, scriptable interface with flexible query support and Docker compatibility.
It also exposes a Python API for embedding automated workflows directly into your data pipelines.
With commands like count, download, aggregate, and tagging-by-query, you can handle millions of records in a single invocation, or integrate Tarentula into CI/CD pipelines for reproducible data tasks.
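As a rough illustration of scripting one of these commands from a pipeline, the sketch below assembles a count invocation in Python and runs it as a subprocess. The flag names (--datashare-url, --datashare-project, --query), the URL, and the project name are assumptions for illustration, not taken from this page; check them against the output of tarentula count --help before relying on them.

```python
import shlex
import shutil
import subprocess


def build_count_command(datashare_url, project, query="*"):
    """Return the argv list for a `tarentula count` call.

    The flag names are assumptions; confirm them with `tarentula count --help`.
    """
    return [
        "tarentula", "count",
        "--datashare-url", datashare_url,
        "--datashare-project", project,
        "--query", query,
    ]


def run_count(datashare_url, project, query="*"):
    """Run the count command and return its raw standard output."""
    if shutil.which("tarentula") is None:
        raise RuntimeError("tarentula is not on PATH; install it first")
    command = build_count_command(datashare_url, project, query)
    # check=True raises CalledProcessError if the CLI exits with an error.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Placeholder values: a local instance URL and a hypothetical project name.
    command = build_count_command(
        "http://localhost:8080", "local-datashare", "contract AND 2019"
    )
    # Print the shell-quoted command so it can be reviewed before running it.
    print(shlex.join(command))
```

Building the argument list separately from running it keeps the command easy to log and inspect in a CI job, and avoids shell quoting problems when the query contains spaces.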
You can install Tarentula with your favorite package manager:
pip3 install --user tarentula
Alternatively, with Docker:
docker run icij/datashare-tarentula
For the complete list of commands, options, and examples, read the documentation or GitHub: