# CLI with Tarentula

Whether you need to count indexed files, download large datasets, batch-tag records, or run complex Elasticsearch aggregations, Tarentula provides a consistent, scriptable interface with flexible query support and Docker compatibility.

It also exposes a Python API for embedding automated workflows directly into your data pipelines.
With commands like `count`, `download`, `aggregate`, and `tagging-by-query`, you can handle millions of records in a single invocation, or integrate Tarentula into CI/CD pipelines for reproducible data tasks.
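
For scripted or CI/CD use, one option is to assemble the command line in Python before handing it to a process runner. The sketch below is not part of Tarentula: `build_tarentula_command` is a hypothetical helper, and the option names (`--datashare-project`, `--query`) and their values are assumptions that should be checked against `tarentula count --help`. It only builds and prints the command; it does not run it.

```python
import shlex


def build_tarentula_command(subcommand, **options):
    """Build an argv list for the `tarentula` CLI without executing it."""
    argv = ["tarentula", subcommand]
    for name, value in options.items():
        # Assumed convention: keyword `datashare_project` maps to `--datashare-project`.
        argv.extend([f"--{name.replace('_', '-')}", str(value)])
    return argv


# Option names and values are illustrative assumptions, not verified defaults.
command = build_tarentula_command(
    "count",
    datashare_project="local-datashare",
    query="*",
)

# Print a shell-safe rendering of the command for logging or review.
print(shlex.join(command))
```

Once Tarentula is installed, the resulting list can be passed to `subprocess.run(command, check=True)` so that a failing command also fails the pipeline step.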

You can install Tarentula with pip:

```
pip3 install --user tarentula
```

Alternatively, run it with Docker:

```
docker run icij/datashare-tarentula
```

For the complete list of commands, options, and examples, read the documentation on GitHub:

{% embed url="<https://github.com/ICIJ/datashare-tarentula>" %}

