Datashare
DownloadAbout ICIJGitHub
  • About Datashare
  • Ask for help
  • Concepts
    • Running modes
    • CLI stages
  • About ICIJ
  • Github
  • 💻On your computer
    • About the local mode
    • Install on Mac
      • Start Datashare
      • Add documents to Datashare
    • Install on Windows
      • Start Datashare
      • Add documents to Datashare
    • Install on Linux
      • Start Datashare
      • Add documents to Datashare
    • Install with Docker
    • Add documents
    • Add more languages
    • Install plugins and extensions
    • Neo4j
      • Install Neo4j plugin
      • Create and update Neo4j graph
  • 🌐On your server
    • About the server mode
    • Install with Docker
    • Add documents from the CLI
    • Add entities from the CLI
    • Authentication providers
      • OAuth2
      • Basic with a database
      • Basic with Redis
      • Dummy
    • Neo4j
      • Install Neo4j plugin
      • Create and update Neo4j graph
    • Performance considerations
  • ⚡Usage
    • Search documents
    • Search documents in batch
    • Search with operators / Regex
    • Filter documents
    • Sort documents
    • Explore a document
    • Star documents
    • Tag documents
    • Recommend documents
    • Keyboard shortcuts
    • Create a Neo4j graph and explore it
    • FAQ
      • General
        • Can I use Datashare with no internet connection?
        • Can I download a document from Datashare?
        • Can I remove document(s) from Datashare?
        • Do you recommend OS or machines for large corpuses?
        • Can I use an external drive as data source?
        • How can we use Datashare on a collaborative mode on a server?
        • How can I contact ICIJ for help, bug reporting or suggestions?
        • Why results from a simple search and a batch search can be slightly different?
        • How can I uninstall Datashare?
        • Advanced: how can I do bulk actions with Tarentula?
        • What should I do if I get more than 10,000 results?
        • How to run Neo4j?
      • Definitions
        • What is a named entity?
        • What are NLP pipelines?
        • What is fuzziness?
        • What are proximity searches?
      • Common errors
        • 'Your search query is wrong.' What should I do?
        • Searching with double quotes doesn't work
        • List of common errors leading to "failure" in Batch Searches
        • What if Datashare says 'No documents found'?
        • Nothing works, everything crashes. What can I do?
        • What if tasks are 'running' but not completing?
        • 'You are not allowed to use Docker, you must be in the "docker-users" group'. What should I do?
        • What if a 'Preview' of my documents is 'not available'?
        • What do I do if Datashare opens a blank screen in my browser?
        • I see people, organizations and locations in the filters but not in the documents
        • What does 'Windows named pipe error' mean?
        • Datashare doesn't open. What should I do?
        • I upgraded to version 9 of Datashare and it fails.
  • 🤓Developers
    • How to contribute
    • Backend
      • API
      • API (deprecated)
      • Database
    • Frontend
      • JSDoc
      • Plugin hooks
      • Insight widgets
      • Vue app
        • Components
          • Api
          • AppliedSearchFilters
          • AppliedSearchFiltersItem
          • AppNav
          • AppSidebar
          • BatchDownloadActions
          • BatchSearchActions
          • BatchSearchClearFilters
          • BatchSearchCopyForm
          • BatchSearchFilterDate
          • BatchSearchFilterQuery
          • BatchSearchForm
          • BatchSearchResultsDetails
          • BatchSearchResultsFilters
          • BatchSearchResultsTable
          • BatchSearchStatus
          • BatchSearchTable
          • ColumnChartPicker
          • ColumnFilter
          • ColumnFilterBadge
          • ColumnFilterDropdown
          • ContentTypeBadge
          • Document
            • DocumentNavbar
            • DocumentNotes
            • DocumentTabDetails
            • DocumentTabExtractedText
            • DocumentTabNamedEntities
            • DocumentTabPreview
            • Viewers
              • AudioViewer
              • ImageViewer
              • JsonViewer
              • LegacySpreadsheetViewer
              • PaginatedViewer
              • SpreadsheetViewer
              • TiffViewer
              • VideoViewer
          • DocumentActions
          • DocumentAttachments
          • DocumentContent
          • DocumentContentSlice
          • DocumentContentSlicePlaceholder
          • DocumentContentSlices
          • DocumentGlobalSearchTermsTags
          • DocumentInModal
          • DocumentLocalSearchInput
          • DocumentSlicedName
          • DocumentTagsForm
          • DocumentThread
          • DocumentThumbnail
          • DocumentTranslatedContent
          • DocumentTypeCard
          • EllipseStatus
          • EmailString
          • Extensions
          • ExtractingForm
          • ExtractingFormOcrControl
          • ExtractingLanguageFormControl
          • Filter
            • FilterBoilerplate
            • FilterFooter
            • FilterSearch
            • FilterSortByDropdown
            • Types
              • FilterAbstract
              • FilterDate
              • FilterDateRange
              • FilterNamedEntity
              • FilterPath
              • FilterProject
              • FilterRecommendedBy
              • FilterStarred
              • FilterText
          • FiltersPanel
          • FindNamedEntitiesForm
          • Hook
          • InlineDirectoryPicker
          • JsonFormatter
          • LocalesMenu
          • MountedDataLocation
          • NamedEntityInContext
          • PageHeader
          • PageIcon
          • Pagination
          • Plugins
          • ProjectCards
          • ProjectForm
          • ProjectLink
          • ProjectSelector
          • ProjectThumbnail
          • QuickItemNav
          • ResetFiltersButton
          • RouterLinkPopup
          • ScrollTracker
          • SearchBar
          • SearchBarInput
          • SearchBarInputDropdown
          • SearchBarInputDropdownForField
          • SearchBarInputDropdownForProjects
          • SearchDocumentNavbar
          • SearchFormControl
          • SearchLayoutSelector
          • SearchResults
          • SearchResultsGrid
          • SearchResultsHeader
          • SearchResultsList
          • SearchResultsListLink
          • SearchResultsTable
          • ServerSettings
          • ShortkeysModal
          • TaskItemStatus
          • TasksList
          • TreeBreadcrumb
          • TreeView
          • UserDisplay
          • UserHistorySaveSearchForm
          • VersionNumber
          • Widget
            • WidgetDiskUsage
            • WidgetDocumentsByCreationDate
            • WidgetDocumentsByCreationDateByPath
            • WidgetDuplicates
            • WidgetEmpty
            • WidgetEntities
            • WidgetFieldFacets
            • WidgetFileBarometer
            • WidgetListGroup
            • WidgetNames
            • WidgetNested
            • WidgetProject
            • WidgetRecommendedBy
            • WidgetSearchBar
            • WidgetText
            • WidgetTreeMap
        • Pages
          • App
          • DocumentModal
          • DocumentStandalone
          • DocumentView
          • Error
          • Landing
          • Login
          • Project
          • ProjectList
          • ProjectNew
          • ProjectView
          • ProjectViewAddDocuments
          • ProjectViewEdit
          • ProjectViewFindNamedEntities
          • ProjectViewInsights
          • Search
          • Settings
          • TaskAnalysis
          • TaskAnalysisList
          • TaskBatchDownload
          • TaskBatchDownloadList
          • TaskBatchSearch
          • TaskBatchSearchList
          • TaskBatchSearchNew
          • TaskBatchSearchView
          • TaskBatchSearchViewResults
          • Tasks
          • UserHistory
          • UserHistoryDocumentList
          • UserHistorySavedSearchList
    • Introduction to Tarentula
    • Index operations with Playground
    • Write extensions
    • Write plugins
Powered by GitBook

Datashare is an open source project by the International Consortium of Investigative Journalists

On this page
  • Prerequisites
  • Starting Datashare with a single container
  • Starting Datashare with multiple containers
Export as PDF
  1. On your computer

Install with Docker

This page explain how to start Datashare within a Docker.

Last updated 6 months ago

Prerequisites

Datashare platform is designed to function effectively by utilizing a combination of various services. To streamline the development and deployment workflows, Datashare relies on the use of Docker containers. Docker provides a lightweight and efficient way to package and distribute software applications, making it easier to manage dependencies and ensure consistency across different environments.

Starting Datashare with a single container

To start Datashare within a container, you can use this command:

docker run --mount src=$HOME/Datashare,target=/home/datashare/data,type=bind -p 8080:8080 icij/datashare:11.1.9 --mode EMBEDDED

Make sure the Datashare folder exists in your homedir or this command will fail. This is an example about how to use Datashare with Docker, data will not be persisted.

Starting Datashare with multiple containers

Within Datashare, Docker Compose can play a significant role in enabling the setup of separated and persistent services for essential components such as the database (PostgreSQL), the search index (Elasticsearch), and the key-value store (Redis).

By utilizing Docker Compose, you can define and manage multiple containers as part of a unified service. This allows for seamless orchestration and deployment of interconnected services, each serving a specific purpose within the Datashare ecosystem.

Specifically, Docker Compose allows you to configure and launch separate containers for PostgreSQL, Elasticsearch, and Redis. These containers can be set up in a way that ensures their data is persistently stored, meaning that any information or changes made to the database, search index, or key-value store will be retained even if the containers are restarted or redeployed.

This separation of services using Docker Compose provides several advantages. It enhances modularity, scalability, and maintainability within the Datashare platform. It allows for independent management and scaling of each service, facilitating efficient resource utilization and enabling seamless upgrades or replacements of individual components as needed.

To start Datashare with , you can use the following docker-compose.yml file:

version: "3.7"
services:

  datashare:
    image: icij/datashare:18.1.3
    hostname: datashare
    ports:
      - 8080:8080
    environment:
      - DS_DOCKER_MOUNTED_DATA_DIR=/home/datashare/data
    volumes:
      - type: bind
        source: ${HOME}/Datashare
        target: /home/datashare/data
      - type: volume
        source: datashare-models
        target: /home/datashare/dist
    command: >-
      --dataSourceUrl jdbc:postgresql://postgresql/datashare?user=datashare\&password=password 
      --mode LOCAL
      --tcpListenPort 8080
    depends_on:
      - postgresql
      - redis
      - elasticsearch

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.1
    restart: on-failure
    volumes:
      - type: volume
        source: elasticsearch-data
        target: /usr/share/elasticsearch/data
        read_only: false
    environment:
      - "http.host=0.0.0.0"
      - "transport.host=0.0.0.0"
      - "cluster.name=datashare"
      - "discovery.type=single-node"
      - "discovery.zen.minimum_master_nodes=1"
      - "xpack.license.self_generated.type=basic"
      - "http.cors.enabled=true"
      - "http.cors.allow-origin=*"
      - "http.cors.allow-methods=OPTIONS, HEAD, GET, POST, PUT, DELETE"

  redis:
    image: redis:4.0.1-alpine
    restart: on-failure

  postgresql:
    image: postgres:12-alpine
    environment:
      - POSTGRES_USER=datashare
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=datashare
    volumes:
      - type: volume
        source: postgresql-data
        target: /var/lib/postgresql/data

volumes:
  datashare-models:
  elasticsearch-data:
  postgresql-data:

Open a terminal or command prompt and navigate to the directory where you saved the docker-compose.yml file. Then run the following command to start the Datashare service:

docker-compose up -d

The -d flag runs the containers in detached mode, allowing them to run in the background.

Docker Compose will pull the necessary Docker images (if not already present) and start the containers defined in the YAML file. Datashare will take a few seconds to start. You can check the progression of this opperation with:

docker-compose logs -f datashare

Once the containers are up and running, you can access the Datashare service by opening a web browser and entering http://localhost:8080. This assumes that the default port mapping of 8080:8080 is used for the Datashare container in the YAML file.

That's it! You should now have the Datashare service up and running, accessible through your web browser. Remember that the containers will continue to run until you explicitly stop them.

To stop the Datashare service and remove the containers, you can run the following command in the same directory where the docker-compose.yml file is located:

docker-compose down

This will stop and remove the containers, freeing up system resources.

💻
Read more about how to install Docker on your system.
Docker
Docker Compose