
Datashare is an open source project by the International Consortium of Investigative Journalists


Create a Neo4j graph and explore it

This page explains how to leverage neo4j to explore your Datashare projects. We recommend using a recent release of Datashare (>= 14.0.0) to use this feature; click on the "Other platforms and version

Last updated 1 year ago

The documents and entities graph

Neo4j is a graph database technology which lets you represent your data as a graph. Inside Datashare, neo4j lets you connect entities to each other through the documents in which they appear.

After creating a graph from your Datashare project, you will be able to explore it and visualize these kinds of relationships between your project's entities:

In the above graph, we can see 3 email document nodes in orange, 3 email address nodes in red, 1 person node in green and 1 location node in yellow. Reading the relationship types on the arrows, we can deduce the following information from the graph:

  • shapp@caiso.com emailed 20participants@caiso.com; the sent email has an id starting with f4db344...

  • one person named vincent is mentioned inside this email, as well as the california location

  • finally, the email also mentions the dle@caiso.com email address which is also mentioned in 2 other email documents (with id starting with 11df197... and 033b4a2...)
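The last observation, an address mentioned in several emails, can also be looked up with a query. The sketch below is a hypothetical helper (the function name is illustrative, not part of Datashare): it prints a Cypher query listing :NamedEntity:EMAIL nodes linked by :APPEARS_IN to at least a given number of :Document nodes. It relies on the labels and relationship types described in the sections below and matches the relationship in both directions, since its direction is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Cypher query listing email addresses that
# appear in at least N documents (like dle@caiso.com in the graph above).
# Labels and relationship type come from this page; the relationship is
# matched in both directions because its direction is assumed, not documented.
shared_email_addresses_query() {
  local min_docs="${1:-2}"   # minimum number of distinct documents
  case "$min_docs" in
    ''|*[!0-9]*) echo "expected a non-negative integer, got: $min_docs" >&2; return 1 ;;
  esac
  cat <<EOF
MATCH (address:NamedEntity:EMAIL)-[:APPEARS_IN]-(doc:Document)
WITH address, count(DISTINCT doc) AS documents
WHERE documents >= ${min_docs}
RETURN address, documents
ORDER BY documents DESC
LIMIT 25;
EOF
}
```

For example, `shared_email_addresses_query 3 | cypher-shell -u neo4j -p <password>` would send the query to a local instance, assuming cypher-shell is installed and the default `neo4j` user.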

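If you are not familiar with graphs and neo4j, take a look at the following resources:

  • Get started with neo4j

  • Find out what a graph database is

  • Learn neo4j fundamentals

  • Check out how to use neo4j for investigative journalism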

Graph nodes

The neo4j graph is composed of :Document nodes representing Datashare documents and :NamedEntity nodes representing entities mentioned in these documents.

The :NamedEntity nodes are additionally annotated with their entity types: :NamedEntity:PERSON, :NamedEntity:ORGANIZATION, :NamedEntity:LOCATION, :NamedEntity:EMAIL...
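To see how many nodes of each kind a graph holds, you can group nodes by their labels. This is a hypothetical sketch: the helper names are illustrative, and the list of accepted entity types is limited to the four named on this page, although the "..." above suggests there are more.

```shell
#!/usr/bin/env bash
# Hypothetical helpers printing Cypher queries over the node labels above.

# Count nodes per label combination, e.g. [Document] or [NamedEntity, PERSON].
node_label_counts_query() {
  cat <<'EOF'
MATCH (n)
RETURN labels(n) AS node_labels, count(*) AS nodes
ORDER BY nodes DESC;
EOF
}

# Count the entities of a single type. Only the types named on this page are
# accepted here; extend the list if your graph contains others.
entity_type_count_query() {
  case "$1" in
    PERSON|ORGANIZATION|LOCATION|EMAIL) ;;
    *) echo "unsupported entity type: $1" >&2; return 1 ;;
  esac
  printf 'MATCH (e:NamedEntity:%s) RETURN count(e) AS entities;\n' "$1"
}
```

For instance, `entity_type_count_query PERSON | cypher-shell -u neo4j -p <password>` would return the number of :NamedEntity:PERSON nodes.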

Graph relationships

In most cases, an entity :APPEARS_IN a document, which means that it was detected in the document content. In the particular case of email documents and EMAIL addresses, it is usually possible to identify richer relationships from the email metadata, such as who sent (:SENT relationship) and who received (:RECEIVED relationship) the email.

When an :EMAIL address entity is neither :SENT nor :RECEIVED, as is the case for dle@caiso.com in the above graph, it means that the address was mentioned in the email document body.

When a document is embedded inside another document (as an email attachment for instance), the child document is connected to its parent through the :HAS_PARENT relationship.
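The relationship types above can be combined in queries. The helpers below are a hypothetical sketch: :SENT, :RECEIVED and :APPEARS_IN are matched in both directions because this page does not state their direction, while :HAS_PARENT is assumed to point from the child document to its parent, as the sentence above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helpers printing Cypher queries over the relationship types above.

# Who emailed whom: sender and recipient addresses of the same email document.
email_exchanges_query() {
  cat <<'EOF'
MATCH (sender:NamedEntity:EMAIL)-[:SENT]-(mail:Document)-[:RECEIVED]-(recipient:NamedEntity:EMAIL)
RETURN sender, recipient, count(DISTINCT mail) AS emails
ORDER BY emails DESC
LIMIT 25;
EOF
}

# Addresses only mentioned in an email body: linked by :APPEARS_IN but
# neither :SENT nor :RECEIVED for that same document.
body_mentions_query() {
  cat <<'EOF'
MATCH (address:NamedEntity:EMAIL)-[:APPEARS_IN]-(mail:Document)
WHERE NOT (address)-[:SENT|RECEIVED]-(mail)
RETURN address, mail
LIMIT 25;
EOF
}

# Embedded documents (email attachments for instance) and their parents.
# Assumes :HAS_PARENT points from the child to the parent.
attachments_query() {
  cat <<'EOF'
MATCH (child:Document)-[:HAS_PARENT]->(parent:Document)
RETURN parent, child
LIMIT 25;
EOF
}
```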

Create your Datashare project's graph

The creation of a neo4j graph inside Datashare is supported through a plugin. To use the plugin to create a graph, follow the instructions for your setup:

  • when using Datashare on your computer

  • when Datashare is running on your server

After the graph is created, navigate to the 'Projects' page and select your project. You should see a new neo4j widget displaying the number of documents and entities found inside the graph:

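Access your project's graph

Depending on your access to the neo4j database behind Datashare, you might need to export the neo4j graph and import it locally to access it from visualization tools.

Exporting and importing the graph into your own DB is also useful when you want to perform write operations on your graph without any consequences on Datashare.

With read access to Datashare's neo4j database

If you have read access to the neo4j database (which should be the case if you are running Datashare on your computer), you will be able to plug visualization tools into it and start exploring.

Without read access to Datashare's neo4j database

If you don't have read access to the database, you will need to export it and import it into your own neo4j instance (running on your laptop for instance).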

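Ask for a DB dump

If possible, ask your system administrator for a DB dump obtained using the neo4j-admin database dump command.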
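A dump made with neo4j-admin is a binary archive, not a Cypher script, so it is loaded with `neo4j-admin database load` rather than with cypher-shell. The sketch below assumes Neo4j 5 command syntax, the default `neo4j` database name, and that the database is stopped while dumping and loading; `DB_NAME`, `DUMP_DIR` and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a dump/load round trip with neo4j-admin (Neo4j 5 syntax).
# Assumptions: the graph lives in the "neo4j" database and the database is
# stopped while dumping and loading.
DB_NAME="${DB_NAME:-neo4j}"
DUMP_DIR="${DUMP_DIR:-./neo4j-dumps}"

# Path of the archive: neo4j-admin writes <database>.dump into --to-path.
dump_file() {
  printf '%s/%s.dump\n' "$DUMP_DIR" "$DB_NAME"
}

# Run by the administrator of Datashare's neo4j instance.
dump_graph() {
  mkdir -p "$DUMP_DIR" || return 1
  neo4j-admin database dump "$DB_NAME" --to-path="$DUMP_DIR"
}

# Run on your own instance once the archive is in $DUMP_DIR.
# --overwrite-destination=true replaces any existing database with that name.
load_graph() {
  if [ ! -f "$(dump_file)" ]; then
    echo "dump not found: $(dump_file)" >&2
    return 1
  fi
  neo4j-admin database load "$DB_NAME" --from-path="$DUMP_DIR" --overwrite-destination=true
}
```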

Export your graph from Datashare

If you don't have access to the DB and can't be provided with a dump, you can export the graph from inside Datashare. Be aware that limits might be applied to the size of the exported graph.

To export the graph, navigate to Datashare's 'Projects' page, select your project, select the 'Cypher shell' export format and click the 'Export graph' button:

In case you want to restrict the size of the exported graph, you can restrict the export to a subset of documents and their entities using the 'File types' and 'Project directory' filters.

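DB import

Depending on how you run neo4j on your laptop, use one of the following ways to import your graph into your DB: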

Docker

  • identify your neo4j instance container ID:

    docker ps | grep neo4j # Should display your running neo4j container ID
  • copy the graph dump into your neo4j container's import directory:

    docker cp \
        <export-path> \
        <neo4j-container-id>:/var/lib/neo4j/imports/datashare-graph.dump
  • import the dumped file using the cypher-shell command:

    docker exec -it <neo4j-container-id> /bin/bash
    ./bin/cypher-shell -f imports/datashare-graph.dump
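The three Docker steps can be chained in a small script. This is a hypothetical sketch, not a Datashare tool: it assumes a single running container whose image name contains "neo4j", reuses the container paths from the steps above, and lets cypher-shell prompt for credentials through `docker exec -it`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Docker import steps above.
# Usage: import_into_docker <export-path> [<neo4j-container-id>]

# Print the ID of the first running container whose image name contains "neo4j".
find_neo4j_container() {
  docker ps --format '{{.ID}} {{.Image}}' | awk '$2 ~ /neo4j/ { print $1; exit }'
}

import_into_docker() {
  local export_path="$1"
  local container="${2:-}"
  if [ ! -f "$export_path" ]; then
    echo "export file not found: $export_path" >&2
    return 1
  fi
  if [ -z "$container" ]; then
    container="$(find_neo4j_container)"
  fi
  if [ -z "$container" ]; then
    echo "no running neo4j container found" >&2
    return 1
  fi
  # Same destination as the docker cp step above.
  docker cp "$export_path" "$container:/var/lib/neo4j/imports/datashare-graph.dump" || return 1
  # Run from /var/lib/neo4j so the relative paths match the steps above;
  # -it keeps a terminal attached so cypher-shell can ask for credentials.
  docker exec -it -w /var/lib/neo4j "$container" ./bin/cypher-shell -f imports/datashare-graph.dump
}
```

You would then call `import_into_docker <export-path>`, or pass the container ID explicitly if several neo4j containers are running.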

Neo4j Desktop import

  • open 'Cypher shell':

  • copy the graph dump into your neo4j instance's import directory:

    cp <export-path> imports
  • import the dumped file using the cypher-shell command:

    ./bin/cypher-shell -f imports/datashare-graph.dump

You will now be able to explore the graph imported in your own neo4j instance.
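A quick way to check the import is to count documents and entities and compare them with the figures shown by Datashare's neo4j widget; totals far from the widget's may point to an incomplete import. This sketch assumes `cypher-shell` is on your `PATH` (use `./bin/cypher-shell` from the neo4j home directory otherwise) and reads credentials from hypothetical `NEO4J_USER` and `NEO4J_PASSWORD` variables.

```shell
#!/usr/bin/env bash
# Hypothetical post-import check: total documents and entities in the graph.

graph_counts_query() {
  cat <<'EOF'
MATCH (d:Document)
WITH count(d) AS documents
OPTIONAL MATCH (e:NamedEntity)
RETURN documents, count(e) AS entities;
EOF
}

check_import() {
  if [ -z "${NEO4J_PASSWORD:-}" ]; then
    echo "set NEO4J_PASSWORD first" >&2
    return 1
  fi
  # The query is piped on stdin, so credentials must be passed as flags.
  graph_counts_query | cypher-shell -u "${NEO4J_USER:-neo4j}" -p "$NEO4J_PASSWORD" --format plain
}
```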

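Explore and visualize entity links

Once your graph is created and you can access it (see "Without read access to Datashare's neo4j database" above if you can't access Datashare's neo4j instance), you will be able to use your favorite tool to extract meaningful information from it.

Connect to your database

Once you can access your neo4j database, you can use different tools to visualize and explore it. You can start by connecting Neo4j Desktop to your DB.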
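Before plugging a visualization tool in, you can check from the command line that the database answers. This sketch assumes cypher-shell is installed, the default Bolt address `bolt://localhost:7687` (adjust it to your setup), and hypothetical `NEO4J_ADDRESS`, `NEO4J_USER` and `NEO4J_PASSWORD` variables.

```shell
#!/usr/bin/env bash
# Hypothetical connectivity check with cypher-shell.
NEO4J_ADDRESS="${NEO4J_ADDRESS:-bolt://localhost:7687}"
NEO4J_USER="${NEO4J_USER:-neo4j}"

check_connection() {
  if [ -z "${NEO4J_PASSWORD:-}" ]; then
    echo "set NEO4J_PASSWORD first" >&2
    return 1
  fi
  # A trivial query: it only succeeds if the address and credentials are valid.
  if ! echo 'RETURN 1 AS ok;' | cypher-shell -a "$NEO4J_ADDRESS" -u "$NEO4J_USER" \
      -p "$NEO4J_PASSWORD" --format plain >/dev/null; then
    echo "could not connect to $NEO4J_ADDRESS as $NEO4J_USER" >&2
    return 1
  fi
  echo "connected to $NEO4J_ADDRESS as $NEO4J_USER"
}
```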

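Visualize and explore with Neo4j Bloom

Neo4j Bloom is a simple and powerful tool developed by neo4j to quickly visualize and query graphs, if you run Neo4j Enterprise Edition. Bloom lets you navigate and explore the graph through a user interface similar to the one below:

Neo4j Bloom is accessible from inside the Neo4j Desktop app.

Find out more about how to use Neo4j Bloom to explore your graph with:

  • Bloom's User Guide

  • Bloom's Quick Start

  • this series of videos about graph exploration with Bloom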

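Query the graph with Neo4j Browser

The Neo4j Browser lets you run queries on your graph to explore it and retrieve information from it. Cypher is like SQL for graphs; running Cypher queries inside the neo4j browser lets you explore the results as shown below:

The Neo4j Browser is available for both Enterprise and Community distributions. You can access it:

  • inside the Neo4j Desktop app, when running neo4j from the Desktop app

  • at http://localhost:7474/browser/ when running neo4j inside Docker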
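As an example of a Cypher query you might run in the Neo4j Browser, the hypothetical helper below prints a query listing pairs of entities of a given type that appear in the same documents. It assumes the labels and :APPEARS_IN relationship described earlier on this page and uses `id()` to count each pair once; Neo4j 5 deprecates `id()` in favor of `elementId()`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Cypher query listing pairs of entities of the
# same type mentioned together in documents, most frequent pairs first.
cooccurrence_query() {
  local type="${1:-PERSON}"
  case "$type" in
    PERSON|ORGANIZATION|LOCATION|EMAIL) ;;
    *) echo "unsupported entity type: $type" >&2; return 1 ;;
  esac
  cat <<EOF
MATCH (a:NamedEntity:${type})-[:APPEARS_IN]-(d:Document)-[:APPEARS_IN]-(b:NamedEntity:${type})
WHERE id(a) < id(b)
RETURN a, b, count(DISTINCT d) AS shared_documents
ORDER BY shared_documents DESC
LIMIT 25;
EOF
}
```

You can paste the output of `cooccurrence_query ORGANIZATION` into the Browser's query editor to display the resulting nodes as a graph.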

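Visualize and explore with Linkurious Enterprise Explorer

Linkurious is proprietary software which, similarly to Neo4j Bloom, lets you visualize and query your graph through a powerful UI.

Find out more information about Linkurious:

  • Linkurious User Manual

  • how to configure Linkurious with neo4j

  • how to run Linkurious inside Docker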

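Visualize with Gephi

Gephi is simple, open-source visualization software. It is possible to export graphs from Datashare into the GraphML File Format and import them into Gephi.

Find out more information about:

  • how to export your graph in the GraphML format

  • Gephi features

  • how to get started with Gephi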

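Export your graph in the GraphML format

To export the graph in the GraphML file format, navigate to the 'Projects' page, select your project, choose the 'Graph ML' export format and click the 'Export graph' button:

In case you want to restrict the size of the exported graph, you can restrict the export to a subset of documents and their entities using the 'File types' and 'Project directory' filters.

You will now be able to visualize the graph using Gephi by opening the exported GraphML file in it.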
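Before opening a large export in Gephi, it can help to check how many nodes and edges it contains. The sketch below is a hypothetical, text-level count of the `<node>` and `<edge>` elements defined by the GraphML format; it does not parse or validate the XML.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check on a GraphML export: count <node> and <edge> elements.
graphml_summary() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "GraphML file not found: $file" >&2
    return 1
  fi
  local nodes edges
  # grep -o prints each match on its own line, so several elements per line are counted.
  nodes=$(grep -o '<node[ >]' "$file" | wc -l | tr -d ' ')
  edges=$(grep -o '<edge[ >]' "$file" | wc -l | tr -d ' ')
  printf 'nodes=%s edges=%s\n' "$nodes" "$edges"
}
```

If the counts are much larger than expected, the 'File types' and 'Project directory' filters can be used to export a smaller subset.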

