7.1. DataLad extensions

DataLad’s commands cover a broad range of domain-agnostic use cases. However, there are extension packages that can add specialized functionality with additional commands. Table 7.1 lists a number of such extensions.

DataLad extensions are shipped as separate Python packages and are not included in DataLad itself. Users who need a particular extension install its package, either on top of an existing DataLad installation or on its own. In the latter case, the extension pulls in DataLad core automatically as a dependency, so DataLad does not need to be installed explicitly beforehand. Installation uses standard Python package managers such as pip, and no setup is required beyond installing the package.
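The automatic installation of DataLad core relies on the dependency metadata that each extension package declares. The following minimal sketch, which assumes Python 3.8 or later, inspects that metadata for an installed distribution. The helper names requirement_names and pulls_in_datalad are hypothetical and chosen only for illustration.

```python
import re
from importlib import metadata


def requirement_names(requirements):
    """Return normalized project names from requirement strings.

    Requirements that only apply to optional extras are skipped, because
    pip does not install them by default.
    """
    names = set()
    for req in requirements or []:
        # Skip entries such as 'pytest; extra == "devel"'.
        if re.search(r";.*\bextra\s*==", req):
            continue
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            # Normalize runs of '-', '_' and '.' to a single '-', lowercase.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def pulls_in_datalad(dist_name):
    """Report whether an installed distribution declares datalad as a dependency.

    Returns None if the distribution is not installed at all.
    """
    try:
        reqs = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return "datalad" in requirement_names(reqs)


if __name__ == "__main__":
    # 'datalad-container' is the example extension used in this section.
    print(pulls_in_datalad("datalad-container"))
```

The script prints None when the extension is not installed, so it can be run safely before or after installation.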

The DataLad extensions listed here vary in maturity. Consult their documentation, and the sections or chapters associated with each extension, to find out more about them.

Table 7.1 Selection of available DataLad extensions. A more up-to-date list can be found on PyPI.

| Name | Description |
| --- | --- |
| container | Equips DataLad’s datalad run/datalad rerun functionality with the ability to transparently execute commands in containerized computational environments. The section "Computational reproducibility with software containers" and the use case "An automatically and computationally reproducible neuroimaging analysis from scratch" demonstrate how this extension can be used. |
| crawler | One of the initial goals behind DataLad was to provide access to already existing data resources. With the datalad crawl-init and datalad crawl commands, this extension makes it possible to automate the creation of DataLad datasets from resources available online and to keep them up to date efficiently. The majority of datasets in the DataLad superdataset /// on datasets.datalad.org are created and updated using this extension. |
| metalad | Equips DataLad with an alternative command suite and advanced tooling for metadata handling (extraction, aggregation, reporting). |
| neuroimaging | Metadata extraction support for a range of standards common to neuroimaging data. The use case "An automatically and computationally reproducible neuroimaging analysis from scratch" demonstrates how this extension can be used. |
| osf | Enables DataLad to interface and work with the Open Science Framework. Use it to publish your dataset’s data to an OSF project, thus utilizing the OSF for dataset storage and sharing. |
| ukbiobank | Equips DataLad with a set of commands to obtain and monitor imaging data releases of the UKBiobank. An introduction can be found in chapter |
| xnat | Equips DataLad with a set of commands to track XNAT projects. An alternative, more basic method to retrieve data from an XNAT server is outlined in section "Configure custom data access". |

To install a DataLad extension, use

$ pip install <extension-name>

for example

$ pip install datalad-container

Afterwards, the new DataLad functionality the extension provides is readily available.
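One way to confirm that an installation succeeded is to look up the installed package version from Python. The following is a minimal sketch, assuming Python 3.8 or later. The helper name installed_version is hypothetical, and datalad-container serves only as an example package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report on DataLad core and on one example extension.
    for name in ("datalad", "datalad-container"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

This check only inspects package metadata in the current Python environment. It does not verify that the extension's commands work.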

Some extensions may also be available from the software distribution (e.g., NeuroDebian or conda) that you used to install DataLad itself. Visit the datalad-extensions project to review available versions and their status.