1.2. DataLad’s extensions

The commands DataLad provides cover a broad range of domain-agnostic use cases. However, there are extension packages that can add (domain-specific) functionality and new commands.

Such extensions are shipped as separate Python packages and are not included in DataLad itself. Users who need a particular extension can install its package, either on top of an existing DataLad installation or on its own: in the latter case, the extension pulls in DataLad core automatically, so DataLad does not need to be installed first or alongside it. Installation is done with standard Python package managers, such as pip, and no setup beyond installing the package is required.

Note

DataLad extensions listed here are of various maturity levels. Check out their documentation and the sections or chapters associated with an extension to find out more about them. We are working on content to describe each of the extensions, but this is not a high priority at this time. Contributions of sections, chapters, or demonstrations for extensions that do not yet have one in the handbook are very welcome.

Among others (a full list can be found on PyPI), the following DataLad extensions are available:

Extension name

Description

DataLad Container

Equips DataLad’s run/rerun functionality with the ability to transparently execute commands in containerized computational environments. The section Computational reproducibility with software containers and the usecase An automatically and computationally reproducible neuroimaging analysis from scratch demonstrate how this extension can be used.

DataLad Crawler

One of the initial goals behind DataLad was to provide access to already existing data resources. With its crawl-init/crawl commands, this extension automates the creation of DataLad datasets from resources available online and keeps them up to date efficiently. The majority of datasets in the DataLad superdataset /// on datasets.datalad.org are created and updated using this extension.

Todo

contribute a section or a demo, e.g. based on existing one

DataLad Neuroimaging

Metadata extraction support for a range of standards common to neuroimaging data. The usecase An automatically and computationally reproducible neuroimaging analysis from scratch demonstrates how this extension can be used.

DataLad Hirni

A neuroimaging-specific extension that enables reproducible DICOM-to-BIDS conversion of (f)MRI data. The chapter … introduces this extension.

Todo

link hirni chapter once done

DataLad Metalad

Equips DataLad with an alternative command suite and advanced tooling for metadata handling (extraction, aggregation, reporting).

Todo

once section on metadata is done, link it here

DataLad XNAT

Equips DataLad with a set of commands to track XNAT projects. An alternative, more basic method to retrieve data from an XNAT server is outlined in section Configure custom data access.

DataLad UKBiobank

Equips DataLad with a set of commands to obtain and monitor imaging data releases of the UKBiobank. An introduction can be found in chapter

Todo

link UKB chapter once done

DataLad htcondor

Enhances DataLad with the ability for remote execution via the job scheduler HTCondor.

DataLad’s Git-remote-rclone helper

Enables DataLad to push to and pull from all third-party providers that lack native Git support but are supported by rclone.

Todo

Rewrite Third Party chapter to use this helper

To install a DataLad extension, use

$ pip install <extension-name>

for example:

$ pip install datalad-container

Afterwards, the new DataLad functionality the extension provides is readily available.
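
One way to confirm that an extension is in place after installation is to query Python's package metadata. The following is a minimal sketch rather than an official procedure: the helper name extension_installed is hypothetical (it is not part of DataLad), and it assumes that python3 on your PATH is the interpreter into which DataLad and the extension were installed.

```shell
# Hypothetical helper (not part of DataLad): succeeds if the named Python
# package is installed for the python3 interpreter found on PATH.
extension_installed() {
    python3 -c 'import sys, importlib.metadata as md; md.version(sys.argv[1])' "$1" 2>/dev/null
}

# Example check for the DataLad Container extension (PyPI name: datalad-container).
if extension_installed datalad-container; then
    echo "datalad-container is installed"
else
    echo "datalad-container is not installed"
fi
```

If the check succeeds, the commands of the extension should also be reachable through the datalad command line, for instance via datalad containers-run --help in the case of DataLad Container.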

Some extensions may also be available from the software distribution (e.g., NeuroDebian or conda) you used to install DataLad itself. Visit github.com/datalad/datalad-extensions/ to review available versions and their status.