What you really need to know
DataLad is a data management multitool that can assist you in handling the entire life cycle of digital objects. It is a command-line tool, free and open source, and available for all major operating systems.
This document is the 10,000-foot overview of the important concepts, commands, and capabilities of DataLad. Each section briefly highlights one type of functionality or concept and the associated commands, and the upcoming Basics chapters will demonstrate in detail how to use them.
Every command affects or uses DataLad datasets, the core data structure of DataLad. A dataset is a directory on a computer that DataLad manages.
You can create new, empty datasets from scratch and populate them, or transform existing directories into datasets.
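As a rough sketch of both routes, the commands below create a fresh dataset and then turn an already populated directory into one. The directory and file names are made up for illustration, and the sketch assumes that DataLad, Git, and git-annex are installed and that a Git identity is configured.

```shell
# Work in a scratch directory so nothing outside it is touched.
cd "$(mktemp -d)"

# Route 1: create a new, empty dataset ("my-dataset" is an example name).
datalad create my-dataset

# Route 2: turn an existing, non-empty directory into a dataset.
mkdir existing-project
echo "some notes" > existing-project/notes.txt
# --force is needed because the target directory already contains files.
datalad create --force existing-project
# The pre-existing files are untracked until they are saved.
datalad save -d existing-project -m "Add existing files"
```

Each resulting directory contains a `.datalad` subdirectory, which marks it as a dataset.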
Simplified local version control workflows
With DataLad, you can keep track of revisions of data of any size, and view, interact with, or restore any version of your dataset’s history.
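A minimal local workflow might look like the following sketch. The dataset name `demo` and the file names are hypothetical, and the history is inspected with plain Git, which works because a dataset is also a Git repository.

```shell
# Work in a scratch directory so nothing outside it is touched.
cd "$(mktemp -d)"
datalad create demo
cd demo

# Record a first revision.
echo "first draft" > draft.txt
datalad save -m "Add first draft"

# Record a second revision with another file.
echo "raw measurements" > data.txt
datalad save -m "Add measurements"

# Inspect the recorded history, newest revision first.
git log --oneline
```

The log lists one entry for the dataset creation plus one for each `datalad save`.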
Consumption and collaboration
DataLad lets you consume datasets provided by others, and collaborate with them. You can install existing datasets and update them from their sources, or create sibling datasets that you can publish updates to and pull updates from for collaboration and data sharing.
Additionally, you can get access to publicly available open data collections with the DataLad superdataset ///.
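The sketch below illustrates consuming and updating a dataset without needing network access: a local dataset named `source` (a made-up name) stands in for a dataset provided by someone else. It assumes DataLad, Git, and git-annex are installed and a Git identity is configured.

```shell
# Work in a scratch directory so nothing outside it is touched.
cd "$(mktemp -d)"

# Stand-in for a dataset that somebody else provides.
datalad create source
echo "shared data" > source/data.txt
datalad save -d source -m "Add shared data"

# Install (clone) the dataset and retrieve the content of one file.
datalad clone source copy
cd copy
datalad get data.txt

# The provider adds a new file ...
(cd ../source && echo "more data" > more.txt && datalad save -m "Add more data")

# ... and the clone pulls in the change from its source.
datalad update -s origin --merge

# With network access, the superdataset could be cloned with:
#   datalad clone ///
```

After a clone, only the file names are present at first; `datalad get` fetches the content of a file on demand.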
Full provenance capture and reproducibility
DataLad allows you to capture full provenance: the origin of datasets, the origin of files obtained from web sources, and complete machine-readable and automatically reproducible records of how files were created (including software environments).
You or your collaborators can thus re-obtain or reproducibly recompute content with a single command, and make use of extensive provenance of dataset content (who created it, when, and how?).
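As an illustration, the following sketch records how an output file was computed and then recomputes it. The dataset name `analysis`, the file names, and the `sort` command are placeholders for a real analysis; the sketch assumes DataLad, Git, and git-annex are installed and a Git identity is configured.

```shell
# Work in a scratch directory so nothing outside it is touched.
cd "$(mktemp -d)"
datalad create analysis
cd analysis

# Add some input data.
printf '3\n1\n2\n' > numbers.txt
datalad save -m "Add input data"

# Execute a command and record it, with its inputs and outputs, in the history.
datalad run -m "Sort the numbers" \
  --input numbers.txt \
  --output sorted.txt \
  "sort numbers.txt > sorted.txt"

# The most recent commit now carries a machine-readable record of the command.
git log -1

# Re-execute the recorded command with a single call.
datalad rerun
```

If the recomputed output is identical to the saved one, `datalad rerun` has nothing new to record.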
Third party service integration
You can extract, aggregate, and query dataset metadata. This allows you to automatically obtain metadata according to different metadata standards (EXIF, XMP, ID3, BIDS, DICOM, NIfTI1, …), store this metadata in a portable format, share it, and search dataset contents.
All in all…
You can use DataLad for a variety of use cases. At its core, it is a domain-agnostic and self-effacing tool: DataLad allows you to improve your data management without custom data structures or the need for central infrastructure or third-party services. If you are interested in more high-level information on DataLad, you can find answers to common questions in the section Frequently Asked Questions, and a concise command cheat sheet in the section DataLad cheat sheet.
But enough of the introduction now – let’s dive into the Basics!