The DataLad Handbook

[Figure: Virtual directory tree of a nested DataLad dataset]

Important

PLEASE NOTE: This is an archived version of the DataLad handbook corresponding to its 0.12.0 release (January 2020), which in turn corresponds to the 0.12.0 release of DataLad. This handbook version does not document all functionality in DataLad 0.12.0; it reflects the state the handbook was in at that time. Find the latest released version of the handbook at handbook.datalad.org/en/stable, and its most recent version (including general fixes, visual improvements, and additions covering existing commands, or workflows based on existing functionality) at handbook.datalad.org/en/latest. The CHANGELOG summarizes the changes and additions between handbook versions.

Welcome to the DataLad handbook!

This handbook is a living resource about why and – more importantly – how to use DataLad. It aims to provide novices and advanced users of all backgrounds with both the basics of DataLad and start-to-end use cases of specific applications. If you want to get hands-on experience and learn DataLad, the Basics part of this book will teach you. If you want to know what is possible, the use cases will show you. And if you want to help others to get started with DataLad, the companion repository provides free and open source teaching material tailored to the handbook.

Before you read on, please note that the handbook is based on DataLad version 0.12. If you do not currently have DataLad 0.12 or higher installed, the section Installation and configuration will set you up with what you need. If you are new here, please start at the beginning of the handbook.

Important

The handbook is currently in beta stage. If you are willing to provide feedback on its contents, please get in touch.

Introduction

What DataLad and the handbook are all about

Basics 1 – DataLad datasets

Exploring DataLad's core data structure

Basics 2 – DataLad, Run!

How DataLad records provenance of dataset modifications

Basics 3 – Under the hood: git-annex

A closer look at how and why things work

Basics 4 – Collaboration

Sharing on shared file systems

Basics 5 – Tuning datasets to your needs

Various types and methods for dataset configurations

Basics 6 – Make the most out of datasets

Organizational principles and best practices for reproducible data analyses

Basics 7 – One step further

Advanced nesting and "extended" reproducibility

Basics 8 – Help yourself

Dealing with problems, filesystems, and version histories

Basics 9 – Third-party infrastructure

Leverage third-party services to share datasets

Basics 10 – Further options

Small pieces of advice and helpful additional options

Use Cases

Hands-on real-world applications with step-by-step recipes...

Appendix

Further information and references

Code lists from chapters

Easy access to copy-paste snippets for workshops

Contributors

Useful Links