The DataLad Handbook
Important
PLEASE NOTE: This is an archived version of the DataLad handbook corresponding to its 0.12.0 release (January 2020), which in turn corresponded to the 0.12.0 release of DataLad. This handbook version is not a complete documentation of all functionality in DataLad 0.12.0; it reflects the state the handbook was in at that time. Find the latest released version of the handbook at handbook.datalad.org/en/stable, and its most recent version (including general fixes, visual improvements, and additions of existing commands or workflows based on existing functionality) at handbook.datalad.org/en/latest. The CHANGELOG summarizes the contents and additions that happened between handbook versions.
Welcome to the DataLad handbook!
This handbook is a living resource about why and – more importantly – how to use DataLad. It aims to provide novices and advanced users of all backgrounds with both the basics of DataLad and start-to-end use cases of specific applications. If you want to get hands-on experience and learn DataLad, the Basics part of this book will teach you. If you want to know what is possible, the use cases will show you. And if you want to help others to get started with DataLad, the companion repository provides free and open source teaching material tailored to the handbook.
Before you read on, please note that the handbook is based on DataLad version 0.12. If you do not currently have DataLad 0.12 or higher installed, the section Installation and configuration will set you up with what you need. If you’re new here, please start reading from the Introduction.
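As a minimal sketch (not part of the handbook or of DataLad itself), the following Python snippet checks whether the DataLad installed in the current Python environment meets the 0.12 minimum. It assumes Python 3.8 or later (for `importlib.metadata`) and that DataLad was installed as the PyPI package named `datalad`; the helper names `parse_version`, `meets_minimum`, and `installed_datalad_version` are hypothetical. Running `datalad --version` in a terminal is a simpler check.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components of a version string.

    Simplification: pre-release or local suffixes are ignored,
    so "0.12.0rc6" is treated as (0, 12, 0).
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def _pad(parts: tuple, length: int) -> tuple:
    """Right-pad a version tuple with zeros so (0, 12) compares like (0, 12, 0)."""
    return parts + (0,) * (length - len(parts))


def meets_minimum(version: str, minimum: tuple = (0, 12)) -> bool:
    """Return True if `version` is at least `minimum` (default: 0.12)."""
    parsed = parse_version(version)
    length = max(len(parsed), len(minimum))
    return _pad(parsed, length) >= _pad(tuple(minimum), length)


def installed_datalad_version():
    """Return the installed DataLad version string, or None if it is absent."""
    try:
        return metadata.version("datalad")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_datalad_version()
    if installed is None:
        print("DataLad is not installed in this Python environment.")
    elif meets_minimum(installed):
        print(f"DataLad {installed} is recent enough for this handbook.")
    else:
        print(f"DataLad {installed} is older than 0.12; please upgrade.")
```

Padding both tuples to the same length avoids a subtle pitfall of plain tuple comparison, where `(0, 12)` would otherwise compare as smaller than `(0, 12, 0)`.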
Important
The handbook is currently in beta stage. If you would be willing to provide feedback on its contents, please get in touch.
Introduction
What DataLad and the handbook are all about
Basics 2 – DataLad, Run!
How DataLad records provenance of dataset modifications
Basics 3 – Under the hood: git-annex
A closer look at how and why things work
Basics 4 – Collaboration
Sharing on shared file systems
Basics 5 – Tuning datasets to your needs
Various types and methods for dataset configurations
Basics 6 – Make the most out of datasets
Organizational principles and best practices for reproducible data analyses
Basics 7 – One step further
Advanced nesting and "extended" reproducibility
Basics 8 – Help yourself
Dealing with problems, filesystems, and version histories
Basics 9 – Third party infrastructure
Leverage third party services to share datasets
Basics 10 – Further options
Small pieces of advice and helpful additional options
Appendix
Further information and references