The Handbook
Welcome!
This handbook is a living resource about why and, more importantly, how to use DataLad. It aims to provide novices and advanced users of all backgrounds with both the basics of DataLad and end-to-end use cases of specific applications. If you want to get hands-on experience and learn DataLad, the Basics part of this book will teach you. If you want to know what is possible, the use cases will show you. And if you want to help others get started with DataLad, the companion repository provides free and open source teaching material tailored to the handbook.
Before you read on, please note that this version of the handbook requires at least DataLad version 0.17, and the more recent your DataLad version, the better. The section Installation and configuration will set you up with what you need if you do not yet have DataLad 0.17 or higher installed.
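To check which version you currently have, one quick way (assuming DataLad is already installed and on your PATH) is:

    # print the installed DataLad version
    datalad --version

    # if it is older than 0.17 and was installed via pip, an upgrade could look like this
    python -m pip install --upgrade datalad

This is only a sketch; the Installation and configuration section covers the recommended installation and upgrade routes for each operating system.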
Did you know …
… that you can also easily get a physical copy of this book?
For example from a book store near you (ISBN 979-8857037973), or from any Amazon site (e.g., US or EU).
If you’re new here, please start reading the handbook here. Alternatively, identify with one of several user types in this user-specific guide to the handbook.
The handbook is a collaborative resource
If you would be willing to provide feedback on its contents, please get in touch.
- Basics
- 1. DataLad datasets
- 2. DataLad, run!
- 3. Under the hood: git-annex
- 4. Collaboration
- 5. Tuning datasets to your needs
- 6. Make the most out of datasets
- 7. One step further
- 8. Third party infrastructure
- 8.1. Beyond shared infrastructure
- 8.2. Publishing datasets to Git repository hosting
- 8.3. Walk-through: Dropbox as a special remote
- 8.4. Walk-through: Amazon S3 as a special remote
- 8.5. Walk-through: Git LFS as a special remote on GitHub
- 8.6. Walk-through: Dataset hosting on GIN
- 8.7. Built-in data export
- 8.8. Keeping (some) dataset contents private
- 8.9. The datalad push command
- 8.10. Summary
- 9. Help yourself
- Use cases
- A typical collaborative data management workflow
- Basic provenance tracking
- Writing a reproducible paper
- Student supervision in a research project
- A basic automatically and computationally reproducible neuroimaging analysis
- An automatically and computationally reproducible neuroimaging analysis from scratch
- Scaling up: Managing 80TB and 15 million files from the HCP release
- Building a scalable data storage for scientific computing
- Using Globus as a data store for the Canadian Open Neuroscience Portal
- DataLad for reproducible machine-learning analyses
- Encrypted data storage and transport
- Contributing
Appendix
- Glossary
- Frequently asked questions
- DataLad cheat sheet
- Contributing
- Teaching with the DataLad Handbook
- Acknowledgements
- Copyright and licenses
- Tell me what you are and I tell you where to start
- Handbook Poster from the 2020 (virtual) OHBM
- OpenNeuro Quickstart Guide: Accessing OpenNeuro datasets via DataLad
- So… Windows… eh?
- How to name a file: Interoperability considerations
Code lists from chapters
- About code lists
- Code from chapter: 01_dataset_basics
- Code from chapter: 02_reproducible_execution
- Code from chapter: 10_yoda
- OHBM Brainhack TrainTrack: DataLad
- OHBM 2020 Open Science Room: Reproducible Research Objects with DataLad
- An introduction to DataLad with a focus on ML
- DataLad tutorial at the MPI Leipzig
- An introduction to DataLad at the MPI Berlin
- An introduction to DataLad for the ABCD ReproNim course week 8b
- An introduction to DataLad for Yale
- An introduction to DataLad at the ReproNim DGPA workshop
- Neurohackdemy 2022: Data Management for Neuroimaging with DataLad
- An introduction to DataLad at the Open Science Office Hour