3.1. DataLad on High Throughput or High Performance Compute Clusters

To compute large analyses efficiently, to comply with best computing practices, or to fulfil the requirements that responsible system administrators impose, users may turn to computational clusters such as high-performance computing (HPC) or high-throughput computing (HTC) infrastructure for data analysis, backup, or storage.

This chapter is a collection of useful resources and examples that aims to help you get started with DataLad-centric workflows on clusters. We hope to grow this chapter further, so please get in touch if you want to share your use case or seek more advice.

3.1.1. Pointers to content in other chapters

To find out more about centralized storage solutions, you may want to check out the use case "Building a scalable data storage for scientific computing" or the section "Remote Indexed Archives for dataset storage and backup".

3.1.2. DataLad installation on a cluster

Users of a compute cluster generally do not have administrative privileges (sudo rights) and thus cannot install software as easily as on their own private machine. To get DataLad and its underlying tools installed, you can either bribe (kindly ask) your system administrator[1] or install everything for your own user account only, following the instructions in the paragraph "Linux-machines with no root access (e.g. HPC systems)" of the installation page.
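After a user-level installation, a common stumbling block on clusters is that the freshly installed executables are not on your `PATH`. The following minimal sketch checks whether `git`, `git-annex`, and `datalad` can be found. It assumes that these are the executable names of DataLad and its underlying tools on your system, and the helper name `find_missing` is hypothetical, chosen for illustration only; it is not part of DataLad.

```python
"""Check whether DataLad and its underlying tools are visible on PATH.

Assumption: after a (non-root) user-level installation, the executables
`git`, `git-annex`, and `datalad` should be findable on PATH.
"""
import shutil
from typing import Callable, List, Optional, Sequence

# Executable names assumed for this sketch.
REQUIRED_TOOLS = ("git", "git-annex", "datalad")


def find_missing(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that `which` cannot locate (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        print("Check that your user-level install directory (e.g. ~/.local/bin) is on PATH.")
    else:
        # Report where each tool was found, to spot unexpected system-wide copies.
        for tool in REQUIRED_TOOLS:
            print(f"{tool}: {shutil.which(tool)}")
```

The `which` parameter defaults to `shutil.which` but can be swapped for another lookup function, which keeps the check easy to test without touching the real `PATH`.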

Footnotes

[1] You may not need to bribe your system administrator if you are kind to them. Consider frequent gestures of appreciation, or send a geeky T-shirt for SysAdminDay (the last Friday in July). Sysadmins do amazing work!