3. Installation and configuration

3.1. Install DataLad

Feedback on installation instructions

The installation methods presented in this chapter are based on experience and have been tested carefully. However, operating systems and other software are continuously evolving, and these guides might have become outdated. Be sure to check out the online-handbook for up-to-date information.

In general, the DataLad installation requires Python 3 (see the Find-out-more on the difference between Python 2 and 3 to learn why this is required), Git, and git-annex, and for some functionality 7-Zip. The instructions below detail how to install the core DataLad tool and its dependencies on common operating systems. They do not cover the various DataLad extensions that need to be installed separately, if desired.

Python 2, Python 3, what’s the difference?

DataLad requires Python 3.8, or a more recent version, to be installed on your system. The easiest way to verify that this is the case is to open a terminal and type python to start a Python session:

$ python
Python 3.9.1+ (default, Jan 20 2021, 14:49:22)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

If this fails, or reports a Python version with a leading 2, such as Python 2.7.18, try starting python3, which some systems use to disambiguate between Python 2 and Python 3. If this fails, too, you need to obtain a recent release of Python 3. On Windows, attempting to run commands that are not installed might cause a Windows Store window to pop up. If this happens, Python may not yet be installed. Please check the Windows 10 and 11 installation instructions, and do not install Python via the Windows Store.

Python 2 is an outdated, in technical terms “deprecated”, version of Python. Although it still exists as the default Python version on many systems, it has not been maintained since 2020, and most software has therefore dropped support for it. If you only run Python 2 on your system, most Python software, including DataLad, will be incompatible, and hence unusable, resulting in errors during installation and execution.

But does that mean that you should uninstall Python 2? No! Keep it installed, especially if you are using Linux or macOS. Python 2 existed for 20 years, and a great deal of software has been written for it. It is quite likely that some basic operating system components or legacy software on your computer depend on it, and uninstalling a preinstalled Python 2 will likely render your system unusable. Install Python 3, and have both versions coexist peacefully.
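The version requirement can also be checked from within Python itself. The following is a minimal sketch and not part of DataLad: the names MINIMUM_PYTHON and meets_requirement are made up for this illustration, and only the minimum version (3.8) is taken from the text above. It avoids newer syntax on purpose, so that it also runs, and reports the problem, under Python 2.

```python
import sys

# Minimum version stated in this chapter
MINIMUM_PYTHON = (3, 8)


def meets_requirement(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if a (major, minor, ...) version is recent enough."""
    if version_info is None:
        # Default to the interpreter that is running this script
        version_info = sys.version_info
    # Compare only major and minor version, e.g. (3, 9) >= (3, 8)
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print("Python {} is recent enough for DataLad".format(current))
    else:
        print("Python {} is too old, 3.8 or later is needed".format(current))
```

Saved to a file, the script can be run with both python and python3 to see which of the two commands starts a suitable interpreter.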

The following sections provide targeted installation instructions for a set of common scenarios, operating systems, or platforms.

3.1.1. Windows 10 and 11

There are countless ways to install software on Windows. Here we describe one possible approach that should work on any Windows computer, like one that you may have just bought.

Python:

Windows itself does not ship with Python; it must be installed separately. If you have already done that, please check the Find-out-more on Python versions to see whether your installation matches the requirements. Otherwise, head over to the download section of the Python website, and download an installer. Unless you have specific requirements, go with the 64-bit installer of the latest Python 3 release.

Avoid installing Python from the Windows Store

We recommend not installing Python via the Windows Store, even if it opens after you typed python, as this version requires additional manual configuration (in particular of your PATH environment variable).

When you run the installer, make sure to select the Add Python to PATH option, as this is required for subsequent installation steps and interactive use later on. Other than that, using the default installation settings is just fine.

Verify Python installation

It is not uncommon for multiple Python installations to co-exist on a Windows machine, because particular applications can ship their own. Such alternative installations may even be or become the default. This can cause confusing behavior, because each Python installation has its own set of installed packages and package versions.

To check whether there are multiple installations, open the Windows command line cmd.exe and run where python. This will list all variants of python.exe. There will be one in WindowsApps, which is only a link to the Windows Store. Make sure the Python version that you installed is listed, too.

If there are multiple Python installations, you can tell which one is the default by running this command in cmd.exe:

> python -c "import sys; print(sys.executable)"

This will print the path of the default python.exe. If the output does not match the expected Python installation, the PATH environment variable likely needs to be adjusted. This can be done in the Windows system properties. It is sufficient to move the entries created by the Python installer to the start of the list.
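Both checks can be combined in a short script. This is a rough sketch that only approximates where python by scanning the directories on PATH in order. The helper name find_on_path is hypothetical, and the candidate file names (python.exe on Windows, python and python3 elsewhere) are assumptions.

```python
import os
import sys


def find_on_path(names, path=None):
    """List executables on PATH that match one of the names, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    names = ["python.exe"] if os.name == "nt" else ["python", "python3"]
    for location in find_on_path(names):
        print(location)
    # The interpreter that is actually running this script
    print("running interpreter: {}".format(sys.executable))
```

The first listed location is the one that a plain python call resolves to, so it should agree with the interpreter path printed on the last line.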

Git:

Windows also does not come with Git. If you happen to have it installed already, please check if you have configured it for command line use. You should be able to open the Windows command prompt and run a command like git --version. It should return a version number and not an error.

To install Git, visit the Git website and download an installer. If in doubt, go with the 64-bit installer of the latest version. The installer itself provides various customization options. We recommend leaving the defaults as they are, in particular the target directory, but configuring the following settings (they are distributed over multiple dialogs):

  • Select Git from the command line and also from 3rd-party software

  • Enable file system caching

  • Select Use external OpenSSH

  • Enable symbolic links

Git-annex:

There are two convenient ways to install git-annex. The first is to download the installer from the git-annex homepage. The other is to deploy git-annex via the DataLad installer. The latter option requires the installation of the datalad-installer Python package. Once Python is available, this can be done with the Python package manager pip. Open a command prompt and run:

> python -m pip install datalad-installer

Afterwards, open another command prompt in administrator mode and run:

> datalad-installer git-annex -m datalad/git-annex:release

This will download a recent git-annex, and configure it for your Git installation. The admin command prompt can be closed afterwards, as none of the remaining steps need it.

For performance improvements, regardless of which installation method you chose, we also recommend setting the following git-annex configuration:

> git config --global filter.annex.process "git-annex filter-process"

DataLad:

With Python, Git, and git-annex installed, DataLad can be installed using pip by running the command below. Adding the -U option to the same command later upgrades an existing installation:

> python -m pip install datalad

7-Zip (optional, but highly recommended):

Download it from the 7-Zip website (64-bit installer when in doubt), and install it into the default target directory.

There are many other ways to install DataLad on Windows; see, for example, the Windows-wit on the Windows Subsystem for Linux 2 (WSL2). One attractive alternative is Conda. A completely different approach is to install the DataLad Gooey, a standalone installation of DataLad’s graphical application (see the DataLad Gooey documentation for installation instructions).

Install DataLad using the Windows Subsystem for Linux 2 (WSL2)

With the Windows Subsystem for Linux, you will be able to use a Unix system despite being on Windows. You need to have a recent build of Windows in order to get WSL2 – we do not recommend WSL1.

You can find out how to install the Windows Subsystem for Linux at docs.microsoft.com. Afterwards, proceed with your installation as described in the installation instructions for Linux.

Using DataLad on Windows has a few peculiarities. In general, DataLad can feel a bit sluggish on non-WSL2 Windows systems. This is due to various file system issues that also affect the version control system Git itself, which DataLad relies on. The core functionality of DataLad works, and you should be able to follow most contents covered in this book. You will notice, however, that some Unix commands displayed in examples may not work, that terminal output can look different from what is displayed in the book’s code examples, and that some dependencies for additional functionality are not available for Windows. Dedicated notes, “Windows-wits”, contain important information, alternative commands, or warnings. An overview of useful Windows commands and general information is included in The command line.

3.1.2. Mac (incl. M1)

Modern Macs come with a compatible Python 3 version installed by default. The Find-out-more on Python versions has instructions on how to confirm that.

DataLad is available via Homebrew, a package manager for macOS. First, install Homebrew, which requires Xcode to be installed from the Mac App Store.

Next, install datalad and its dependencies:

$ brew install datalad

Alternatively, you can exclusively use brew for DataLad’s non-Python dependencies, and then check the Find-out-more on how to install DataLad via Python's package manager.

Install DataLad via pip on macOS

If Git/git-annex are installed already (via brew), DataLad can also be installed via Python’s package manager pip, which should be installed by default on your system:

$ python -m pip install datalad

Some macOS versions may use python3 instead of python – use tab completion to find out which is installed.

Recent macOS versions may warn after installation that scripts were installed into locations that are not on PATH:

The script chardetect is installed in
'/Users/MYUSERNAME/Library/Python/3.11/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to
suppress this warning, use --no-warn-script-location.

To fix this, add these paths to the $PATH environment variable. You can do this for your own user account by adding something like the following to the profile file of your shell (exchange the user name accordingly):

$ export PATH=$PATH:/Users/MYUSERNAME/Library/Python/3.11/bin

If you use a bash shell, this may be ~/.bashrc or ~/.bash_profile; if you use a zsh shell, it may be ~/.zshrc or ~/.zprofile. Find out which shell you are using by typing echo $SHELL into your terminal.
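The mapping from shell to profile file can be written down as a small lookup. This is only an illustrative sketch: the names PROFILE_FILES and profile_candidates are hypothetical, and the file names are just the candidates named above, so which of them your shell actually reads still depends on your setup.

```python
import os

# Candidate profile files per shell, as named in the text above
PROFILE_FILES = {
    "bash": ["~/.bashrc", "~/.bash_profile"],
    "zsh": ["~/.zshrc", "~/.zprofile"],
}


def profile_candidates(shell_path):
    """Return candidate profile files for a shell path such as /bin/zsh."""
    shell_name = os.path.basename(shell_path)
    # Unknown shells yield an empty list instead of a guess
    return list(PROFILE_FILES.get(shell_name, []))


if __name__ == "__main__":
    shell = os.environ.get("SHELL", "")
    print("shell: {}".format(shell or "unknown"))
    for candidate in profile_candidates(shell):
        print(os.path.expanduser(candidate))
```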

Alternatively, you could configure it system-wide, i.e., for all users of your computer, by adding the path /Users/MYUSERNAME/Library/Python/3.11/bin to the file /etc/paths, e.g., with the editor nano (requires using sudo and authenticating with your password):

$ sudo nano /etc/paths

The contents of this file could look like this afterwards (the last line was added):

/usr/local/bin
/usr/bin
/bin
/usr/sbin
/sbin
/Users/MYUSERNAME/Library/Python/3.11/bin
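After opening a new terminal, one way to confirm that the change took effect is to test whether the directory from the warning is now listed on PATH. The helper below is a hedged sketch with a made-up name (is_on_path). It compares directory names only and does not check that the directory exists.

```python
import os


def is_on_path(directory, path=None):
    """Return True if the directory is one of the entries of PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Normalize entries so that a trailing slash does not matter
    entries = [os.path.normpath(entry) for entry in path.split(os.pathsep) if entry]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    # Placeholder path from the warning above; exchange the user name
    target = "/Users/MYUSERNAME/Library/Python/3.11/bin"
    print("{} on PATH: {}".format(target, is_on_path(target)))
```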

3.1.3. Linux: (Neuro)Debian, Ubuntu, and similar systems

DataLad is part of the Debian and Ubuntu operating systems. However, the particular DataLad version included in a release may be a bit older (check the versions for Debian and Ubuntu to see which ones are available).

For some recent releases of Debian-based operating systems, NeuroDebian provides more recent DataLad versions (check the availability table). In order to install from NeuroDebian, follow its installation documentation, which only requires copy-pasting three lines into a terminal. Also, should you be confused by the name: enabling this repository will not do any harm if your field is not neuroscience.

Whichever repository you end up using, the following command installs DataLad and all of its software dependencies (including git-annex and p7zip):

$ sudo apt-get install datalad

The command above will also upgrade existing installations to the most recent available version.

3.1.4. Linux: CentOS, Red Hat, Fedora, or similar systems

For CentOS, Red Hat, Fedora, or similar distributions, there is an RPM package for git-annex. A suitable version of Python and Git should come with the operating system, although some servers may run fairly old releases.

DataLad itself can be installed via pip:

$ python -m pip install datalad

Alternatively, DataLad can be installed together with Git and git-annex via Conda.

3.1.5. Linux machines with no root access (e.g., HPC systems)

The most convenient user-based installation can be achieved via Conda.

3.1.6. Conda

Conda is a software distribution available for all major operating systems, and its Miniconda installer offers a convenient way to bootstrap a DataLad installation. Importantly, it does not require admin/root access to a system.

Detailed, platform-specific installation instructions are available in the Conda documentation. In short: download and run the installer, or, from the command line, run:

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-<YOUR-OS>-x86_64.sh
$ bash Miniconda3-latest-<YOUR-OS>-x86_64.sh

In the above call, replace <YOUR-OS> with an identifier for your operating system, such as “Linux” or “MacOSX”. During the installation, you will need to accept a license agreement (press Enter to scroll down, and type “yes” and Enter to accept), confirm the installation into the default directory, and you should respond “yes” to the prompt “Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]”. Afterwards, you can remove the installation script by running rm ./Miniconda3-latest-*-x86_64.sh.
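The placeholder substitution can also be derived from the running system. The sketch below builds the installer URL from the pattern shown above. The function names and the lookup table are assumptions for illustration; only the identifiers “Linux” and “MacOSX” and the x86_64 suffix come from the text, and other operating systems or CPU architectures are deliberately not handled.

```python
import platform

BASE_URL = "https://repo.anaconda.com/miniconda"

# platform.system() value -> identifier used in the installer file name
OS_IDENTIFIERS = {"Linux": "Linux", "Darwin": "MacOSX"}


def installer_name(system=None):
    """Return the installer file name for an OS name like 'Linux' or 'Darwin'."""
    if system is None:
        system = platform.system()
    identifier = OS_IDENTIFIERS.get(system)
    if identifier is None:
        raise ValueError("no installer identifier known for {!r}".format(system))
    return "Miniconda3-latest-{}-x86_64.sh".format(identifier)


def installer_url(system=None):
    """Return the full download URL for the installer."""
    return "{}/{}".format(BASE_URL, installer_name(system))


if __name__ == "__main__":
    print(installer_url("Linux"))
    print(installer_url("Darwin"))
```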

The installer automatically configures the shell to make conda-installed tools accessible, so no further configuration is necessary. Once Conda is installed, the DataLad package can be installed from the conda-forge channel:

$ conda install -c conda-forge datalad

In general, all of DataLad’s software dependencies are automatically installed, too. This makes a conda-based deployment very convenient. A from-scratch DataLad installation on an HPC system, as a normal user, takes three commands:

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ # acknowledge license, keep everything at default
$ conda install -c conda-forge datalad

In case a dependency is not available from Conda (e.g., there is no git-annex package for Windows in Conda), please refer to the platform-specific instructions above.

To update an existing installation with conda, use:

$ conda update -c conda-forge datalad

The DataLad installer also supports setting up a Conda environment, in case a suitable Python version is already available.

3.1.7. Using Python’s package manager pip

As mentioned above, DataLad can be installed via Python’s package manager pip. pip comes with any Python distribution from python.org, and is available as a system package in nearly all GNU/Linux distributions.

If you have Python and pip set up, to automatically install DataLad and most of its software dependencies, type:

$ python -m pip install datalad

If this results in a permission denied error, you can install DataLad into a user’s home directory:

$ python -m pip install --user datalad

On some systems, you may need to call python3 instead of python:

$ python3 -m pip install datalad
$ # or, in case of a "permission denied error":
$ python3 -m pip install --user datalad

An existing installation can be upgraded with python -m pip install -U datalad.

pip is not able to install non-Python software, such as 7-Zip or git-annex. But you can install the DataLad installer via python -m pip install datalad-installer. This is a command-line tool that aids installation of DataLad and its key software dependencies on a range of platforms.
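Whichever installation route you took, a quick way to see which of the tools named in this chapter can be found on PATH is sketched below. The command names are assumptions and not taken from the text: in particular, the 7-Zip executable may be called 7z, 7zz, or 7za depending on platform and packaging. The names TOOLS and locate_tools are hypothetical as well.

```python
import shutil

# Labels and assumed command names; adjust them to your platform
TOOLS = {
    "Git": ["git"],
    "git-annex": ["git-annex"],
    "DataLad": ["datalad"],
    "7-Zip (optional)": ["7z", "7zz", "7za"],
}


def locate_tools(tools=TOOLS, which=shutil.which):
    """Map each label to the first executable found on PATH, or None."""
    found = {}
    for label, commands in tools.items():
        paths = (which(command) for command in commands)
        # Keep the first command name that resolves to a path
        found[label] = next((path for path in paths if path), None)
    return found


if __name__ == "__main__":
    for label, location in locate_tools().items():
        print("{}: {}".format(label, location or "not found on PATH"))
```

A tool reported as not found may still be installed in a location that is not on PATH, so treat the output as a hint rather than a verdict.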

3.2. Initial configuration

The initial configuration only concerns the setup of a Git identity. If you already use Git, you should hence be good to go.

If you have not used the version control system Git before, you will need to give Git some information about yourself. This needs to be done only once. In the following example, exchange Bob McBobFace with your own name, and bob@example.com with your own email address.

$ # enter your home directory using the ~ shortcut
$ cd ~
$ git config --global --add user.name "Bob McBobFace"
$ git config --global --add user.email bob@example.com

This information is used to track changes in the DataLad projects you will be working on: the changes you make are associated with your name and email address. You should use a real name and email address. A history showing that changes were made by Anonymous with the email youdontgetmy@email.fu does not establish much trust, and it will not be helpful a few years from now, especially in a collaborative project. And do not worry, you won’t get any emails from Git or DataLad.
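To confirm that the identity was stored, the two settings can be read back with git config --global --get. The following is a minimal sketch under the assumption that git is on PATH; the function names are made up for this illustration.

```python
import subprocess

IDENTITY_KEYS = ("user.name", "user.email")


def read_global_git_config(key):
    """Return the global Git setting for key, or an empty string if unset."""
    # Raises FileNotFoundError if git itself is not installed
    result = subprocess.run(
        ["git", "config", "--global", "--get", key],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def missing_identity_fields(read=read_global_git_config):
    """List the identity settings that are not configured yet."""
    return [key for key in IDENTITY_KEYS if not read(key)]


if __name__ == "__main__":
    try:
        missing = missing_identity_fields()
    except FileNotFoundError:
        print("git was not found on PATH")
    else:
        print("missing settings: {}".format(", ".join(missing) or "none"))
```

If both settings are reported as present, no further configuration is needed.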