3. Installation and configuration

3.1. Install DataLad

Feedback on installation instructions

The installation methods presented in this chapter are based on experience and have been tested carefully. However, operating systems and other software are continuously evolving, and these guides might have become outdated. Please file an issue, if you encounter problems installing DataLad, and help keeping this information up-to-date.

In general, the DataLad installation requires Python 3 (see the Find-out-more on the difference between Python 2 and 3 to learn why this is required), Git, and git-annex, and for some functionality 7-Zip. The instructions below detail how to install the core DataLad tool and its dependencies on common operating systems. They do not cover the various DataLad extensions that need to be installed separately, if desired.

Python 2, Python 3, what’s the difference?

DataLad requires Python 3.6, or a more recent version, to be installed on your system. The easiest way to verify that this is the case is to open a terminal and type python to start a Python session:

$ python
Python 3.9.1+ (default, Jan 20 2021, 14:49:22)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

If this fails, or reports a Python version with a leading 2, such as Python 2.7.18, try starting python3, which some systems use to disambiguate between Python 2 and Python 3. If this fails, too, you need to obtain a recent release of Python 3. On Windows, attempting to run commands that are not installed might cause a Windows Store window to pop up. If this happens, it means you have no Python installed. Please check the Windows 10 installation instructions, and do not install Python via the Windows Store.

Python 2 is an outdated, in technical terms “deprecated”, version of Python. Although it still exist as the default Python version on many systems, it is no longer maintained since 2020, and thus, most software has dropped support for Python 2. If you only run Python 2 on your system, most Python software, including DataLad, will be incompatible, and hence unusable, resulting in errors during installation and execution.

But does that mean that you should uninstall Python 2? No! Keep it installed, especially if you are using Linux or MacOS. Python 2 existed for 20 years and numerous software has been written for it. It is quite likely that some basic operating system components or legacy software on your computer is depending on it, and uninstalling a preinstalled Python 2 from your system will likely render it unusable. Install Python 3, and have both versions coexist peacefully.

The following sections provide targeted installation instructions for a set of common scenarios, operating systems, or platforms.

Cartoon of a person sitting on the floor in front of a laptop

3.1.1. Windows 10

There are countless ways to install software on Windows. Here we describe one possible approach that should work on any Windows computer, like one that you may have just bought.

Python:

Windows itself does not ship with Python, it must be installed separately. If you already did that, please check the Find-out-more on Python versions, if it matches the requirements. Otherwise, head over to the download section of the Python website, and download an installer. Unless you have specific requirements, go with the 64bit installer of the latest Python 3 release.

Avoid installing Python from the Windows store

We recommend to not install Python via the Windows store, even if it opens after you typed python, as this version requires additional configurations by hand (in particular of your $PATH environment variable).

When you run the installer, make sure to select the Add Python to PATH option, as this is required for subsequent installation steps and interactive use later on. Other than that, using the default installation settings is just fine.

Git:

Windows also does not come with Git. If you happen to have it installed already, please check, if you have configured it for command line use. You should be able to open the Windows command prompt and run a command like git --version. It should return a version number and not an error.

To install Git, visit the Git website and download an installer. If in doubt, go with the 64bit installer of the latest version. The installer itself provides various customization options. We recommend to leave the defaults as they are, in particular the target directory, but configure the following settings (they are distributed over multiple dialogs):

  • Enable Use a TrueType font in all console windows

  • Select Git from the command line and also from 3rd-party software

  • Enable file system caching

  • Enable symbolic links

Git-annex:

The most convenient way to deploy git-annex is via the DataLad installer. Once Python is available, it can be installed with the Python package manager pip. Open a command prompt and run:

> pip install datalad-installer

Afterwards, open another command prompt in administrator mode and run:

> datalad-installer git-annex -m datalad/packages

This will download a recent git-annex, and configure it for your Git installation. The admin command prompt can be closed afterwards, all other steps do not need it.

DataLad:

With Python, Git, and git-annex installed, DataLad can be installed, and later also upgraded using pip by running:

> pip install datalad
7-Zip (optional, but highly recommended):

Download it from the 7-zip website (64bit installer when in doubt), and install it into the default target directory.

There are many other ways to install DataLad on Windows, check for example the Windows-wit on the Windows Subsystem 2 for Linux. One particularly attractive approach is Conda. However, at the moment git-annex is not available from Conda on Windows. If you want to use Conda, perform the Conda-based DataLad installation first, and then install git-annex via the DataLad installer, as described above.

Install DataLad using the Windows Subsystem 2 for Linux

With the Windows Subsystem for Linux, you will be able to use a Unix system despite being on Windows. You need to have a recent build of Windows 10 in order to get WSL2 – we do not recommend WSL1.

You can find out how to install the Windows Subsystem for Linux at docs.microsoft.com. Afterwards, proceed with your installation as described in the installation instructions for Linux.

Using DataLad on Windows has a few peculiarities. There is a dedicated summary, So… Windows… eh? with an overview. In general, DataLad can feel a bit sluggish on non-WSL2 Windows systems. This is due to various filesystem issues that also affect the version control system Git itself, which DataLad relies on. The core functionality of DataLad works, and you should be able to follow most contents covered in this book. You will notice, however, that some Unix commands displayed in examples may not work, and that terminal output can look different from what is displayed in the code examples of the book, and that some dependencies for additional functionality are not available for Windows. If you are a Windows user and want to help improve the handbook for Windows users, please get in touch. Dedicated notes, “Windows-wits”, contain important information, alternative commands, or warnings. If you on a native Windows 10 system, you should pay close attention to them.

3.1.2. Mac OSX

Modern Macs come with a compatible Python 3 version installed by default. The Find-out-more on Python versions has instructions on how to confirm that.

DataLad is available via OS X’s homebrew package manager. First, install the homebrew package manager, which requires Xcode to be installed from the Mac App Store.

Next, install datalad and its dependencies:

$ brew install datalad

Likewise, the optional, but recommended, p7zip dependency can be installed via brew as well.

Alternatively, you can exclusively use brew for DataLad’s non-Python dependencies, and then check the Find-out-more on how to install DataLad via Python's package manager.

Install DataLad via pip on MacOSX

If Git/git-annex are installed already (via brew), DataLad can also be installed via Python’s package manager pip, which should be installed by default on your system:

$ pip install datalad

Recent macOS versions may use pip3 instead of pip – use tab completion to find out which is installed.

Recent macOS versions may warn after installation that scripts were installed into locations that were not on PATH:

The script chardetect is installed in
'/Users/MYUSERNAME/Library/Python/3.7/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to
suppress this warning, use --no-warn-script-location.

To fix this, add these paths to the $PATH environment variable. You can either do this for your own user (1), or for all users of the computer (2) (requires using sudo and authenticating with your computer’s password):

  1. Add something like (exchange the user name accordingly)

    export PATH=$PATH:/Users/MYUSERNAME/Library/Python/3.7/bin
    

    to the profile file of your shell. If you use a bash shell, this may be ~/.bashrc or ~/.bash_profile, if you are using a zsh shell, it may be ~/.zshrc or ~/.zprofile. Find out which shell you are using by typing echo $SHELL into your terminal.

  2. Alternatively, configure it system-wide, i.e., for all users of your computer by adding the the path /Users/MYUSERNAME/Library/Python/3.7/bin to the file /etc/paths, e.g., with the editor nano:

    sudo nano /etc/paths
    

    The contents of this file could look like this afterwards (the last line was added):

    /usr/local/bin
    /usr/bin
    /bin
    /usr/sbin
    /sbin
    /Users/MYUSERNAME/Library/Python/3.7/bin
    

3.1.3. Linux: (Neuro)Debian, Ubuntu, and similar systems

DataLad is part of the Debian and Ubuntu operating systems. However, the particular DataLad version included in a release may be a bit older (check the versions for Debian and Ubuntu to see which ones are available).

For some recent releases of Debian-based operating systems, NeuroDebian provides more recent DataLad versions (check the availability table). In order to install from NeuroDebian, follow its installation documentation, which only requires copy-pasting three lines into a terminal. Also, should you be confused by the name: enabling this repository will not do any harm if your field is not neuroscience.

Whichever repository you end up using, the following command installs DataLad and all of its software dependencies (including git-annex and p7zip):

$ sudo apt-get install datalad

The command above will also upgrade existing installations to the most recent available version.

3.1.4. Linux: CentOS, Redhat, Fedora, or similar systems

For CentOS, Redhat, Fedora, or similar distributions, there is an RPM package for git-annex. A suitable version of Python and Git should come with the operating system, although some servers may run fairly old releases.

DataLad itself can be installed via pip:

$ pip install datalad

Alternatively, DataLad can be installed together with Git and git-annex via Conda as outlined in the section below.

3.1.5. Linux-machines with no root access (e.g. HPC systems)

The most convenient user-based installation can be achieved via Conda.

3.1.6. Conda

Conda is a software distribution available for all major operating systems, and its Miniconda installer offers a convenient way to bootstrap a DataLad installation. Importantly, it does not require admin/root access to a system.

Detailed, platform-specific installation instruction are available in the Conda documentation. In short: download and run the installer, or, from the command line, run

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-<YOUR-OS>-x86_64.sh
$ bash Miniconda3-latest-<YOUR-OS>-x86_64.sh

In the above call, replace <YOUR-OS> with an identifier for your operating system, such as “Linux” or “MacOSX”. During the installation, you will need to accept a license agreement (press Enter to scroll down, and type “yes” and Enter to accept), confirm the installation into the default directory, and you should respond “yes” to the prompt “Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]”. Afterwards, you can remove the installation script by running rm ./Miniconda3-latest-*-x86_64.sh.

The installer automatically configures the shell to make conda-installed tools accessible, so no further configuration is necessary. Once Conda is installed, the DataLad package can be installed from the conda-forge channel:

$ conda install -c conda-forge datalad

In general, all of DataLad’s software dependencies are automatically installed, too. This makes a conda-based deployment very convenient. A from-scratch DataLad installation on a HPC system, as a normal user, is done in three lines:

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
# acknowledge license, keep everything at default
$ conda install -c conda-forge datalad

In case a dependency is not available from Conda (e.g., there is no git-annex package for Windows in Conda), please refer to the platform-specific instructions above.

To update an existing installation with conda, use:

$ conda update datalad

Install Unix command-line tools on Windows with Conda

On Windows, many Unix command-line tools such as cp that a frequently used in this handbook are not available by default. You can get a good set of tools by installing condas m2-base package via conda install m2-base.

The DataLad installer also supports setting up a Conda environment, in case a suitable Python version is already available.

3.1.7. Using Python’s package manager pip

As mentioned above, DataLad can be installed via Python’s package manager pip. pip comes with any Python distribution from python.org, and is available as a system-package in nearly all GNU/Linux distributions.

If you have Python and pip set up, to automatically install DataLad and most of its software dependencies, type

$ pip install datalad

If this results in a permission denied error, you can install DataLad into a user’s home directory:

$ pip install --user datalad

On some systems, in particular macOS, you may need to call pip3 instead of pip:

$ pip3 install datalad
# or, in case of a "permission denied error":
$ pip3 install --user datalad

An existing installation can be upgraded with pip install -U datalad.

pip is not able to install non-Python software, such as 7-zip or git-annex. But you can install the DataLad installer via a pip install datalad-installer. This is a command-line tool that aids installation of DataLad and its key software dependencies on a range of platforms.

3.2. Initial configuration

Initial configurations only concern the setup of a Git identity. If you are a Git-user, you should hence be good to go.

../_images/gitidentity.svg

If you have not used the version control system Git before, you will need to tell Git some information about you. This needs to be done only once. In the following example, exchange Bob McBobFace with your own name, and bob@example.com with your own email address.

# enter your home directory using the ~ shortcut
% cd ~
% git config --global --add user.name "Bob McBobFace"
% git config --global --add user.email bob@example.com

This information is used to track changes in the DataLad projects you will be working on. Based on this information, changes you make are associated with your name and email address, and you should use a real email address and name – it does not establish a lot of trust nor is it helpful after a few years if your history, especially in a collaborative project, shows that changes were made by Anonymous with the email youdontgetmy@email.fu. And do not worry, you won’t get any emails from Git or DataLad.