Installation and configuration

Note

The handbook is written for DataLad version 0.12. If you already have DataLad installed but are unsure whether it is the correct version, you can get information on your version of DataLad by typing datalad --version into your terminal.
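As a minimal sketch, this check can also be scripted. The helper name is_handbook_version is hypothetical, and the parsing assumes that the first line of the datalad --version output ends with the version number; adjust it if your output looks different.

```shell
# Hypothetical helper: succeeds if the version string belongs to the 0.12 series
is_handbook_version() {
    case "$1" in
        0.12|0.12.*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v datalad >/dev/null 2>&1; then
    # Assumption: the first output line ends with the version, e.g. "datalad 0.12.2"
    installed=$(datalad --version 2>&1 | awk 'NR==1 {print $NF}')
    if is_handbook_version "$installed"; then
        echo "DataLad $installed matches the handbook"
    else
        echo "DataLad $installed found, but the handbook assumes 0.12"
    fi
else
    echo "DataLad is not installed yet"
fi
```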

Install DataLad

The content in this chapter is largely based on the information given on the DataLad website and the DataLad documentation.

Beyond DataLad itself, the installation requires Python, Git, git-annex, and potentially Python's package manager pip. The instructions below detail how to install each of these components on common operating systems. Please file an issue if you encounter problems.

Note that while these installation instructions will provide you with the core DataLad tool, many extensions exist, and they need to be installed separately, if needed.


Linux: (Neuro)Debian, Ubuntu, and similar systems

For Debian-based operating systems, the most convenient installation method is to enable the NeuroDebian repository. If you are on a Debian-based system but do not have the NeuroDebian repository enabled, you should strongly consider enabling it right now. The linked instructions are easy to follow and only require copy-pasting three lines of code. Also, should you be confused by the name: enabling this repository will not do any harm if your field is not neuroscience.
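If you are unsure whether the repository is already enabled, a heuristic check along the following lines may help. It is only a sketch: neurodebian_enabled is a hypothetical helper that searches the standard APT source locations, on the assumption that a NeuroDebian entry contains neurodebian or neuro.debian in its URL.

```shell
# Hypothetical helper: succeeds if any given APT source file or directory
# contains an entry that looks like a NeuroDebian repository (heuristic match)
neurodebian_enabled() {
    grep -rqsiE 'neuro\.?debian' "$@"
}

if neurodebian_enabled /etc/apt/sources.list /etc/apt/sources.list.d; then
    echo "NeuroDebian appears to be enabled"
else
    echo "No NeuroDebian entry found; enable the repository first"
fi
```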

The following command installs DataLad and all of its software dependencies (including the git-annex-standalone package):

$ sudo apt-get install datalad

Linux machines with no root access (e.g., HPC systems)

If you want to install DataLad on a machine you do not have root access to, DataLad can be installed with Miniconda.

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
# acknowledge license, keep everything at default
$ conda install -c conda-forge datalad

This should install Git, git-annex, and DataLad. The installer automatically configures the shell to make conda-installed tools accessible, so no further configuration is necessary.
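To confirm that the three tools are indeed accessible, a short check like the one below may help (you may need to open a new terminal first). It is a sketch: report_missing is a hypothetical helper, and it only tests whether the commands can be found on PATH, not whether their versions are suitable.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH
report_missing() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(report_missing git git-annex datalad)
if [ -z "$missing" ]; then
    echo "Git, git-annex, and DataLad were all found"
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```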

macOS/OSX

A common way to install packages on macOS is via the Homebrew package manager. Before installing Homebrew, install Xcode from the Mac App Store. Then install Homebrew with the command given in the instructions on its webpage (linked above).

Next, install git-annex. The easiest way to do this is via brew:

$ brew install git-annex

Once git-annex is available, DataLad can be installed via Python's package manager pip as described below. pip should already be installed by default. Recent macOS versions may have pip3 instead of pip; use tab completion to find out which is installed. If it is pip3, run:

$ pip3 install datalad~=0.12

instead of the code snippets in the section below.

If this results in a permission denied error, install DataLad into a user’s home directory:

$ pip3 install --user datalad~=0.12
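As an alternative to tab completion, the choice between pip3 and pip can be sketched in the shell. pick_pip is a hypothetical helper name; the snippet only prints the command it would suggest and does not install anything.

```shell
# Hypothetical helper: print the name of the first pip command found on PATH
pick_pip() {
    for candidate in pip3 pip; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if pip_cmd=$(pick_pip); then
    # Only prints the command; run it yourself once it looks right
    echo "Install with: $pip_cmd install datalad~=0.12"
else
    echo "Neither pip3 nor pip was found on PATH"
fi
```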

Find out more: If something is not on PATH…

Recent macOS versions may warn after installation that scripts were installed into locations that were not on PATH:

The script chardetect is installed in '/Users/awagner/Library/Python/3.7/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

To fix this, add these paths to the $PATH environment variable. You can either do this for your own user (1), or for all users of the computer (2) (requires using sudo and authenticating with your computer’s password):

  (1) Add something like (exchange the user name accordingly)

export PATH=$PATH:/Users/awagner/Library/Python/3.7/bin

to the profile file of your shell. If you use a bash shell, this may be ~/.bashrc or ~/.bash_profile; if you use a zsh shell, it may be ~/.zshrc or ~/.zprofile. Find out which shell you are using by typing echo $SHELL into your terminal.

(2) Alternatively, configure it system-wide, i.e., for all users of your computer, by adding the path /Users/awagner/Library/Python/3.7/bin to the file /etc/paths, e.g., with the editor nano:

sudo nano /etc/paths

The contents of this file could look like this afterwards (the last line was added):

/usr/local/bin
/usr/bin
/bin
/usr/sbin
/sbin
/Users/awagner/Library/Python/3.7/bin
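For option (1), a plain export line appends the directory again every time the profile file is read. A slightly more defensive variant, sketched below, only appends it if it is missing. add_to_path is a hypothetical helper name, and the directory is the one from the example warning above; replace it with the path from your own warning.

```shell
# Hypothetical helper: append a directory to PATH only if it is not listed yet
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Example directory taken from the warning above; adjust to your system
add_to_path "$HOME/Library/Python/3.7/bin"
export PATH
```

Placed in the profile file of your shell, this keeps PATH free of duplicate entries even if the file is sourced repeatedly.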

Using Python's package manager pip

DataLad can be installed via Python's package manager pip. pip comes with Python distributions, e.g., the Python distributions downloaded from python.org. When downloading Python, make sure to choose a recent Python 3 distribution.

If you have Python and pip set up, to automatically install DataLad and its software dependencies, type

$ pip install datalad~=0.12

If this results in a permission denied error, install DataLad into a user’s home directory:

$ pip install --user datalad~=0.12

In addition, a current version of git-annex is required, and the pip method does not install it automatically. You can find detailed installation instructions on how to do this here.

For Windows, extract the provided EXE installer into an existing Git installation directory (e.g., C:\Program Files\Git). If done this way, no PATH variable manipulation is necessary.

Windows 10

There are two ways to get DataLad on Windows 10: one is within Windows itself, the other is using WSL, the Windows Subsystem for Linux.

Note: Using Windows itself comes with some downsides. In general, DataLad can feel a bit sluggish on Windows systems. This is because of a range of filesystem issues that also affect the version control system Git itself, which DataLad relies on. The core functionality of DataLad works, and you should be able to follow the contents covered in this book. You will notice, however, that some Unix commands displayed in examples may not work, and that terminal output can look different from what is displayed in the code examples of the book. If you are a Windows user and want to help improve the handbook for Windows users, please get in touch.

1) Install within Windows [RECOMMENDED]

Note: This installation method will get you a working version of DataLad, but be aware that many Unix commands shown in the book examples will not work for you, and DataLad-related output might look different from what we can show in this book. Please get in touch if you want to help.

  • Step 1: Install Conda

    • Go to https://docs.conda.io/en/latest/miniconda.html and pick the latest Python 3 installer. Miniconda is a free, minimal installer for conda and will install conda, Python, the packages they depend on, and a number of useful packages such as pip.

    • During installation, keep everything on default. In particular, do not add anything to PATH.

    • From now on, any further action must take place in the Anaconda prompt, a preconfigured terminal shell. Find it by searching for “Anaconda prompt” in your search bar.

  • Step 2: Install Git

    • In the Anaconda prompt, run:

      conda install -c conda-forge git
      

      Note: It has to be from conda-forge; the anaconda version does not provide the cp command.

  • Step 3: Install git-annex

    • Obtain the installer for the current git-annex version from here. Save the file, and double-click the downloaded git-annex-installer.exe in your Downloads folder.

    • During installation, you will be prompted to “Choose Install Location”. Install it into the miniconda Library directory, e.g. C:\Users\me\Miniconda3\Library.

  • Step 4: Install DataLad via pip

    • pip was installed by miniconda. In the Anaconda prompt, run:

      pip install datalad~=0.12
      

2) Install within WSL

The Windows Subsystem for Linux (WSL) allows Windows users to have full access to a Linux distribution within Windows. If you have always used Windows, be prepared for some changes in user experience when using Linux. For one, there will be no graphical user interface (GUI). Instead, you will work inside a terminal window. This, however, mirrors the examples and code snippets provided in this handbook exactly. Using a proper Linux installation greatly improves the DataLad handbook experience on Windows. However, it comes with the downside of two filesystems that are somewhat separated, and there will be incompatibilities between the Windows and Linux filesystems. In particular, accessing files within Linux from within Windows is problematic: files that are created within the WSL, for example, cannot be modified with Windows tools. A great resource to get started and understand the WSL is this guide.

Requirements:

WSL can be enabled for 64-bit versions of Windows 10 systems running Version 1607 or above. To check whether your computer fulfills these requirements, open Settings (in the start menu) > System > About. If your version number is lower than 1607, you will need to perform a Windows update before installing WSL.

The instructions below show you how to set up the WSL and configure it to use DataLad and its dependencies. They follow the Microsoft Documentation on the Windows Subsystem for Linux. If you run into troubles during the installation, please consult the WSL troubleshooting page.

  • Step 1: Enable the Windows Subsystem for Linux

    • Open Windows PowerShell as an administrator and run

    $ Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
    
    • Afterwards, when prompted in PowerShell, restart your computer.

  • Step 2: Install a Debian Linux distribution

    • To do this, visit the Microsoft store, and search for the Debian distro. We strongly recommend installing Debian, even though other distributions are available. “Get” the app, and “install” it.

  • Step 3: Initialize the distribution

    • Launch the Subsystem either from the Microsoft store or from the Start menu. This will start a terminal. Do not worry: there is a dedicated section (General prerequisites) on how to work with the terminal if you have not done so before.

    • Upon first start, you will be prompted to enter a new UNIX username and password. Tip: choose a short name without spaces or special characters. The password will become necessary when you elevate a process using sudo. sudo lets you execute a process with the rights of another user, such as administrative rights, for example when you need to install software.

    • Right after initial installation, your Linux distribution will be minimally equipped. Update your package catalog and upgrade your installed packages by running the command below. As with all code examples in this book, make sure to copy commands exactly, including capitalization. If this is the first time you use sudo, your system will warn you to use it with care. During upgrading installed packages, the terminal will ask you to confirm upgrades by pressing Enter.

    $ sudo apt update && sudo apt upgrade
    
  • Step 4: Enable NeuroDebian

    • In your terminal, run

    $ wget -O- http://neuro.debian.net/lists/stretch.de-md.libre | sudo tee /etc/apt/sources.list.d/neurodebian.sources.list
    
    • Afterwards, run

    $ curl -sL "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0xA5D32F012649A5A9" | sudo apt-key add -
    
    • Lastly, update the package catalog and upgrade once more:

    $ sudo apt update && sudo apt upgrade
    
  • Step 5: Install DataLad and everything it needs

    $ sudo apt install datalad
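The NeuroDebian list in Step 4 is specific to the Debian release named stretch. If your distribution reports a different release, the URL has to change accordingly. The sketch below builds it from the release codename. neurodebian_list_url is a hypothetical helper, the URL pattern is simply copied from the command in Step 4, and whether a list exists for your release is an assumption you should verify on the NeuroDebian website.

```shell
# Hypothetical helper: build the NeuroDebian sources-list URL for a release
# codename, following the pattern of the "stretch" URL used in Step 4
neurodebian_list_url() {
    echo "http://neuro.debian.net/lists/$1.de-md.libre"
}

if command -v lsb_release >/dev/null 2>&1; then
    # lsb_release -cs prints the short codename of the running release
    codename=$(lsb_release -cs)
    echo "Release codename: $codename"
    echo "List URL: $(neurodebian_list_url "$codename")"
else
    echo "lsb_release not found; look up your release name in /etc/os-release"
fi
```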
    

3) Install within WSL2

The Windows Subsystem for Linux (WSL) allows Windows users to have full access to a Linux distribution within Windows. The Windows Subsystem for Linux 2 (WSL2) is the (currently pre-released) update to the WSL. If you have always used Windows, be prepared for some changes in user experience when using Linux. For one, there will be no graphical user interface (GUI). Instead, you will work inside a terminal window. This, however, mirrors the examples and code snippets provided in this handbook exactly. Using a proper Linux installation greatly improves the DataLad handbook experience on Windows. However, it comes with the downside of two filesystems that are somewhat separated, and there will be incompatibilities between the Windows and Linux filesystems. In particular, accessing files within Linux from within Windows is problematic: files that are created within the WSL, for example, cannot be modified with Windows tools. A great resource to get started and understand the WSL is this guide.

Requirements:

WSL2 can be enabled for 64-bit versions of Windows 10 running Windows 10 Insider Preview Build 18917 or higher. You can find out how to enter the Windows Insider Program to get access to the pre-release builds here. To check whether your computer fulfills these requirements, open Settings (in the start menu) > System > About. Your version number should be at least 1903. Furthermore, your computer needs to support Hyper-V virtualization.

The instructions below show you how to set up the WSL and configure it to use DataLad and its dependencies. They follow the Microsoft Documentation on the Windows Subsystem for Linux. If you run into troubles during the installation, please consult the WSL troubleshooting page.

  • Step 1: Enable the Windows Subsystem for Linux

    • Start PowerShell as an administrator. Run both commands below, and only restart after the second one (even though you are prompted after the first one already):

      Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform
      Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
      
  • Step 2: Install a Debian Linux distribution

    • To do this, visit the Microsoft store, and search for the Debian distro. We strongly recommend installing Debian, even though other distributions are available. “Get” the app, and “install” it.

  • Step 3: Initialize the distribution

    • Launch the Subsystem either from the Microsoft store or from the Start menu. This will start a terminal. Do not worry: there is a dedicated section (General prerequisites) on how to work with the terminal if you have not done so before.

    • Upon first start, you will be prompted to enter a new UNIX username and password. Tip: choose a short name without spaces or special characters. The password will become necessary when you elevate a process using sudo. sudo lets you execute a process with the rights of another user, such as administrative rights, for example when you need to install software.

  • Step 4: Configure the WSL

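    • Start PowerShell as an administrator. To make WSL2 the default version for newly installed distributions, run wsl --set-default-version 2. If your distro was installed before you changed the default, convert it with wsl --set-version Debian 2. Then check which version the distro uses by running wsl -l -v. This should give an output like this: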

          NAME        STATE               VERSION
      *   Debian       Running            2
      
  • Step 5: Enable NeuroDebian

    • In the terminal of your distribution, run

    $ wget -O- http://neuro.debian.net/lists/stretch.de-md.libre | sudo tee /etc/apt/sources.list.d/neurodebian.sources.list
    
    • Afterwards, run

    $ curl -sL "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0xA5D32F012649A5A9" | sudo apt-key add -
    
    • Lastly, update the package catalog and upgrade once more:

    $ sudo apt update && sudo apt upgrade
    
  • Step 6: Install DataLad and everything it needs from NeuroDebian

    $ sudo apt install datalad
    

Todo

  • maybe update Step 6 to use pip3 to install DataLad and git-annex.

Initial configuration

The initial configuration only concerns the setup of a Git identity. If you already use Git, you should be good to go.


If you have not used the version control system Git before, you will need to tell Git some information about you. This needs to be done only once. In the following example, exchange Bob McBobFace with your own name, and bob@example.com with your own email address.

# enter your home directory using the ~ shortcut
$ cd ~
$ git config --global --add user.name "Bob McBobFace"
$ git config --global --add user.email bob@example.com

This information is used to track changes in the DataLad projects you will be working on. Changes you make are associated with your name and email address, so you should use a real name and email address. A history that shows changes made by Anonymous with the email youdontgetmy@email.fu does not establish much trust, and it is not helpful a few years later, especially in a collaborative project. And do not worry, you will not get any emails from Git or DataLad.
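If you are unsure whether a Git identity is already configured, the following sketch reports the current global settings. check_git_identity is a hypothetical helper name; it only reads the configuration and changes nothing.

```shell
# Hypothetical helper: report the global Git identity;
# returns 1 if user.name or user.email is missing
check_git_identity() {
    status=0
    for key in user.name user.email; do
        if value=$(git config --global --get "$key" 2>/dev/null) && [ -n "$value" ]; then
            echo "$key = $value"
        else
            echo "$key is not set"
            status=1
        fi
    done
    return $status
}

check_git_identity || echo "Set the missing values with the git config commands above"
```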