3. Installation and configuration
3.1. Install DataLad
Feedback on installation instructions
The installation methods presented in this chapter are based on experience and have been tested carefully. However, operating systems and other software are continuously evolving, and these guides might have become outdated. Be sure to check the online handbook for up-to-date information.
In general, installing DataLad requires Python 3 (see the Find-out-more on the difference between Python 2 and 3 to learn why this is required), Git, git-annex, and, for some functionality, 7-Zip. The instructions below detail how to install the core DataLad tool and its dependencies on common operating systems. They do not cover the various DataLad extensions, which need to be installed separately if desired.
Python 2, Python 3, what’s the difference?
DataLad requires Python 3.8, or a more recent version, to be installed on your system. The easiest way to verify that this is the case is to open a terminal and type python to start a Python session:
$ python
Python 3.9.1+ (default, Jan 20 2021, 14:49:22)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
If this fails, or reports a Python version with a leading 2, such as Python 2.7.18, try starting python3, which some systems use to disambiguate between Python 2 and Python 3. If this fails, too, you need to obtain a recent release of Python 3. On Windows, attempting to run commands that are not installed might cause a Windows Store window to pop up. If this happens, Python may not yet be installed. Please check the Windows 10 and 11 installation instructions, and do not install Python via the Windows Store.
Python 2 is an outdated (in technical terms, “deprecated”) version of Python. Although it still exists as the default Python version on many systems, it has not been maintained since 2020, and most software has thus dropped support for it. If Python 2 is the only Python on your system, most Python software, including DataLad, will be incompatible, and hence unusable, resulting in errors during installation and execution.
But does that mean that you should uninstall Python 2? No! Keep it installed, especially if you are using Linux or macOS. Python 2 was around for 20 years, and a lot of software has been written for it. Some basic operating system components or legacy software on your computer are quite likely to depend on it, and uninstalling a preinstalled Python 2 will likely break your system. Install Python 3, and have both versions coexist peacefully.
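The version check described above can also be scripted. The following is a minimal POSIX shell sketch, not part of DataLad; the function names python_version_ok and find_python are hypothetical helpers. It tries python first and then python3, as suggested above, and accepts the first interpreter that reports version 3.8 or newer.

```shell
#!/bin/sh
# Sketch: locate a Python interpreter that meets DataLad's minimum version (3.8).

# Return 0 if a version string such as "Python 3.9.1+" reports 3.8 or newer.
python_version_ok() {
    ver=${1#Python }            # "Python 3.9.1+" -> "3.9.1+"
    case "$ver" in
        *.*) ;;                 # need at least "major.minor"
        *) return 1 ;;
    esac
    major=${ver%%.*}            # "3"
    rest=${ver#*.}              # "9.1+"
    minor=${rest%%[!0-9]*}      # leading digits only -> "9"
    case "$major" in ''|*[!0-9]*) return 1 ;; esac   # reject non-numeric parts
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -gt 3 ] && return 0
    [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]
}

# Print the first candidate command whose version is recent enough.
find_python() {
    for cand in "$@"; do
        command -v "$cand" >/dev/null 2>&1 || continue
        # Older Pythons print the version to stderr, newer ones to stdout; capture both
        if python_version_ok "$("$cand" --version 2>&1)"; then
            echo "$cand"
            return 0
        fi
    done
    return 1
}

if py=$(find_python python python3); then
    echo "Found a suitable interpreter: $py"
else
    echo "No Python >= 3.8 found; please install a recent Python 3." >&2
fi
```

If neither name points to a recent enough Python 3, follow the platform-specific instructions below.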
The following sections provide targeted installation instructions for a set of common scenarios, operating systems, or platforms.
3.1.1. Windows 10 and 11
There are countless ways to install software on Windows. Here we describe one possible approach that should work on any Windows computer, including one that you may have just bought.
- Python:
Windows does not ship with Python; it must be installed separately. If you have already installed it, please check the Find-out-more on Python versions to see whether it meets the requirements. Otherwise, head over to the download section of the Python website and download an installer. Unless you have specific requirements, go with the 64-bit installer of the latest Python 3 release.
Avoid installing Python from the Windows Store
We recommend not installing Python via the Windows Store, even if it opens after you type python, as this version requires additional manual configuration (in particular of your $PATH environment variable).
When you run the installer, make sure to select the Add Python to PATH option, as this is required for subsequent installation steps and for interactive use later on. Other than that, the default installation settings are fine.
Verify Python installation
It is not uncommon for multiple Python installations to coexist on a Windows machine, because particular applications can ship their own. Such alternative installations may even be, or later become, the default. This can cause confusing behavior, because each Python installation has its own set of installed packages.
To check whether there are multiple installations, open the Windows command line cmd.exe and run where python. This will list all variants of python.exe. There will be one in WindowsApps, which is only a link to the Windows Store. Make sure the Python version that you installed is listed, too.
If there are multiple Python installations, you can tell which one is the default by running this command in cmd.exe:
> python -c "import sys; print(sys.executable)"
This will print the path of the default python.exe. If the output does not match the expected Python installation, the $PATH environment variable likely needs to be adjusted. This can be done in the Windows system properties. It is sufficient to move the entries created by the Python installer to the top of the list.
- Git:
Windows also does not come with Git. If you happen to have it installed already, please check that it is configured for command-line use: you should be able to open the Windows command prompt and run a command like git --version, which should return a version number and not an error.
To install Git, visit the Git website and download an installer. If in doubt, go with the 64-bit installer of the latest version. The installer provides various customization options. We recommend leaving the defaults as they are, in particular the target directory, but configuring the following settings (they are distributed over multiple dialogs):
Select Git from the command line and also from 3rd-party software
Enable file system caching
Select Use external OpenSSH
Enable symbolic links
- Git-annex:
There are two convenient ways to install git-annex. The first is downloading the installer from the git-annex homepage. The other is to deploy git-annex via the DataLad installer. The latter option requires the datalad-installer Python package, which, once Python is available, can be installed with the Python package manager pip. Open a command prompt and run:
> python -m pip install datalad-installer
Afterwards, open another command prompt in administrator mode and run:
> datalad-installer git-annex -m datalad/git-annex:release
This will download a recent git-annex and configure it for your Git installation. The administrator command prompt can be closed afterwards; none of the remaining steps need it.
For performance improvements, regardless of which installation method you chose, we recommend also setting the following git-annex configuration:
> git config --global filter.annex.process "git-annex filter-process"
- DataLad:
With Python, Git, and git-annex installed, DataLad can be installed with pip by running:
> python -m pip install datalad
To upgrade it later, run the same command with the -U (--upgrade) flag: python -m pip install -U datalad.
- 7-Zip (optional, but highly recommended):
Download it from the 7-Zip website (the 64-bit installer, when in doubt), and install it into the default target directory.
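Once everything is installed, a quick way to confirm that the components are reachable from the command line is a small check script. The following is a hedged sketch, assuming you run it in Git Bash, the POSIX shell that comes with Git for Windows; check_tool is a hypothetical helper name, not a DataLad command. It only looks up commands on PATH and reads the filter.annex.process setting recommended above; it does not check versions, and it does not check 7-Zip.

```shell
#!/bin/sh
# Sketch: check that the tools installed above can be found on PATH.
# Intended for Git Bash (part of Git for Windows), but works in any POSIX shell.

# Report whether command $1 is on PATH; return 0 if it is.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    fi
    echo "missing: $1"
    return 1
}

missing=0
for tool in python git git-annex datalad; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"

# The git-annex setting recommended above; empty output means it is not set.
annex_filter=$(git config --global --get filter.annex.process 2>/dev/null || true)
echo "filter.annex.process: ${annex_filter:-<not set>}"
```

A tool reported as missing usually points to a skipped installation step or to a PATH entry that the corresponding installer did not add.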
There are many other ways to install DataLad on Windows; see, for example, the Windows-wit on the Windows Subsystem for Linux 2 (WSL2). One attractive alternative is Conda. A completely different approach is to install the DataLad Gooey, a standalone installation of DataLad’s graphical application (see the DataLad Gooey documentation for installation instructions).
Install DataLad using the Windows Subsystem for Linux 2
With the Windows Subsystem for Linux, you can use a Linux system from within Windows. You need a recent build of Windows to get WSL2; we do not recommend WSL1.
You can find out how to install the Windows Subsystem for Linux at docs.microsoft.com. Afterwards, proceed with your installation as described in the installation instructions for Linux.
Using DataLad on Windows has a few peculiarities. In general, DataLad can feel a bit sluggish on non-WSL2 Windows systems. This is due to various file system issues that also affect the version control system Git itself, which DataLad relies on. The core functionality of DataLad works, and you should be able to follow most contents covered in this book. You will notice, however, that some Unix commands displayed in examples may not work, that terminal output can look different from what is displayed in the code examples of the book, and that some dependencies for additional functionality are not available for Windows. Dedicated notes, “Windows-wits”, contain important information, alternative commands, or warnings, and an overview of useful Windows commands and general information is included in The command line.
3.1.2. Mac (incl. M1)
Modern Macs come with a compatible Python 3 version installed by default. The Find-out-more on Python versions has instructions on how to confirm that.
DataLad is available via Homebrew, a package manager for macOS. First, install Homebrew, which requires Xcode or the Xcode Command Line Tools.
Next, install DataLad and its dependencies:
$ brew install datalad
Alternatively, you can use brew only for DataLad’s non-Python dependencies, and then check the Find-out-more on how to install DataLad via Python’s package manager.
Install DataLad via pip on macOS
If Git and git-annex are already installed (via brew), DataLad can also be installed via Python’s package manager pip, which should be installed by default on your system:
$ python -m pip install datalad
Some macOS versions may use python3 instead of python; use tab completion to find out which one is installed.
Recent macOS versions may warn after installation that scripts were installed into locations that are not on PATH:
The script chardetect is installed in
'/Users/MYUSERNAME/Library/Python/3.11/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to
suppress this warning, use --no-warn-script-location.
To fix this, add this directory to the $PATH environment variable.
You can do this for your own user account by adding a line like the following to the profile file of your shell (replace MYUSERNAME with your user name, and 3.11 with your Python version):
$ export PATH=$PATH:/Users/MYUSERNAME/Library/Python/3.11/bin
If you use a bash shell, this may be ~/.bashrc or ~/.bash_profile; if you use a zsh shell, it may be ~/.zshrc or ~/.zprofile. To find out which shell you are using, type echo $SHELL into your terminal.
Alternatively, you could configure it system-wide, i.e., for all users of your computer, by adding the path /Users/MYUSERNAME/Library/Python/3.11/bin to the file /etc/paths, e.g., with the editor nano (this requires using sudo and authenticating with your password):
$ sudo nano /etc/paths
The contents of this file could look like this afterwards (the last line was added):
/usr/local/bin
/usr/bin
/bin
/usr/sbin
/sbin
/Users/MYUSERNAME/Library/Python/3.11/bin
3.1.3. Linux: (Neuro)Debian, Ubuntu, and similar systems
DataLad is part of the Debian and Ubuntu operating systems. However, the particular DataLad version included in a release may be a bit older (check the versions for Debian and Ubuntu to see which ones are available).
For some recent releases of Debian-based operating systems, NeuroDebian provides more recent DataLad versions (check the availability table). In order to install from NeuroDebian, follow its installation documentation, which only requires copy-pasting three lines into a terminal. Also, should you be confused by the name: enabling this repository will not do any harm if your field is not neuroscience.
Whichever repository you end up using, the following command installs DataLad and all of its software dependencies (including git-annex and p7zip):
$ sudo apt-get install datalad
The command above will also upgrade existing installations to the most recent available version.
3.1.4. Linux: CentOS, Redhat, Fedora, or similar systems
For CentOS, Redhat, Fedora, or similar distributions, there is an RPM package for git-annex. A suitable version of Python and Git should come with the operating system, although some servers may run fairly old releases.
DataLad itself can be installed via pip:
$ python -m pip install datalad
Alternatively, DataLad can be installed together with Git and git-annex via Conda.
3.1.5. Linux machines with no root access (e.g., HPC systems)
The most convenient user-based installation can be achieved via Conda.
3.1.6. Conda
Conda is a software distribution available for all major operating systems, and its Miniconda installer offers a convenient way to bootstrap a DataLad installation. Importantly, it does not require admin/root access to a system.
Detailed, platform-specific installation instructions are available in the Conda documentation. In short: download and run the installer, or, from the command line, run
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-<YOUR-OS>-x86_64.sh
$ bash Miniconda3-latest-<YOUR-OS>-x86_64.sh
In the above call, replace <YOUR-OS> with an identifier for your operating system, such as “Linux” or “MacOSX”. During the installation, you will need to accept a license agreement (press Enter to scroll down, then type “yes” and press Enter to accept) and confirm the installation into the default directory, and you should respond “yes” to the prompt “Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]”. Afterwards, you can remove the installation script by running rm ./Miniconda3-latest-*-x86_64.sh.
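If you are unsure which identifier to use for <YOUR-OS>, it can be derived from uname. The following POSIX shell sketch is an assumption-laden helper, not part of Conda: it assumes installer names follow the Miniconda3-latest-<OS>-<architecture>.sh pattern shown above, with the architecture spelled the way uname -m reports it (for example x86_64, or arm64 on Apple Silicon Macs). Please compare the result against the Miniconda download page. The function name miniconda_installer is hypothetical, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: derive the Miniconda installer name from the OS and CPU architecture.
# Assumption: names follow Miniconda3-latest-<OS>-<arch>.sh, <arch> as in `uname -m`.

# Print the installer name; arguments override `uname -s` and `uname -m`.
miniconda_installer() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os" in
        Linux) ;;                 # Linux keeps its name
        Darwin) os=MacOSX ;;      # macOS reports itself as "Darwin"
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    echo "Miniconda3-latest-${os}-${arch}.sh"
}

# Print (rather than run) the download and install commands.
if installer=$(miniconda_installer); then
    echo "wget https://repo.anaconda.com/miniconda/$installer"
    echo "bash $installer"
fi
```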
The installer automatically configures the shell to make conda-installed tools accessible, so no further configuration is necessary (the change takes effect in newly opened terminals). Once Conda is installed, the DataLad package can be installed from the conda-forge channel:
$ conda install -c conda-forge datalad
In general, all of DataLad’s software dependencies are automatically installed, too. This makes a Conda-based deployment very convenient. A from-scratch DataLad installation on an HPC system, as a normal user, is done in three lines:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ # acknowledge license, keep everything at default
$ conda install -c conda-forge datalad
In case a dependency is not available from Conda (e.g., there is no git-annex package for Windows in Conda), please refer to the platform-specific instructions above.
To update an existing installation with conda, use:
$ conda update -c conda-forge datalad
The DataLad installer also supports setting up a Conda environment, in case a suitable Python version is already available.
3.1.7. Using Python’s package manager pip
As mentioned above, DataLad can be installed via Python’s package manager pip. pip comes with any Python distribution from python.org and is available as a system package in nearly all GNU/Linux distributions.
If you have Python and pip set up, type the following to install DataLad and most of its software dependencies:
$ python -m pip install datalad
If this results in a “permission denied” error, you can install DataLad into your home directory instead:
$ python -m pip install --user datalad
On some systems, you may need to call python3 instead of python:
$ python3 -m pip install datalad
$ # or, in case of a "permission denied error":
$ python3 -m pip install --user datalad
An existing installation can be upgraded with python -m pip install -U datalad.
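The procedure described above (try a regular installation, and fall back to --user if it fails) can be wrapped in a small shell function. This is a sketch under assumptions: pip_install_with_fallback is a hypothetical helper, the interpreter is taken from the PY variable or else python3 or python, and the fallback is attempted after any failure of the first command, not only after a “permission denied” error. It passes -U, so the same call installs or upgrades.

```shell
#!/bin/sh
# Sketch: install or upgrade packages with pip, falling back to a user-level
# installation if the regular one fails (e.g., with "permission denied").

# Interpreter to use: $PY if already set, otherwise python3, otherwise python.
PY=${PY:-$(command -v python3 || command -v python || true)}

pip_install_with_fallback() {
    if [ -z "$PY" ]; then
        echo "no Python interpreter found" >&2
        return 1
    fi
    if "$PY" -m pip install -U "$@"; then
        return 0
    fi
    echo "regular installation failed; retrying with --user" >&2
    "$PY" -m pip install --user -U "$@"
}

# Usage (not run automatically):
#   pip_install_with_fallback datalad
```

Retrying after any failure keeps the function simple; a failure unrelated to permissions (e.g., no network) simply fails a second time with pip's own error message.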
pip is not able to install non-Python software, such as 7-Zip or git-annex. However, you can install the DataLad installer with python -m pip install datalad-installer. This is a command-line tool that helps install DataLad and its key software dependencies on a range of platforms.
3.2. Initial configuration
The initial configuration only concerns setting up a Git identity. If you are already a Git user, you should hence be good to go.
If you have not used the version control system Git before, you will need to tell Git some information about yourself. This needs to be done only once. In the following example, replace Bob McBobFace with your own name, and bob@example.com with your own email address.
$ # enter your home directory using the ~ shortcut
$ cd ~
$ git config --global --add user.name "Bob McBobFace"
$ git config --global --add user.email bob@example.com
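Note that --add appends a new value without replacing existing ones, so running the commands above a second time records your identity twice. If you prefer a step that can safely be repeated, the following POSIX shell sketch sets a key only when it has no value yet. set_if_unset is a hypothetical helper name; it relies only on the standard git config --global --get <key> and git config --global <key> <value> forms.

```shell
#!/bin/sh
# Sketch: set a global Git identity only where none is configured yet.

# Set global config key $1 to value $2, unless the key already has a value.
set_if_unset() {
    if current=$(git config --global --get "$1"); then
        echo "$1 is already set to: $current"
    else
        git config --global "$1" "$2"
        echo "$1 set to: $2"
    fi
}

# Usage (replace the example values with your own):
#   set_if_unset user.name "Bob McBobFace"
#   set_if_unset user.email bob@example.com
```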
This information is used to track changes in the DataLad projects you will be working on: changes you make are associated with your name and email address. You should therefore use your real name and email address. A history, especially in a collaborative project, that shows changes made by Anonymous with the email youdontgetmy@email.fu neither establishes much trust nor is it helpful a few years later.
And do not worry, you won’t get any emails from Git or DataLad.