4. How to get help

All DataLad errors or problems you encounter during DataLad-101 are intentional and serve illustrative purposes. But what if you run into any DataLad errors outside of this course? Fortunately, the syllabus has a whole section on that, and on one lazy, warm summer-afternoon you flip through it.

../_images/reading.svg

You realize that you already know the most important things: The number one advice on how to get help is “Read the error message.”. The second advice it “I’m not kidding: Read the error message. The third advice, finally, says “Honestly, read the f***ing error message.

4.1. Help yourself

If you run into a DataLad problem and you have followed the three steps above, but the error message does not give you a clue on how to proceed, the first you should do is

  1. find out which version of DataLad you use

  2. read the help page of the command that failed

The first step is important in order to find out whether a command failed due to using a wrong DataLad version. In order to use this book and follow along, your DataLad version should be datalad-0.12 or higher, for example.

To find out which version you are using, run

$ datalad --version
datalad 0.12.2.dev166

If you want a comprehensive overview of your full setup, datalad wtf1 is the command to turn to (datalad-wtf manual). Running this command will generate a report about the DataLad installation and configuration. The output below shows an excerpt.

$ datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## datalad 
  - full_version: 0.12.2.dev166-g68f26
  - version: 0.12.2.dev166
## dataset 
  - id: ed80af32-5159-11ea-a727-6533dd7bb2c6
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /home/me/dl-101/DataLad-101
  - repo: AnnexRepo
## dependencies 
  - appdirs: 1.4.3
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 7.20190819+git2-g908476a9b-1~ndall+1
  - cmd:bundled-git: 2.20.1
  - cmd:git: 2.20.1
  - cmd:system-git: 2.25.0
  - cmd:system-ssh: 8.1p1

This lengthy output will report all information that might be relevant – from DataLad to git-annex or Python up to your operating system.

The second step, finding and reading the help page of the command in question, is important in order to find out how the command that failed is used. Are arguments specified correctly? Does the help list any caveats?

There are multiple ways to find help on DataLad commands. You could turn to the documentation. Alternatively, to get help right inside the terminal, run any command with the -h/--help option (also shown as an excerpt here):

$ datalad get --help
Usage: datalad get [-h] [-s LABEL] [-d PATH] [-r] [-R LEVELS] [-n]
                   [-D DESCRIPTION] [--reckless [{auto|ephemeral}]] [-J NJOBS]
                   [PATH [PATH ...]]

Get any dataset content (files/directories/subdatasets).

This command only operates on dataset content. To obtain a new independent
dataset from some source use the INSTALL command.

By default this command operates recursively within a dataset, but not
across potential subdatasets, i.e. if a directory is provided, all files in
the directory are obtained. Recursion into subdatasets is supported too. If
enabled, relevant subdatasets are detected and installed in order to
fulfill a request.

Known data locations for each requested file are evaluated and data are
obtained from some available location (according to git-annex configuration
and possibly assigned remote priorities), unless a specific source is
specified.

NOTE
  Power-user info: This command uses git annex get to fulfill
  file handles.

*Examples*

Get a single file::

   % datalad get <path/to/file>

Get contents of a directory::

   % datalad get <path/to/dir/>

Get all contents of the current dataset and its subdatasets::

   % datalad get . --recursive

Get (clone) a registered subdataset, but don't retrieve data::

   % datalad get -n <path/to/subds>

*Arguments*
  PATH                  path/name of the requested dataset component. The
                        component must already be known to a dataset. To add
                        new components to a dataset use the ADD command.
                        Constraints: value must be a string

*Options*
  -h, --help, --help-np
                        show this help message. --help-np forcefully disables
                        the use of a pager for displaying the help message
  -s LABEL, --source LABEL
                        label of the data source to be used to fulfill
                        requests. This can be the name of a dataset sibling or
                        another known source. Constraints: value must be a
                        string
  -d PATH, --dataset PATH
                        specify the dataset to perform the add operation on,

This for example is the help page on datalad get (the same you would find in the documentation, but in your terminal). It contains a command description, a list of all the available options with a short explanation of them, and example commands. The paragraph Options shows all optional flags, and all required parts of the command are listed in the paragraph Arguments. One first thing to check for example is whether your command call specified all of the required arguments.

4.2. Asking questions (right)

If nothing you do on your own helps to solve the problem, consider asking others. Check out neurostars and search for your problem – likely, somebody already encountered the same error before and fixed it, but if not, just ask a new question with a datalad tag.

Make sure your question is as informative as it can be for others. Include

  • context – what did you want to do and why?

  • the problem – paste the error message (all of it), and provide the steps necessary to reproduce it.

  • technical details – what version of DataLad are you using, what version of git-annex, and which git-annex repository type, what is your operating system and – if applicable – Python version? datalad wtf is your friend to find all of this information.

The “submit a question link” on DataLad’s GitHub page page prefills a neurostars form with a template for a question for a good starting point if you want to have more guidance or encounter writer’s block.

4.3. Common warnings and errors

A lot of output you will see while working with DataLad originates from warnings or errors by DataLad, git-annex, or Git. Some of these outputs can be wordy and not trivial to comprehend - and even if everything works, some warnings can be hard to understand. This following section will list some common git-annex warnings and errors and attempts to explain them. If you encounter warnings or errors that you would like to see explained in this book, please let us know by filing an issue.

4.3.1. Output produced by Git

Unset Git identity

If you have not configured your Git identity, you will see warnings like this when running any DataLad command:

[WARNING] It is highly recommended to configure git first (set both user.name and user.email) before using DataLad.

To set your Git identity, go back to section Initial configuration.

Rejected pushes

One error you can run into when publishing dataset contents is that your datalad push to a sibling is rejected. One example is this:

$ datalad push --to public
 [ERROR  ] refs/heads/master->public:refs/heads/master [rejected] (non-fast-forward) [publish(/home/me/dl-101/DataLad-101)]

This example is an attempt to push a local dataset to its sibling on GitHub. The push is rejected because it is a non-fast-forward merge situation. Usually, this means that the sibling contains changes that your local dataset does not yet know about. It can be fixed by updating from the sibling first with a datalad update --merge.

Here is a different push rejection:

$ datalad push --to roommate
 publish(ok): . (dataset) [refs/heads/git-annex->roommate:refs/heads/git-annex 023a541..59a6f8d]
 [ERROR  ] refs/heads/master->roommate:refs/heads/master [remote rejected] (branch is currently checked out) [publish(/home/me/dl-101/DataLad-101)]
 publish(error): . (dataset) [refs/heads/master->roommate:refs/heads/master [remote rejected] (branch is currently checked out)]
 action summary:
   publish (error: 1, ok: 1)

As you can see, the git-annex branch was pushed successfully, but updating the master branch was rejected: [remote rejected] (branch is currently checked out) [publish(/home/me/dl-101/DataLad-101)]. In this particular case, this is because it was an attempt to push from DataLad-101 to the roommate sibling that was created in chapter IV - Collaboration. This is a special case of pushing, because it - in technical terms - is a push to a non-bare repository. Unlike bare Git repositories, non-bare repositories can not be pushed to at all times. To fix this, you either want to checkout another branch in the roommate sibling or push to a non-checked out branch in the roommate sibling.

Detached HEADs

One warning that you may encounter during an installation of a dataset is:

[INFO   ] Submodule HEAD got detached. Resetting branch master to point to 046713bb. Original location was 47e53498

Even though “detached HEAD” sounds slightly worrisome, this is merely an information and does not require an action from your side. It is related to Git submodules (the underlying Git concept for subdatasets), and informs you about the current state a subdataset is saved in the superdataset you have just cloned.

4.3.2. Output produced by git-annex

Unusable annexes

Upon installation of a dataset, you may see:

[INFO    ]     Remote origin not usable by git-annex; setting annex-ignore
[INFO    ]     This could be a problem with the git-annex installation on the
remote. Please make sure that git-annex-shell is available in PATH when you
ssh into the remote. Once you have fixed the git-annex installation,
run: git annex enableremote origin

This warning lets you know that git-annex will not attempt to download content from the remote “origin”. This can have many reasons, but as long as there are other remotes you can access the data from, you are fine.

A similar warning message may appear when adding a sibling that is a pure Git remote, for example a repository on GitHub:

[INFO ] Failed to enable annex remote github, could be a pure git or not
accessible
[WARNING] Failed to determine if github carries annex. Remote was marked by
annex as annex-ignore. Edit .git/config to reset if you think that was done
by mistake due to absent connection etc

These messages indicate that the sibling github does not carry an annex. Thus, annexed file contents can not be pushed to this sibling. This happens if the sibling indeed does not have an annex (which would be true, for example, for siblings on GitHub, GitLab, Bitbucket, …, and would not require any further action or worry), or if the remote could not be reached, e.g., due to a missing internet connection (in which case you could set the key annex-ignore in .git/config to false).

Speaking of remotes that are not available, this will probably be one of the most commonly occurring git-annex errors to see - failing datalad get calls because remotes are not available:

Todo

paste some “please make these remotes available output”

Todo

If one does not have an SSH key configured, e.g., on a server (from remodnav paper on brainbfast):

[INFO   ] Cloning https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav [2 other candidates] into '/home/homeGlobal/adina/paper-remodnav/remodnav'
Permission denied (publickey).
[WARNING] Failed to run cmd ['ssh', '-fN', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=15m', '-o', 'ControlPath="/home/homeGlobal/adina/.cache/datalad/sockets/6ca483de"', 'git@github.com']. Exit code=255
| stdout: None
| stderr: None
[ERROR  ] Failed to clone from any candidate source URL. Encountered errors per each url were: (OrderedDict([('https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav', "Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(128)\n  cmdline: /usr/lib/git-annex.linux/git clone --progress -v https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav /home/homeGlobal/adina/paper-remodnav/remodnav [cmd.py:wait:412]"), ('https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav/.git', "Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(128)\n  cmdline: /usr/lib/git-annex.linux/git clone --progress -v https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav/.git /home/homeGlobal/adina/paper-remodnav/remodnav [cmd.py:wait:412]"), ('git@github.com:psychoinformatics-de/remodnav.git', "Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(128)\n  cmdline: /usr/lib/git-annex.linux/git clone --progress -v git@github.com:psychoinformatics-de/remodnav.git /home/homeGlobal/adina/paper-remodnav/remodnav [cmd.py:wait:412]")]),) [install(/home/homeGlobal/adina/paper-remodnav/remodnav)]
[ERROR  ] Installation of subdatasets /home/homeGlobal/adina/paper-remodnav/remodnav failed with exception: InstallFailedError:
Failed to install dataset from any of: ['https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav', 'git@github.com:psychoinformatics-de/remodnav.git'] [get.py:_install_subds_from_flexible_source:184] [install(/home/homeGlobal/adina/paper-remodnav/remodnav)]
Traceback (most recent call last):
  File "code/mk_figuresnstats.py", line 811, in <module>
    savefigs(args.figure, args.stats)
  File "code/mk_figuresnstats.py", line 410, in savefigs
    stat)
  File "code/mk_figuresnstats.py", line 274, in confusion
    load_anderson(stimtype, finame)
  File "code/mk_figuresnstats.py", line 28, in load_anderson
    get(fname)
  File "/home/homeGlobal/adina/env/remodnav/lib/python3.5/site-packages/datalad/interface/utils.py", line 492, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/homeGlobal/adina/env/remodnav/lib/python3.5/site-packages/datalad/interface/utils.py", line 480, in return_func
    results = list(results)
  File "/home/homeGlobal/adina/env/remodnav/lib/python3.5/site-packages/datalad/interface/utils.py", line 468, in generator_func
    msg="Command did not complete successfully")
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully [{'type': 'dataset', 'status': 'error', 'action': 'install', 'message': ('Installation of subdatasets %s failed with exception: %s', '/home/homeGlobal/adina/paper-remodnav/remodnav', "InstallFailedError: \nFailed to install dataset from any of: ['https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav', 'git@github.com:psychoinformatics-de/remodnav.git'] [get.py:_install_subds_from_flexible_source:184]"), 'path': '/home/homeGlobal/adina/paper-remodnav/remodnav'}]

Footnotes

1

wtf in datalad wtf could stand for many things. “Why the Face?” “Wow, that’s fantastic!”, “What’s this for?”, “What to fix”, “What the FAQ”, “Where’s the fire?”, “Wipe the floor”, “Welcome to fun”, “Waste Treatment Facility”, “What’s this foolishness”, “What the fruitcake”, … Pick a translation of your choice and make running this command more joyful.