9.4. How to get help

All DataLad errors or problems you encounter during DataLad-101 are intentional and serve illustrative purposes. But what if you run into any DataLad errors outside of this course? Fortunately, the syllabus has a whole section on that, and on one lazy, warm summer-afternoon you flip through it.

../_images/reading.svg

You realize that you already know the most important things: The number one advice on how to get help is “Read the error message.”. The second advice it “I’m not kidding: Read the error message. The third advice, finally, says “Honestly, read the f***ing error message.

9.4.1. Help yourself

If you run into a DataLad problem and you have followed the three steps above, but the error message does not give you a clue on how to proceed, the first you should do is

  1. find out which version of DataLad you use

  2. read the help page of the command that failed

The first step is important in order to find out whether a command failed due to using a wrong DataLad version. In order to use this book and follow along, your DataLad version should be datalad-0.12 or higher, for example.

To find out which version you are using, run

$ datalad --version
datalad 0.14.0rc1.dev10

If you want a comprehensive overview of your full setup, datalad wtf1 is the command to turn to (datalad-wtf manual). Running this command will generate a report about the DataLad installation and configuration. The output below shows an excerpt.

$ datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials
  - keyring:
    - active_backends:
      - PlaintextKeyring with no encyption v.1.0 at /home/me/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/me/.config/python_keyring/keyringrc.cfg
    - data_root: /home/me/.local/share/python_keyring
## datalad 
  - full_version: 0.14.0rc1.dev10-g1b221
  - version: 0.14.0rc1.dev10
## dataset 
  - id: c8774a5b-c857-4751-bce4-e58c6be73349
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /home/me/dl-101/DataLad-101
  - repo: AnnexRepo
## dependencies 
  - annexremote: 1.4.5
  - appdirs: 1.4.4

This lengthy output will report all information that might be relevant – from DataLad to git-annex or Python up to your operating system.

The second step, finding and reading the help page of the command in question, is important in order to find out how the command that failed is used. Are arguments specified correctly? Does the help list any caveats?

There are multiple ways to find help on DataLad commands. You could turn to the documentation. Alternatively, to get help right inside the terminal, run any command with the -h/--help option (also shown as an excerpt here):

$ datalad get --help
Usage: datalad get [-h] [-s LABEL] [-d PATH] [-r] [-R LEVELS] [-n]
                   [-D DESCRIPTION] [--reckless [auto|ephemeral|shared-...]]
                   [-J NJOBS]
                   [PATH ...]

Get any dataset content (files/directories/subdatasets).

This command only operates on dataset content. To obtain a new independent
dataset from some source use the CLONE command.

By default this command operates recursively within a dataset, but not
across potential subdatasets, i.e. if a directory is provided, all files in
the directory are obtained. Recursion into subdatasets is supported too. If
enabled, relevant subdatasets are detected and installed in order to
fulfill a request.

Known data locations for each requested file are evaluated and data are
obtained from some available location (according to git-annex configuration
and possibly assigned remote priorities), unless a specific source is
specified.

*Getting subdatasets*

Just as DataLad supports getting file content from more than one location,
the same is supported for subdatasets, including a ranking of individual
sources for prioritization.

The following location candidates are considered. For each candidate a
cost is given in parenthesis, higher values indicate higher cost, and thus
lower priority:

- URL of any configured superdataset remote that is known to have the
  desired submodule commit, with the submodule path appended to it.
  There can be more than one candidate (cost 500).

- In case .GITMODULES contains a relative path instead of a URL,
  the URL of any configured superdataset remote that is known to have the
  desired submodule commit, with this relative path appended to it.
  There can be more than one candidate (cost 500).

- A URL or absolute path recorded in .GITMODULES (cost 600).

- In case .GITMODULES contains a relative path as a URL, the absolute
  path of the superdataset, appended with this relative path (cost 900).

Additional candidate URLs can be generated based on templates specified as
configuration variables with the pattern

  DATALAD.GET.SUBDATASET-SOURCE-CANDIDATE-<NAME>

where NAME is an arbitrary identifier. If NAME starts with three digits
(e.g. '400myserver') these will be interpreted as a cost, and the
respective candidate will be sorted into the generated candidate list
according to this cost. If no cost is given, a default of 700 is used.

A template string assigned to such a variable can utilize the Python format
mini language and may reference a number of properties that are inferred
from the parent dataset's knowledge about the target subdataset. Properties
include any submodule property specified in the respective .GITMODULES

This for example is the help page on datalad get (the same you would find in the documentation, but in your terminal). It contains a command description, a list of all the available options with a short explanation of them, and example commands. The paragraph Options shows all optional flags, and all required parts of the command are listed in the paragraph Arguments. One first thing to check for example is whether your command call specified all of the required arguments.

9.4.2. Asking questions (right)

If nothing you do on your own helps to solve the problem, consider asking others. Check out neurostars and search for your problem – likely, somebody already encountered the same error before and fixed it, but if not, just ask a new question with a datalad tag.

Make sure your question is as informative as it can be for others. Include

  • context – what did you want to do and why?

  • the problem – paste the error message (all of it), and provide the steps necessary to reproduce it.

  • technical details – what version of DataLad are you using, what version of git-annex, and which git-annex repository type, what is your operating system and – if applicable – Python version? datalad wtf is your friend to find all of this information.

The “submit a question link” on DataLad’s GitHub page page prefills a neurostars form with a template for a question for a good starting point if you want to have more guidance or encounter writer’s block.

9.4.3. Debugging like a DataLad-developer

If you have read a command’s help from start to end, checked all software versions twice, even asked colleagues to reproduce your problem (unsuccessfully), and you still don’t have any clue what is going on, then welcome to the debugging section!

../_images/debug.svg

Fig. 9.2 It’s not as bad as this

It is not always straightforward to see why a particular DataLad command has failed. Given that operations with DataLad can be quite complicated, and could involve complexities such as different forms of authentication, different file systems, interactions with the environment, configurations, and other software, and much more, there are what may feel like an infinite amount of sources for the problem at hand. The resulting error message, however, may not display the underlying cause correctly because the error message of whichever process failed is not propagated into the final result report. Thus, you may end up with an uninformative Unable to access these remotes error in the result summary, when the underlying issue is a certificate error.

In situations where there is no obvious reason for a command to fail, it can be helpful – either for yourself or for further information to paste into GitHub issues – to start debugging, or logging at a higher granularity than is the default. This allows you to gain more insights into the actions DataLad and its underlying tools are taking, where exactly they fail, and to even play around with the program at the state of the failure.

Debugging and logging are not as complex as these terms may sound if you have never consciously debugged. Procedurally, it can be as easy as adding an additional flag to a command call, and cognitively, it can be as easy as engaging your visual system in a visual search task for the color red or the word “error”, or reading more DataLad output that you’re used to. The paragraphs below start with the general concepts, and collect concrete debugging strategies for different problems. If you have advice to add, please get in touch.

9.4.3.1. Logging

In order to gain more insights into the steps performed by a program and capture as many details as possible for troubleshooting an error, you can turn to logging. Logging simply refers to the fact that DataLad and its underlying tools tell you what they are doing: This information can be coarse, such as a mere [INFO] Downloading <some_url> into <some_target>, or fine-grained, such as [DEBUG] Resolved dataset for status reporting: <dataset>. The log level in brackets at the beginning of the line indicates how many details DataLad shares with you.

Note that logging is not a sealed book, and happens automatically during the execution of any DataLad command. While you were reading the handbook, you have seen a lot of log messages already. Anything printed to your terminal preceded by [INFO], for example, is a log message (in this case, on the info level). When you are consciously logging, you simply set the log-level to the desired amount of information, or increase the amount of verbosity until the output gives you a hint of what went wrong. Likewise, adjusting the log-level also works the other way around, and lets you decrease the amount of information you receive in your terminal.

Log levels

Log levels provide the means to adjust how much information you want, and are described in human readable terms, ordered by the severity of the failures or problems reported. The following log levels can be chosen from:

  • critical: Only catastrophes are reported. Currently, there is nothing inside of DataLad that would log at this level, so setting the log level to critical will result in getting no details at all, not even about errors or failures.

  • error: With this log level you will receive reports on any errors that occurred within the program during command execution.

  • warning: At this log level, the command execution will report on usual situations and anything that might be a problem, in addition to report anything from the error log level. .

  • info: This log level will include reports by the program that indicate normal behavior and serve to keep you up to date about the current state of things, in additions to warning and error logging messages.

  • debug: This log level is very useful to troubleshoot a problem, and results in DataLad telling you a lot it possibly can.

Other than log levels, you can also adjust the amount of information provided with numerical granularity. Instead of specifying a log level, provide an integer between 1 and 9, with lower values denoting more debugging information.

Raising the log level (e.g, to error, or 9`) will decrease the amount of information and output you will receive, while lowering it (e.g., to ``debug or 1) will increase it.

Setting a log level can be done in the form of an environment variable, a configuration, or with the -l/--log-level flag appended directly after the main datalad command. To get extensive information on what datalad status does underneath the hood, your command could look like this:

$ datalad --log-level debug status
[DEBUG] Command line args 1st pass for DataLad 0.14.0rc1.dev10-g1b221. Parsed: Namespace() Unparsed: ['status']
[DEBUG] Discovering plugins 
[DEBUG] Building doc for <class 'datalad.core.local.status.Status'> 
[DEBUG] Parsing known args among ['/home/adina/env/handbook2/bin/datalad', '--log-level', 'debug', 'status']
[DEBUG] Async run ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin']
[DEBUG] Launching process ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin']
[DEBUG] Process 1207466 started
[DEBUG] Waiting for process 1207466 to complete
[DEBUG] Process 1207466 exited with return code 0
[DEBUG] Determined class of decorated function: <class 'datalad.core.local.status.Status'> 
[DEBUG] Resolved dataset for status reporting: /home/me/dl-101/DataLad-101 
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Process 1207468 started
[DEBUG] Waiting for process 1207468 to complete
[DEBUG] Process 1207468 exited with return code 0
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/.datalad/config']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/.datalad/config']
[DEBUG] Process 1207470 started
[DEBUG] Waiting for process 1207470 to complete
[DEBUG] Process 1207470 exited with return code 0
[DEBUG] query AnnexRepo(/home/me/dl-101/DataLad-101).diffstatus() for paths: None 
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Process 1207472 started
[DEBUG] Waiting for process 1207472 to complete
[DEBUG] Process 1207472 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101).get_content_info(...) 
[DEBUG] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Async run ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Launching process ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Process 1207474 started
[DEBUG] Waiting for process 1207474 to complete
[DEBUG] Process 1207474 exited with return code 0
[DEBUG] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101).get_content_info(...) 
[DEBUG] Async run ['git', 'ls-files', '-z', '-m']
[DEBUG] Launching process ['git', 'ls-files', '-z', '-m']
[DEBUG] Process 1207476 started
[DEBUG] Waiting for process 1207476 to complete
[DEBUG] Process 1207476 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101).get_content_info(...) 
[DEBUG] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Async run ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Launching process ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Process 1207478 started
[DEBUG] Waiting for process 1207478 to complete
[DEBUG] Process 1207478 exited with return code 0
[DEBUG] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Initiating a new process for BatchedCommand(cmd=['git', 'cat-file', '--batch'], output_proc=<function>, path='/home/me/dl-101/DataLad-101') 
[DEBUG] Closing stderr of <Popen: returncode: None args: ['git', 'cat-file', '--batch']>
[DEBUG] Closing stdin of <Popen: returncode: None args: ['git', 'cat-file', '--batch']> and waiting process to finish
[DEBUG] Process <Popen: returncode: 0 args: ['git', 'cat-file', '--batch']> has finished
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101).get_content_info(...) 
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Process 1207481 started
[DEBUG] Waiting for process 1207481 to complete
[DEBUG] Process 1207481 exited with return code 0
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/midterm_project/.datalad/config']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/midterm_project/.datalad/config']
[DEBUG] Process 1207483 started
[DEBUG] Waiting for process 1207483 to complete
[DEBUG] Process 1207483 exited with return code 0
[DEBUG] Async run ['git', 'symbolic-ref', 'HEAD']
[DEBUG] Launching process ['git', 'symbolic-ref', 'HEAD']
[DEBUG] Process 1207485 started
[DEBUG] Waiting for process 1207485 to complete
[DEBUG] Process 1207485 exited with return code 0
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Process 1207487 started
[DEBUG] Waiting for process 1207487 to complete
[DEBUG] Process 1207487 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project).get_content_info(...)
[DEBUG] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Async run ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Launching process ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Process 1207489 started
[DEBUG] Waiting for process 1207489 to complete
[DEBUG] Process 1207489 exited with return code 0
[DEBUG] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project).get_content_info(...)
[DEBUG] Async run ['git', 'ls-files', '-z', '-m']
[DEBUG] Launching process ['git', 'ls-files', '-z', '-m']
[DEBUG] Process 1207491 started
[DEBUG] Waiting for process 1207491 to complete
[DEBUG] Process 1207491 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project).get_content_info(...)
[DEBUG] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Async run ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Launching process ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Process 1207493 started
[DEBUG] Waiting for process 1207493 to complete
[DEBUG] Process 1207493 exited with return code 0
[DEBUG] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project).get_content_info(...)
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Process 1207495 started
[DEBUG] Waiting for process 1207495 to complete
[DEBUG] Process 1207495 exited with return code 0
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/midterm_project/input/.datalad/config']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/midterm_project/input/.datalad/config']
[DEBUG] Process 1207497 started
[DEBUG] Waiting for process 1207497 to complete
[DEBUG] Process 1207497 exited with return code 0
[DEBUG] Async run ['git', 'symbolic-ref', 'HEAD']
[DEBUG] Launching process ['git', 'symbolic-ref', 'HEAD']
[DEBUG] Process 1207499 started
[DEBUG] Waiting for process 1207499 to complete
[DEBUG] Process 1207499 exited with return code 0
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Process 1207501 started
[DEBUG] Waiting for process 1207501 to complete
[DEBUG] Process 1207501 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project/input).get_content_info(...)
[DEBUG] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Async run ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Launching process ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Process 1207503 started
[DEBUG] Waiting for process 1207503 to complete
[DEBUG] Process 1207503 exited with return code 0
[DEBUG] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project/input).get_content_info(...)
[DEBUG] Async run ['git', 'ls-files', '-z', '-m']
[DEBUG] Launching process ['git', 'ls-files', '-z', '-m']
[DEBUG] Process 1207505 started
[DEBUG] Waiting for process 1207505 to complete
[DEBUG] Process 1207505 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project/input).get_content_info(...)
[DEBUG] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Async run ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Launching process ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Process 1207507 started
[DEBUG] Waiting for process 1207507 to complete
[DEBUG] Process 1207507 exited with return code 0
[DEBUG] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101/midterm_project/input).get_content_info(...)
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG] Process 1207509 started
[DEBUG] Waiting for process 1207509 to complete
[DEBUG] Process 1207509 exited with return code 0
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/recordings/longnow/.datalad/config']
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/me/dl-101/DataLad-101/recordings/longnow/.datalad/config']
[DEBUG] Process 1207511 started
[DEBUG] Waiting for process 1207511 to complete
[DEBUG] Process 1207511 exited with return code 0
[DEBUG] Async run ['git', 'symbolic-ref', 'HEAD']
[DEBUG] Launching process ['git', 'symbolic-ref', 'HEAD']
[DEBUG] Process 1207513 started
[DEBUG] Waiting for process 1207513 to complete
[DEBUG] Process 1207513 exited with return code 0
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}']
[DEBUG] Process 1207515 started
[DEBUG] Waiting for process 1207515 to complete
[DEBUG] Process 1207515 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101/recordings/longnow).get_content_info(...)
[DEBUG] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Async run ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Launching process ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Process 1207517 started
[DEBUG] Waiting for process 1207517 to complete
[DEBUG] Process 1207517 exited with return code 0
[DEBUG] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o', '--directory', '--no-empty-directory']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101/recordings/longnow).get_content_info(...)
[DEBUG] Async run ['git', 'ls-files', '-z', '-m']
[DEBUG] Launching process ['git', 'ls-files', '-z', '-m']
[DEBUG] Process 1207519 started
[DEBUG] Waiting for process 1207519 to complete
[DEBUG] Process 1207519 exited with return code 0
[DEBUG] AnnexRepo(/home/me/dl-101/DataLad-101/recordings/longnow).get_content_info(...)
[DEBUG] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Async run ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Launching process ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Process 1207521 started
[DEBUG] Waiting for process 1207521 to complete
[DEBUG] Process 1207521 exited with return code 0
[DEBUG] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG] Done AnnexRepo(/home/me/dl-101/DataLad-101/recordings/longnow).get_content_info(...)
nothing to save, working tree clean

… and how does it look when using environment variables or configurations?

The log level can also be set (for different scopes) using the datalad.log.level configuration variable, or the corresponding environment variable DATALAD_LOG_LEVEL.

To set the log level for a single command, for example, set it in front of the command:

$ DATALAD_LOG_LEVEL=debug datalad status

And to set the log level for the rest of the shell session, export it:

$ export DATALAD_LOG_LEVEL=debug
$ datalad status
$ ...

You can find out a bit more on environment variable in this footnote.

The configuration variable can be used to set the log level on a user (global) or system-wide level with the git config command:

$ git config --global datalad.log.level debug

This output is extensive and detailed, but it precisely shows the sequence of commands and arguments that are run prior to a failure or crash, and all additional information that is reported with the log levels info or debug can be very helpful to find out what is wrong. Even if the vast amount of detail in output generated with debug logging appears overwhelming, it can make sense to find out at which point an execution stalls, whether arguments, commands, or datasets reported in the debug output are what you expect them to be, and to forward all information into any potential GitHub issue you will be creating.

Finally, other than logging with a DataLad command, it sometimes can be useful to turn to git-annex or Git for logging. For failing datalad get calls, it may be useful to retry the retrieval using git annex get -d -v <file>, where -d (debug) and -v (verbose) increase the amount of detail about the command execution and failure. In rare cases where you suspect something might be wrong with Git, setting the environment variables GIT_TRACE and GIT_TRACE_SETUP to 2 prior to running a Git command will give you debugging output.

9.4.3.2. Debugging

If the additional level of detail provided by logging messages is not enough, you can go further with actual debugging. For this, add the --dbg or --idbg flag to the main datalad command, as in datalad --dbg status. Adding this flag will enter a Python or IPython debugger when something unexpectedly crashes. This allows you to debug the program right when it fails, inspect available variables and their values, or step back and forth through the source code. Note that debugging experience is not a prerequisite when using DataLad – although it is an exciting life skill. The official Python docs provide a good overview on the available debugger commands if you are interested in learning more about this.

9.4.3.3. Debugging examples

This section collects errors and their solutions from real GitHub issues. They may not be applicable for the problem you are currently facing, but seeing other’s troubleshooting strategies may be helpful nevertheless. If you are interested in getting your error and solution described here, please get in touch.

datalad get: It is common for datalad get errors to originate in git-annex, the software used by DataLad to transfer data. Here are a few suggestions to debug them:

  • Take a deep breath, or preferably a walk in a nice park :)

  • Check that you are using a recent version of git-annex
    • git-annex version returns the version of git-annex on the first line of its input, and it is also reported in the output of datalad wtf.

    • The version number contains the release date of the version in use. For instance, git-annex version: 8.20200330-g971791563 was released on 30 March 2020.

    • If the version that you are using is older than a few months, consider updating using the instructions here.

  • Try to download the file using git-annex get -v -d <file_name>. If this doesn’t succeed, the DataLad command may not succeed. Options -d/--debug and -v are here to provide as much verbosity in error messages as possible

  • Read the output of git-annex, identify the error, breathe again, and solve the issue!

Table 9.1 Examples of possible issues

git-annex error

A solution that worked once

Last exception was:
Could not find a suitable TLS CA
certificate bundle, invalid path:
/etc/pki/tls/certs/ca-bundle.crt
[adapters.py:cert_verify:227]

Unset environment variable CURL_CA_BUNDLE

Permission denied when writing file

Download to a writeable file system

File retrieval from an Amazon S3 bucket failed during a system call in a MATLAB session:

>> system('datalad -C mytest \
           get 100206/T1w/T1w_acpc_dc.nii.gz')
  [...]
  git-annex: get: 1 failed

MATLAB massively overrides the LD_LIBRARY_PATH setting. This can lead to a number of issues, among them SSL certification errors. Prefixing the datalad get command with

!LD_LIBRARY_PATH= datalad get [....]

can solve this.

9.4.4. Common warnings and errors

A lot of output you will see while working with DataLad originates from warnings or errors by DataLad, git-annex, or Git. Some of these outputs can be wordy and not trivial to comprehend - and even if everything works, some warnings can be hard to understand. This following section will list some common git-annex warnings and errors and attempts to explain them. If you encounter warnings or errors that you would like to see explained in this book, please let us know by filing an issue.

9.4.4.1. Output produced by Git

Unset Git identity

If you have not configured your Git identity, you will see warnings like this when running any DataLad command:

[WARNING] It is highly recommended to configure git first (set both user.name and user.email) before using DataLad.

To set your Git identity, go back to section Initial configuration.

Rejected pushes

One error you can run into when publishing dataset contents is that your datalad push to a sibling is rejected. One example is this:

$ datalad push --to public
 [ERROR  ] refs/heads/master->public:refs/heads/master [rejected] (non-fast-forward) [publish(/home/me/dl-101/DataLad-101)]

This example is an attempt to push a local dataset to its sibling on GitHub. The push is rejected because it is a non-fast-forward merge situation. Usually, this means that the sibling contains changes that your local dataset does not yet know about. It can be fixed by updating from the sibling first with a datalad update --merge.

Here is a different push rejection:

$ datalad push --to roommate
 publish(ok): . (dataset) [refs/heads/git-annex->roommate:refs/heads/git-annex 023a541..59a6f8d]
 [ERROR  ] refs/heads/master->roommate:refs/heads/master [remote rejected] (branch is currently checked out) [publish(/home/me/dl-101/DataLad-101)]
 publish(error): . (dataset) [refs/heads/master->roommate:refs/heads/master [remote rejected] (branch is currently checked out)]
 action summary:
   publish (error: 1, ok: 1)

As you can see, the git-annex branch was pushed successfully, but updating the master branch was rejected: [remote rejected] (branch is currently checked out) [publish(/home/me/dl-101/DataLad-101)]. In this particular case, this is because it was an attempt to push from DataLad-101 to the roommate sibling that was created in chapter Collaboration. This is a special case of pushing, because it – in technical terms – is a push to a non-bare repository. Unlike bare Git repositories, non-bare repositories can not be pushed to at all times. To fix this, you either want to checkout another branch in the roommate sibling or push to a non-checked out branch in the roommate sibling. Alternatively, you can configure roommate to receive the push with Git’s receive.denyCurrentBranch configuration key. By default, this configuration is set to refuse. Setting it to updateInstead with git config receive.denyCurrentBranch updateInstead will allow updating the checked out branch. See git configs man page entry on receive.denyCurrentBranch for more.

Detached HEADs

One warning that you may encounter during an installation of a dataset is:

[INFO   ] Submodule HEAD got detached. Resetting branch master to point to 046713bb. Original location was 47e53498

Even though “detached HEAD” sounds slightly worrisome, this is merely an information and does not require an action from your side. It is related to Git submodules (the underlying Git concept for subdatasets), and informs you about the current state a subdataset is saved in the superdataset you have just cloned.

9.4.4.2. Output produced by git-annex

Unusable annexes

Upon installation of a dataset, you may see:

[INFO    ]     Remote origin not usable by git-annex; setting annex-ignore
[INFO    ]     This could be a problem with the git-annex installation on the
remote. Please make sure that git-annex-shell is available in PATH when you
ssh into the remote. Once you have fixed the git-annex installation,
run: git annex enableremote origin

This warning lets you know that git-annex will not attempt to download content from the remote “origin”. This can have many reasons, but as long as there are other remotes you can access the data from, you are fine.

A similar warning message may appear when adding a sibling that is a pure Git remote, for example a repository on GitHub:

[INFO ] Failed to enable annex remote github, could be a pure git or not
accessible
[WARNING] Failed to determine if github carries annex. Remote was marked by
annex as annex-ignore. Edit .git/config to reset if you think that was done
by mistake due to absent connection etc

These messages indicate that the sibling github does not carry an annex. Thus, annexed file contents can not be pushed to this sibling. This happens if the sibling indeed does not have an annex (which would be true, for example, for siblings on GitHub, GitLab, Bitbucket, …, and would not require any further action or worry), or if the remote could not be reached, e.g., due to a missing internet connection (in which case you could set the key annex-ignore in .git/config to false).

Speaking of remotes that are not available, this will probably be one of the most commonly occurring git-annex errors to see - failing datalad get calls because remotes are not available:

Todo

paste some “please make these remotes available output”

Todo

If one does not have an SSH key configured, e.g., on a server (from remodnav paper on brainbfast):

[INFO   ] Cloning https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav [2 other candidates] into '/home/homeGlobal/adina/paper-remodnav/remodnav'
Permission denied (publickey).
[WARNING] Failed to run cmd ['ssh', '-fN', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=15m', '-o', 'ControlPath="/home/homeGlobal/adina/.cache/datalad/sockets/6ca483de"', 'git@github.com']. Exit code=255
| stdout: None
| stderr: None
[ERROR  ] Failed to clone from any candidate source URL. Encountered errors per each url were: (OrderedDict([('https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav', "Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(128)\n  cmdline: /usr/lib/git-annex.linux/git clone --progress -v https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav /home/homeGlobal/adina/paper-remodnav/remodnav [cmd.py:wait:412]"), ('https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav/.git', "Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(128)\n  cmdline: /usr/lib/git-annex.linux/git clone --progress -v https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav/.git /home/homeGlobal/adina/paper-remodnav/remodnav [cmd.py:wait:412]"), ('git@github.com:psychoinformatics-de/remodnav.git', "Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(128)\n  cmdline: /usr/lib/git-annex.linux/git clone --progress -v git@github.com:psychoinformatics-de/remodnav.git /home/homeGlobal/adina/paper-remodnav/remodnav [cmd.py:wait:412]")]),) [install(/home/homeGlobal/adina/paper-remodnav/remodnav)]
[ERROR  ] Installation of subdatasets /home/homeGlobal/adina/paper-remodnav/remodnav failed with exception: InstallFailedError:
Failed to install dataset from any of: ['https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav', 'git@github.com:psychoinformatics-de/remodnav.git'] [get.py:_install_subds_from_flexible_source:184] [install(/home/homeGlobal/adina/paper-remodnav/remodnav)]
Traceback (most recent call last):
  File "code/mk_figuresnstats.py", line 811, in <module>
    savefigs(args.figure, args.stats)
  File "code/mk_figuresnstats.py", line 410, in savefigs
    stat)
  File "code/mk_figuresnstats.py", line 274, in confusion
    load_anderson(stimtype, finame)
  File "code/mk_figuresnstats.py", line 28, in load_anderson
    get(fname)
  File "/home/homeGlobal/adina/env/remodnav/lib/python3.5/site-packages/datalad/interface/utils.py", line 492, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/homeGlobal/adina/env/remodnav/lib/python3.5/site-packages/datalad/interface/utils.py", line 480, in return_func
    results = list(results)
  File "/home/homeGlobal/adina/env/remodnav/lib/python3.5/site-packages/datalad/interface/utils.py", line 468, in generator_func
    msg="Command did not complete successfully")
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully [{'type': 'dataset', 'status': 'error', 'action': 'install', 'message': ('Installation of subdatasets %s failed with exception: %s', '/home/homeGlobal/adina/paper-remodnav/remodnav', "InstallFailedError: \nFailed to install dataset from any of: ['https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav', 'git@github.com:psychoinformatics-de/remodnav.git'] [get.py:_install_subds_from_flexible_source:184]"), 'path': '/home/homeGlobal/adina/paper-remodnav/remodnav'}]

Footnotes

1

wtf in datalad wtf could stand for many things. “Why the Face?” “Wow, that’s fantastic!”, “What’s this for?”, “What to fix”, “What the FAQ”, “Where’s the fire?”, “Wipe the floor”, “Welcome to fun”, “Waste Treatment Facility”, “What’s this foolishness”, “What the fruitcake”, … Pick a translation of your choice and make running this command more joyful.