2.4. Clean desk

Just now you realize that you need to fit both logos onto the same slide. “Ah, damn, I might really need them at 400 by 400 pixels to fit”, you think. “Good that I know how to avoid the permission denied errors now!”

Therefore, we need to run the datalad run command yet again, this time to get the image in 400x400 px size. “Now this definitely will be the last time I’m running this”, you think.

$ datalad run -m "Resize logo for slides" \
--input "recordings/longnow/.datalad/feed_metadata/logo_interval.jpg" \
--output "recordings/interval_logo_small.jpg" \
"convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_interval.jpg recordings/interval_logo_small.jpg"
run(impossible): /home/me/dl-101/DataLad-101 (dataset) [clean dataset required to detect changes from command; use `datalad status` to inspect unsaved changes]

2.4.1. Oh for f**** sake… run is “impossible”?

Weird. Once the initial annoyance about yet another error message has faded and you read on, DataLad informs you that a “clean dataset” is required. Run datalad status to see what is meant by this:

$ datalad status
 modified: notes.txt (file)

Ah right. We forgot to save the notes we added, and thus there are unsaved modifications present in DataLad-101. But why is this a problem?

By default, a datalad run ends with a datalad save. Remember the section Populate a dataset: a general datalad save without a path specification will save all of the modified or untracked contents to the dataset.

Therefore, so that changes unrelated to the command given to datalad run are not mixed into its commit, datalad run will by default only execute in a clean dataset with no modifications or untracked files present.
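A quick check of this kind can be sketched in plain shell. The helper name dataset_is_clean is hypothetical and not part of DataLad. It uses git status --porcelain as a stand-in for datalad status, which is an assumption: DataLad's own notion of a clean dataset may differ in detail, for example for subdatasets.

```shell
# Hypothetical helper, not part of DataLad: succeeds only if the Git
# repository at $1 (default: current directory) has no modified or
# untracked files. Returns 2 if the path is not a Git repository.
dataset_is_clean() {
    local changes
    changes=$(git -C "${1:-.}" status --porcelain) || return 2
    [ -z "$changes" ]
}

if dataset_is_clean; then
    echo "clean: datalad run can proceed"
else
    echo "not clean (or not a repository): inspect with datalad status first"
fi
```

Git prints one line per modified or untracked path in porcelain mode, so an empty result is treated here as "clean".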

There are two ways to get around this error message: The more obvious – and recommended – one is to save the modifications, and run the command in a clean dataset. We will try this way with the logo_interval.jpg. It would look like this: First, save the changes,

$ datalad save -m "add additional notes on run options"
add(ok): notes.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

and then try again:

$ datalad run -m "Resize logo for slides" \
--input "recordings/longnow/.datalad/feed_metadata/logo_interval.jpg" \
--output "recordings/interval_logo_small.jpg" \
"convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_interval.jpg recordings/interval_logo_small.jpg"
[INFO] Making sure inputs are available (this may take some time)
[INFO] Unlocking files
unlock(ok): recordings/interval_logo_small.jpg (file)
[INFO] Recording unlocked state in git
[INFO] Completed unlocking files
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
run(ok): /home/me/dl-101/DataLad-101 (dataset) [convert -resize 400x400 recordings/longn...]
add(ok): recordings/interval_logo_small.jpg (file)
save(ok): . (dataset)

Note how in this execution of datalad run, output unlocking was actually necessary and DataLad provides a summary of this action in its output.
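Whether an output would need unlocking can be guessed beforehand by checking if it is a symbolic link. This is a minimal sketch: the helper name looks_locked is hypothetical, and it assumes that locked annexed files appear as symlinks, which does not hold for datasets in adjusted or unlocked mode.

```shell
# Hypothetical helper, not part of DataLad: a path that is a symbolic
# link is treated as a locked annexed file (assumption, see above).
looks_locked() {
    [ -L "$1" ]
}

if looks_locked recordings/interval_logo_small.jpg; then
    echo "symlink: the file is locked and must be unlocked before it can be overwritten"
else
    echo "not a symlink: unlocked, not annexed, or missing"
fi
```

Declaring the file with --output lets datalad run take care of the unlocking, so a manual check like this is only a diagnostic aid.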

Add a quick note about this way of cleaning up prior to a datalad run:

$ cat << EOT >> notes.txt
Important! If the dataset is not "clean" (that is, datalad status
reports changes), datalad run will not work - you will have to save
modifications present in your dataset.
EOT

The second way, executing datalad run despite an “unclean” dataset, is to add the --explicit flag to datalad run. We will try this flag with the remaining logo_salt.jpg. Note that we have an “unclean dataset” again because of the additional note in notes.txt.

$ datalad run -m "Resize logo for slides" \
--input "recordings/longnow/.datalad/feed_metadata/logo_salt.jpg" \
--output "recordings/salt_logo_small.jpg" \
--explicit \
"convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_salt.jpg recordings/salt_logo_small.jpg"
[INFO] Making sure inputs are available (this may take some time)
[INFO] Unlocking files
unlock(ok): recordings/salt_logo_small.jpg (file)
[INFO] Recording unlocked state in git
[INFO] Completed unlocking files
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
run(ok): /home/me/dl-101/DataLad-101 (dataset) [convert -resize 400x400 recordings/longn...]
add(ok): recordings/salt_logo_small.jpg (file)
save(ok): . (dataset)

With this flag, DataLad considers the specification of inputs and outputs to be “explicit”. It does not warn if the repository is dirty, but importantly, it only saves modifications to the listed outputs (which is a problem in the vast majority of cases, where one does not know exactly which outputs are produced).
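The effect of saving only the listed outputs can be illustrated by analogy with plain Git. The sketch below does not invoke DataLad at all; the throwaway repository and the file names in it are hypothetical, and staging a single path is only an approximation of what --explicit does.

```shell
# Illustration with plain Git only; DataLad itself is not invoked here.
demo=$(mktemp -d)                  # throwaway repository
git init -q "$demo"
cd "$demo"
echo "first note" > notes.txt
echo "v1" > output.txt
git add notes.txt output.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial state"

# Modify both files: output.txt stands in for a declared output,
# notes.txt for an unrelated, unsaved edit.
echo "second note" >> notes.txt
echo "v2" > output.txt

# Record only the declared output; notes.txt is left out of the commit.
git add output.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "record declared output only"

git status --porcelain             # prints " M notes.txt"
```

The unrelated change stays in the working tree as an unsaved modification, which is the state the dataset is left in after a run with --explicit.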

Put explicit first!

The --explicit flag can be given anywhere before the command that should be run, but not after it: the command needs to be the last element of a datalad run call.

A datalad status will show that your previously modified notes.txt is still modified:

$ datalad status
 modified: notes.txt (file)

Add a note on the --explicit flag, and finally save your changes to notes.txt.

$ cat << EOT >> notes.txt
A suboptimal alternative is the --explicit flag, used to record only
those changes done to the files listed with --output flags.

EOT
$ datalad save -m "add note on clean datasets"
add(ok): notes.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

To conclude this section on datalad run, take a look at the last datalad run commit to see a run record with more content:

$ git log -p -n 2
Author: Elena Piscopia <elena@example.net>
Date:   Tue Jun 18 16:13:00 2019 +0000

    [DATALAD RUNCMD] Resize logo for slides

    === Do not change lines below ===
    {
     "chain": [],
     "cmd": "convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_salt.jpg recordings/salt_logo_small.jpg",
     "dsid": "e3e70682-c209-4cac-629f-6fbed82c07cd",
     "exit": 0,
     "extra_inputs": [],
     "inputs": [
      "recordings/longnow/.datalad/feed_metadata/logo_salt.jpg"
     ],
     "outputs": [
      "recordings/salt_logo_small.jpg"
     ],
     "pwd": "."
    }
    ^^^ Do not change lines above ^^^

diff --git a/recordings/salt_logo_small.jpg b/recordings/salt_logo_small.jpg
index 0985399..d90c601 120000
--- a/recordings/salt_logo_small.jpg
+++ b/recordings/salt_logo_small.jpg
@@ -1 +1 @@