Clean desk

Just now you realize that you need to fit both logos onto the same slide. “Ah, damn, then I might really need them at 400 by 400 pixels to fit”, you think. “Good thing I know how to avoid the permission denied errors now!”

Therefore, we need to run the datalad run command yet again, this time to get the image at 400x400 px. “Now this will definitely be the last time I’m running this”, you think.

$ datalad run -m "Resize logo for slides" \
--input "recordings/longnow/.datalad/feed_metadata/logo_interval.jpg" \
--output "recordings/interval_logo_small.jpg" \
"convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_interval.jpg recordings/interval_logo_small.jpg"
run(impossible): /home/me/dl-101/DataLad-101 (dataset) [clean dataset required to detect changes from command; use `datalad status` to inspect unsaved changes]

Oh for f**** sake… run is “impossible”?

Weird. Once the initial annoyance about yet another error message has faded and you read on, DataLad informs you that a “clean dataset” is required. Run a datalad status to see what is meant by this:

$ datalad status
 modified: notes.txt (file)

Ah right. We forgot to save the notes we added, and thus there are unsaved modifications present in DataLad-101. But why is this a problem?

By default, a datalad run ends with a datalad save. Remember the section Populate a dataset: a general datalad save without a path specification saves all modified or untracked contents to the dataset.

Therefore, to avoid mixing in changes that are unrelated to the command plugged into datalad run, it will by default only run in a clean dataset, that is, one with no modifications or untracked files present.
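The idea behind this check can be sketched in a few lines of shell. This is only an illustration under stated assumptions: dataset_is_clean and run_if_clean are hypothetical helper names that are not part of DataLad, and the sketch asks Git for the state of the working tree as a stand-in for what datalad status reports. DataLad's own check may differ in its details.

```shell
# Hypothetical helper (not part of DataLad): succeeds only when the
# dataset in the current directory has no modified or untracked content.
dataset_is_clean() {
    # `git status --porcelain` prints one line per changed or untracked
    # path, and nothing at all for a clean working tree.
    [ -z "$(git status --porcelain)" ]
}

# Hypothetical wrapper: refuse to start when unsaved changes exist,
# otherwise hand all arguments to `datalad run` unchanged.
run_if_clean() {
    if ! dataset_is_clean; then
        echo "unsaved changes present; run 'datalad save' first" >&2
        return 1
    fi
    datalad run "$@"
}
```

With these definitions, run_if_clean -m "Resize logo for slides" ... would stop early in the situation above, because notes.txt is still modified.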

There are two ways to get around this error message: The more obvious, and recommended, one is to save the modifications and run the command in a clean dataset. We will try this way with the logo_interval.jpg. It looks like this: First, save the changes,

$ datalad save -m "add additional notes on run options"
add(ok): notes.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

and then try again:

$ datalad run -m "Resize logo for slides" \
--input "recordings/longnow/.datalad/feed_metadata/logo_interval.jpg" \
--output "recordings/interval_logo_small.jpg" \
"convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_interval.jpg recordings/interval_logo_small.jpg"
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
unlock(ok): recordings/interval_logo_small.jpg (file)
add(ok): recordings/interval_logo_small.jpg (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 2)
  save (notneeded: 1, ok: 1)
  unlock (ok: 1)

Note how in this execution of datalad run, output unlocking was actually necessary and DataLad provides a summary of this action in its output.

Add a quick note about this way of cleaning up prior to a datalad run:

$ cat << EOT >> notes.txt
Important! If the dataset is not "clean" (that is, a datalad status output is not empty),
datalad run will not work - you will have to save modifications present in your
dataset.
EOT

The second way, which executes a datalad run despite an “unclean” dataset, is to add the --explicit flag to datalad run. We will try this flag with the remaining logo_salt.jpg. Note that the dataset is “unclean” again because of the additional note in notes.txt.

$ datalad run -m "Resize logo for slides" \
--input "recordings/longnow/.datalad/feed_metadata/logo_salt.jpg" \
--output "recordings/salt_logo_small.jpg" \
--explicit \
"convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_salt.jpg recordings/salt_logo_small.jpg"
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
unlock(ok): recordings/salt_logo_small.jpg (file)
add(ok): recordings/salt_logo_small.jpg (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  get (notneeded: 2)
  save (ok: 1)
  unlock (ok: 1)

With this flag, DataLad considers the specification of inputs and outputs to be “explicit”. It doesn’t warn if the repository is dirty but, importantly, it only saves modifications to the listed outputs. This is a problem in the vast majority of cases, where one does not know exactly which outputs a command produces.

A datalad status will show that your previously modified notes.txt is still modified:

$ datalad status
 modified: notes.txt (file)
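Because --explicit saves nothing beyond the listed outputs, every file a command writes has to be named with its own --output flag. The following is a minimal sketch of a wrapper that makes this harder to forget. The name explicit_run and its argument order are hypothetical and not part of DataLad; it relies only on the datalad run flags already used above, with --output repeated once per path.

```shell
# Hypothetical wrapper (not part of DataLad).
# Usage: explicit_run MESSAGE INPUT COMMAND OUTPUT [OUTPUT...]
# Note: plain POSIX sh has no local variables, so the names below are global.
explicit_run() {
    if [ "$#" -lt 4 ]; then
        echo "usage: explicit_run MESSAGE INPUT COMMAND OUTPUT [OUTPUT...]" >&2
        return 1
    fi
    msg=$1 input=$2 cmd=$3
    shift 3

    # The remaining arguments are output paths. Append "--output PATH"
    # for each of them, then drop the original bare paths from the front.
    n=$#
    for out in "$@"; do
        set -- "$@" --output "$out"
    done
    shift "$n"

    datalad run -m "$msg" --input "$input" --explicit "$@" "$cmd"
}
```

Called with the message, input, command, and single output of the logo_salt.jpg example, this sketch would issue the same datalad run call as shown above. A command that writes several files would simply list all of them after the command string.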

Add an additional note on the --explicit flag, and finally save your changes to notes.txt.

$ cat << EOT >> notes.txt
A suboptimal alternative is the --explicit flag,
used to record only those changes done
to the files listed with --output flags.

EOT
$ datalad save -m "add note on clean datasets"
add(ok): notes.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

To conclude this section on datalad run, take a look at the last datalad run commit to see a run record with more content:

$ git log -p -n 2
commit cd4e9c17d3a219c33fb9e3ba3ce35de831164a06
Author: Elena Piscopia <elena@example.net>
Date:   Thu Jan 9 07:52:14 2020 +0100

    [DATALAD RUNCMD] Resize logo for slides
    
    === Do not change lines below ===
    {
     "chain": [],
     "cmd": "convert -resize 400x400 recordings/longnow/.datalad/feed_metadata/logo_salt.jpg recordings/salt_logo_small.jpg",
     "dsid": "71f03bec-32ac-11ea-b7a4-e86a64c8054c",
     "exit": 0,
     "extra_inputs": [],
     "inputs": [
      "recordings/longnow/.datalad/feed_metadata/logo_salt.jpg"
     ],
     "outputs": [
      "recordings/salt_logo_small.jpg"
     ],
     "pwd": "."
    }
    ^^^ Do not change lines above ^^^

diff --git a/recordings/salt_logo_small.jpg b/recordings/salt_logo_small.jpg
index b6a0a1d..55ada0f 120000
--- a/recordings/salt_logo_small.jpg
+++ b/recordings/salt_logo_small.jpg