4.3. Retrace and reenact

“Thanks a lot for sharing your dataset with me! This is super helpful. I’m sure I’ll catch up in no time!” your roommate says confidently. “How far have you gotten with the DataLad commands?” he asks at last.

“Mhh, I think the last big one was datalad run. Actually, let me quickly show you what this command does. There is something that I’ve been wanting to try anyway,” you say.

The dataset you shared contained a number of datalad run commands. For example, you created the simple podcasts.tsv file that listed all titles and speaker names of the longnow podcasts.

Given that you learned to create “proper” datalad run commands, complete with --input and --output specifications, anyone should be able to datalad rerun these commits easily. This is what you want to try now.

You begin to think about which datalad run commit would be the most useful one to take a look at. The creation of podcasts.tsv was a bit dull: at that point in time, you didn’t yet know about the --input and --output arguments, and the resulting output is present anyway because text files like this .tsv file are stored in Git. However, one of the attempts to resize a picture could be useful. The input, the podcast logo, is not yet retrieved, nor is the resulting resized image. “Let’s go for this!” you say, and drag your confused roommate to the computer screen.

First of all, find the commit shasum of the command you want to rerun by taking a look at the history of the dataset (in the shared dataset):

# navigate into the shared copy
$ cd ../mock_user/DataLad-101
# let's view the history
$ git log --oneline
eb8ce32 add note on clean datasets
9518f3b [DATALAD RUNCMD] Resize logo for slides
3211533 [DATALAD RUNCMD] Resize logo for slides
5adbb4b add additional notes on run options
d5b8506 [DATALAD RUNCMD] convert -resize 450x450 recordings/longn...
5db13d3 resized picture by hand
cc0854a [DATALAD RUNCMD] convert -resize 400x400 recordings/longn...
27a3df9 add note on basic datalad run and datalad rerun
b6566aa add note datalad and git diff
b5919c5 [DATALAD RUNCMD] create a list of podcast titles
fa3e98a BF: list both directories content
ee3c90f [DATALAD RUNCMD] create a list of podcast titles
15398ed Add short script to write a list of podcast speakers and titles
a6bb5da Add note on datalad clone
4b7923f [DATALAD] modified subdataset properties
8dca46c [DATALAD] Recorded changes
b10d56a add note on datalad save
708e6f8 Add notes on datalad create
d841737 add beginners guide on bash
84674e3 add reference book about git
4e85396 add books on Python and Unix to read later
dcc11c1 Instruct annex to add text files to Git
a82a9ce [DATALAD] new dataset

Ah, there it is, the second most recent commit. Just as you did in the section “DataLad, Re-Run!”, take this shasum and plug it into a datalad rerun command:

$ datalad rerun 9518f3b42271056a4adc3ba015f35f0c2ddf03f7
[INFO] run commit 9518f3b; (Resize logo for s...)
[INFO] Making sure inputs are available (this may take some time) 
get(ok): recordings/longnow/.datalad/feed_metadata/logo_salt.jpg (file) [from origin...]
[WARNING] no content present; cannot unlock [unlock(/home/me/dl-101/mock_user/DataLad-101/recordings/salt_logo_small.jpg)] 
remove(ok): recordings/salt_logo_small.jpg
[INFO] == Command start (output follows) ===== 
[INFO] == Command exit (modification check follows) ===== 
add(ok): recordings/salt_logo_small.jpg (file)
action summary:
  add (ok: 1)
  get (notneeded: 1, ok: 1)
  remove (ok: 1)
  save (notneeded: 2)

“This was so easy!” you exclaim. DataLad retrieved the missing input content from the subdataset and tried to unlock the output before executing the command. Because the content of the output, recordings/salt_logo_small.jpg, had not been retrieved yet, there was nothing to unlock. DataLad warns you about this, but proceeds successfully.

Your roommate now not only knows how exactly the resized file came into existence, but he can also reproduce your exact steps to create it. “This is as reproducible as it can be!” you think in awe.
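
In a longer history, spotting the right run commit by eye gets tedious. The following is a minimal sketch of one way to pick out the run commits, assuming every run record carries the [DATALAD RUNCMD] marker in its commit summary, as in the history shown earlier. So that the snippet works without a dataset, it filters a few lines copied from that history; the variable names (log_sample, run_commits, latest) are made up for illustration.

```shell
# A few lines copied from the `git log --oneline` output shown earlier.
# Inside the real dataset you would pipe `git log --oneline` into grep instead.
log_sample='eb8ce32 add note on clean datasets
9518f3b [DATALAD RUNCMD] Resize logo for slides
3211533 [DATALAD RUNCMD] Resize logo for slides
5adbb4b add additional notes on run options
d5b8506 [DATALAD RUNCMD] convert -resize 450x450 recordings/longn...'

# Keep only commits whose summary carries the run marker.
# -F treats the pattern as a fixed string, so the brackets are literal.
run_commits=$(printf '%s\n' "$log_sample" | grep -F '[DATALAD RUNCMD]')

# The history is listed newest first, so the first match is the latest run commit.
latest=$(printf '%s\n' "$run_commits" | head -n 1 | cut -d ' ' -f 1)

printf '%s\n' "$run_commits"
echo "latest run commit: $latest"
```

The abbreviated shasum stored in latest is the same commit that was passed to datalad rerun above in its full-length form.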