4.3. Retrace and reenact
“Thanks a lot for sharing your dataset with me! This is super helpful. I’m sure I’ll catch up in no time!” your roommate says confidently. “How far have you gotten with the DataLad commands?” he asks at last.
“Mhh, I think the last big one was datalad run. Actually, let me quickly show you what this command does. There is something that I’ve been wanting to try anyway,” you say.
The dataset you shared contained a number of datalad run commands. For example, you created the simple podcasts.tsv file that listed all titles and speaker names of the longnow podcasts.
Given that you learned to create “proper” datalad run commands, complete with --input and --output specification, anyone should be able to datalad rerun these commits easily. This is what you want to try now.
You begin to think about which datalad run commit would be the most useful one to take a look at. The creation of podcasts.tsv was a bit dull: at this point in time, you didn’t yet know about the --input and --output arguments, and the resulting output is present anyway because text files like this .tsv file are stored in Git.
However, one of the attempts to resize a picture could be useful. The input, one of the podcast logos, is not yet retrieved, nor is the resulting resized image. “Let’s go for this!” you say, and drag your confused roommate to the computer screen.
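As a reminder of what such a “proper” command looks like, here is a minimal sketch of a resize call with declared input and output. The paths, the commit message, and the 400x400 size are taken from the history and output shown further down in this section. The wrapper function name resize_logo and the use of the {inputs} and {outputs} placeholders are assumptions for illustration, and the command actually recorded in the dataset may be spelled differently. The sketch also assumes that ImageMagick's convert is installed.

```shell
# Hypothetical helper (not part of DataLad): wraps the resize in datalad run
# so that the input and output files are recorded alongside the command.
resize_logo() {
    datalad run -m "Resize logo for slides" \
        --input "recordings/longnow/.datalad/feed_metadata/logo_salt.jpg" \
        --output "recordings/salt_logo_small.jpg" \
        "convert -resize 400x400 {inputs} {outputs}"
}
```

Called from the root of DataLad-101, resize_logo would let DataLad retrieve the declared input and prepare the declared output before convert runs. It is exactly this record that makes a later rerun possible.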
First, find the commit shasum of the command you want to rerun by looking at the history of the shared dataset:
# navigate into the shared copy
$ cd ../mock_user/DataLad-101
# let's view the history
$ git log --oneline -n 10
b5fc129 add note on clean datasets
d5845d3 [DATALAD RUNCMD] Resize logo for slides
d24531d [DATALAD RUNCMD] Resize logo for slides
af9ca10 add additional notes on run options
a4044db [DATALAD RUNCMD] convert -resize 450x450 recordings/longn...
9369462 resized picture by hand
be503c3 [DATALAD RUNCMD] convert -resize 400x400 recordings/longn...
4009576 add note on basic datalad run and datalad rerun
572fd65 add note datalad and git diff
f4916cc [DATALAD RUNCMD] create a list of podcast titles
Ah, there it is, the second most recent commit.
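In a longer history, scanning by eye gets tedious. Since datalad run prefixes its commit messages with [DATALAD RUNCMD], as the log above shows, one way to narrow things down is to let Git filter on that marker. The following is a minimal sketch; the function name list_run_commits is made up for this example.

```shell
# Hypothetical helper: list only the commits that were created by datalad run.
list_run_commits() {
    # --fixed-strings makes Git match the brackets literally
    # instead of reading them as a regex character class.
    git log --oneline --fixed-strings --grep='[DATALAD RUNCMD]' "$@"
}
```

Any extra arguments are passed on to git log, so list_run_commits -n 5 would limit the listing to the five most recent run commits.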
Just as already done in section DataLad, Re-Run!, take this shasum and plug it into a datalad rerun command:
$ datalad rerun d5845d3
[INFO] run commit d5845d3; (Resize logo for s...)
[INFO] Making sure inputs are available (this may take some time)
get(ok): recordings/longnow/.datalad/feed_metadata/logo_salt.jpg (file) [from web...]
run.remove(ok): recordings/salt_logo_small.jpg (file) [Removed file]
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
run(ok): /home/me/dl-101/mock_user/DataLad-101 (dataset) [convert -resize 400x400 recordings/longn...]
add(ok): recordings/salt_logo_small.jpg (file)
action summary:
add (ok: 1)
get (notneeded: 1, ok: 1)
run (ok: 1)
run.remove (ok: 1)
save (notneeded: 2)
“This was so easy!” you exclaim. DataLad retrieved the missing file content from the subdataset and tried to unlock the output prior to the command execution. Note that because you had not yet retrieved the output, recordings/salt_logo_small.jpg, its missing content could not be unlocked. Instead, as the run.remove line in the output shows, DataLad removed the file before executing the command and then proceeded successfully.
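If you would rather look before you leap, datalad rerun can also show a recorded command without executing it. The following is a minimal sketch; the helper name show_recorded_command is hypothetical. It relies on the --script option of datalad rerun, which writes the recorded command(s) out as a shell script instead of rerunning them, with - standing for standard output.

```shell
# Hypothetical helper: print the command recorded in a datalad run commit
# without re-executing it.
show_recorded_command() {
    # $1: shasum of a datalad run commit
    datalad rerun --script - "$1"
}
```

In the shared dataset, show_recorded_command d5845d3 would print the resize command of the commit used above, which is handy when you want to check what a rerun is about to do.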
Your roommate now not only knows exactly how the resized file came into existence, but he can also reproduce your exact steps to create it. “This is as reproducible as it can be!” you think in awe.