Retrace and reenact
“Thanks a lot for sharing your dataset with me! This is super helpful. I’m sure I’ll catch up in no time!”, your room mate says confidently. “How far did you get with the DataLad commands yet?” he asks at last.
“Mhh, I think the last big one was datalad run. Actually, let me quickly show you what this command does. There is something that I’ve been wanting to try anyway,” you say.
The dataset you shared contained a number of datalad run
commands. For example, you created the simple podcasts.tsv
file that listed all titles and speaker names of the longnow
podcasts.
Given that you learned to create “proper” datalad run commands,
anyone should be able to datalad rerun these commits
easily. This is what you want to try now.
You begin to think about which datalad run commit would be
the most useful one to take a look at. The creation of
podcasts.tsv was a bit dull – at this point in time, you
didn’t yet know about the --input and --output options,
and the resulting output is present anyway because text files
like this .tsv file are stored in Git.
However, one of the attempts to resize a picture could be
useful. The input, the podcast logos, is not yet retrieved,
nor is the resulting, resized image. “Let’s go for this!”,
you say, and drag your confused room mate to the computer.
First of all, find the shasum of the commit you want to rerun by taking a look at the history of the shared dataset:
# navigate into the shared copy
$ cd ../mock_user/DataLad-101

# lets view the history
$ git log --oneline
b64a92b add note on clean datasets
cd4e9c1 [DATALAD RUNCMD] Resize logo for slides
cfd6f24 [DATALAD RUNCMD] Resize logo for slides
3f06057 add additional notes on run options
97f36f1 [DATALAD RUNCMD] convert -resize 450x450 recordings/longn...
b71548b resized picture by hand
4291a9f [DATALAD RUNCMD] convert -resize 400x400 recordings/longn...
a2055bf add note on basic datalad run and datalad rerun
cec87ae add note datalad and git diff
3d0c225 [DATALAD RUNCMD] create a list of podcast titles
f2c608e BF: list both directories content
87fde6a [DATALAD RUNCMD] create a list of podcast titles
5b391e9 Add short script to write a list of podcast speakers and titles
445c53e Add note on datalad clone
2fcef51 [DATALAD] Recorded changes
0023a97 add note on datalad save
3baae1e Add notes on datalad create
da7d2d0 add beginners guide on bash
e1e8af3 add reference book about git
69e7983 add books on Python and Unix to read later
ca376f4 Instruct annex to add text files to Git
d842213 [DATALAD] new dataset
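In a longer history, scanning for the right commit by eye gets tedious. The following is a minimal sketch of two helpers that filter one-line history for commits recorded by datalad run. The function names run_commits and latest_run_commit are made up for this illustration and are not part of DataLad or Git; the sketch only relies on the [DATALAD RUNCMD] prefix visible in the log above.

```shell
# Hypothetical helpers (not part of DataLad or Git): both read the output
# of `git log --oneline` on standard input.

# keep only the commits that were recorded by datalad run
run_commits() {
    grep -F '[DATALAD RUNCMD]'
}

# print the abbreviated shasum of the most recent datalad run commit
latest_run_commit() {
    run_commits | head -n 1 | cut -d ' ' -f 1
}

# usage inside the dataset:
#   git log --oneline | run_commits
#   git log --oneline | latest_run_commit
```

Applied to the history shown above, the second helper should pick out the same commit you spotted by eye, since it is the most recent one carrying the [DATALAD RUNCMD] prefix.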
Ah, there it is, the second most recent commit. Just as in the section “DataLad, Re-Run!”, take this shasum and plug it into a datalad rerun command:
$ datalad rerun cd4e9c17d3a219c33fb9e3ba3ce35de831164a06
[INFO] Making sure inputs are available (this may take some time)
[WARNING] no content present; cannot unlock [unlock(/home/me/dl-101/mock_user/DataLad-101/recordings/salt_logo_small.jpg)]
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
get(ok): recordings/longnow/.datalad/feed_metadata/logo_salt.jpg (file) [from origin...]
remove(ok): recordings/salt_logo_small.jpg
add(ok): recordings/salt_logo_small.jpg (file)
action summary:
  add (ok: 1)
  get (notneeded: 1, ok: 1)
  remove (ok: 1)
  save (notneeded: 2)
“This was so easy!” you exclaim. DataLad retrieved the missing
file content from the subdataset and it tried to unlock the output
prior to the command execution. Note that because you did not retrieve
recordings/salt_logo_small.jpg yet, the missing content
could not be unlocked. DataLad warns you about this, but proceeds
successfully.
Your room mate now not only knows how exactly the resized file came into existence, but he can also reproduce your exact steps to create it. “This is as reproducible as it can be!” you think in awe.
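To look at how exactly the file came into existence without rerunning anything, one can read the run record that datalad run stores in the commit message. The sketch below assumes the record sits between the marker lines “=== Do not change lines below ===” and “^^^ Do not change lines above ^^^”, as written by DataLad versions we are aware of; run_record is a hypothetical helper name, not a DataLad command.

```shell
# Hypothetical helper (not part of DataLad): print only the run record
# from a commit message read on standard input.
# Assumption: the record is enclosed by DataLad's "Do not change lines
# below/above" marker lines.
run_record() {
    sed -e '1,/Do not change lines below/d' \
        -e '/Do not change lines above/,$d'
}

# usage inside the dataset, with the commit from the history above:
#   git show -s --format=%B cd4e9c1 | run_record
```

The first sed expression drops everything up to and including the opening marker, the second drops the closing marker and anything after it, so only the record itself remains.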