4.4. Stay up to date
All of what you have seen about sharing datasets was really
cool, and for the most part also surprisingly intuitive.
datalad run (manual) commands or file retrieval worked exactly as
you imagined they would, and you begin to think that slowly but
steadily you are getting a feel for how DataLad really works.
But to be honest, so far, sharing the dataset with DataLad was
also remarkably unexciting, given that you already knew most of
the dataset magic that your room mate is currently still discovering.
To be honest, you are not yet certain whether
sharing data with DataLad really improves your life up
until this point. After all, you could have just copied
your directory into your
mock_user directory and
this would have resulted in about the same output, right?
What we will be looking into now is how shared DataLad datasets can be updated.
The notes you added to the original dataset in the last two sections are a change that is not reflected in your “shared” copy:
$ # Inside the installed copy, view the last lines of notes.txt
$ tail notes.txt
should be specified with an -o/--output flag. Upon a run or rerun of
the command, the contents of these files will get unlocked so that they
can be modified.

Important! If the dataset is not "clean" (a datalad status output is
empty), datalad run will not work - you will have to save
modifications present in your dataset.
A suboptimal alternative is the --explicit flag, used to record only
those changes done to the files listed with --output flags.
But the original intention of sharing the dataset with your room mate was to give him access to your notes. How does he get the notes that you have added in the last two sections, for example?
This installed copy of
DataLad-101 knows its origin, that is,
the place it was installed from. Using this information,
it can ask the original dataset whether any changes
happened since the last time it checked, and if so, retrieve and
integrate them. This is done with the
datalad update --how merge (manual) command.
$ datalad update --how merge
merge(ok): . (dataset) [Merged origin/main]
update.annex_merge(ok): . (dataset) [Merged annex branch]
update(ok): . (dataset)
Importantly, run this command either within the specific
(sub)dataset you are interested in, or provide a path to
the root of the dataset you are interested in with the
--dataset flag. If you ran the command within the
longnow subdataset, you would query that subdataset's
origin for updates, not the original DataLad-101 dataset.
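Because a DataLad dataset is a Git repository, the origin it would query is recorded as a Git remote. The following is a minimal sketch, using plain Git and throwaway directories (the names original and copy are made up for illustration and are not part of the handbook's setup), of how to check where a clone would be updated from before running an update:

```shell
#!/bin/sh
set -eu

# Throwaway identity so the commit works without global Git configuration.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Work in a temporary directory (hypothetical stand-in for your own setup).
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in for the original dataset, containing a single note.
git init -q original
echo "a first note" > original/notes.txt
git -C original add notes.txt
git -C original commit -q -m "add a first note"

# Cloning records the source location as the remote named "origin".
git clone -q original copy

# Ask the clone which place it would query for updates.
origin_url=$(git -C copy remote get-url origin)
echo "copy would be updated from: $origin_url"
```

Run inside the installed copy or inside a subdataset, git remote get-url origin should report the respective origin in the same way. The datalad siblings command lists this kind of information as well.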
Let’s check the contents in
notes.txt to see whether
the previously missing changes are now present:
$ # view the last lines of notes.txt
$ tail notes.txt
Note that a recursive "datalad get" would install all further
registered subdatasets underneath a subdataset, so a safer way to
proceed is to set a decent --recursion-limit:
"datalad get -n -r --recursion-limit 2 <subds>"

The command "git annex whereis PATH" lists the repositories that have
the file content of an annexed file. When using "datalad get" to
retrieve file content, those repositories will be queried.
Wohoo, the contents are here!
Therefore, sharing DataLad datasets by installing them enables you to update a dataset's content should the original dataset's content change, with just a single command. How cool is that?!
Conclude this section by adding a note about updating a
dataset to your own DataLad-101 dataset:
$ # navigate back:
$ cd ../../DataLad-101
$ # write the note
$ cat << EOT >> notes.txt
To update a shared dataset, run the command "datalad update --how merge".
This command will query its origin for changes, and integrate the
changes into the dataset.

EOT
$ # save the changes
$ datalad save -m "add note about datalad update"
add(ok): notes.txt (file)
save(ok): . (dataset)
PS: You might wonder what a plain
datalad update command without any options does.
If you are a Git-user and know about branches and merging, you can read the
Note for Git-users. A thorough explanation
and demonstration will follow in the next section.
Note for Git-users
datalad update is the DataLad equivalent of a
git fetch (manual), and
datalad update --how merge is the DataLad equivalent of a
git pull (manual).
Upon a simple
datalad update, the remote information
is available on a branch separate from the main branch;
in most cases this will be origin/main. You can
git checkout (manual) this branch or run
git diff (manual) to
explore the changes and identify potential merge conflicts.
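The difference between fetching and merging can be tried out with plain Git. The sketch below is an illustration under stated assumptions, not a DataLad transcript: it uses throwaway repositories with the made-up names original and copy, and a branch named main. It does not cover annexed file content or the git-annex branch that datalad update also takes care of.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without global Git configuration.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"

# 1. A stand-in for the original dataset, on a branch called "main".
git init -q original
git -C original symbolic-ref HEAD refs/heads/main
echo "first note" > original/notes.txt
git -C original add notes.txt
git -C original commit -q -m "add first note"

# 2. The shared copy; its origin is the repository above.
git clone -q original copy

# 3. The original gains a new note after the copy was made.
echo "second note" >> original/notes.txt
git -C original commit -q -am "add second note"

# 4. Fetching makes the change available on origin/main only;
#    the working tree of the copy is left untouched.
git -C copy fetch -q origin
lines_after_fetch=$(wc -l < copy/notes.txt | tr -d ' ')

# 5. Inspect what a merge would bring in.
git -C copy diff --stat main origin/main

# 6. Merging integrates the fetched change into the local branch.
git -C copy merge -q origin/main
lines_after_merge=$(wc -l < copy/notes.txt | tr -d ' ')

echo "lines in notes.txt after fetch: $lines_after_fetch"
echo "lines in notes.txt after merge: $lines_after_merge"
```

Steps 4 to 6 correspond roughly to running a plain datalad update, exploring the fetched branch, and then integrating it, whereas datalad update --how merge performs the fetch and the merge in one go.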