3.1. Data safety
Later in the day, after seeing and solving so many DataLad error messages, you fall tired into your bed. Just as you are about to fall asleep, a thought crosses your mind:
“I now know that tracked content in a dataset is protected by git-annex.
Whenever tracked contents are
saved, they get locked and should not be
modifiable. But… what about the notes that I have been taking since the first day?
Should I not need to unlock them before I can modify them? And also the script!
I was able to modify this despite giving it to DataLad to track, with
no permission denied errors whatsoever! How does that work?”
This night, though, your question stays unanswered and you fall into a restless sleep filled with bad dreams about “permission denied” errors. The next day you are the first student in your lecturer’s office hours.
“Oh, you are really attentive. This is a great question!” our lecturer starts to explain.
Do you remember that we created the DataLad-101 dataset with a specific configuration template? It was the -c text2git option we provided in the beginning of Create a dataset. It is because of this configuration that we can modify notes.txt without unlocking its content first.
The second commit message in our dataset's history summarizes this (outputs are shortened):
$ git log --reverse --oneline
4ce681d [DATALAD] new dataset
e0ff3a7 Instruct annex to add text files to Git
b40316a add books on Python and Unix to read later
a875e49 add reference book about git
59ac8d3 add beginners guide on bash
874d766 Add notes on datalad create
e310b46 add note on datalad save
3c016f7 [DATALAD] Added subdataset
87609a3 Add note on datalad clone
Instead of giving text files such as your notes or your script to git-annex, the dataset stores them in Git. But what does it mean if files are in Git instead of git-annex?
Well, procedurally it means that everything that is stored in git-annex is content-locked, and everything that is stored in Git is not. You can modify content stored in Git straight away, without unlocking it first.
That’s easy enough, and illustrated in Fig. 3.1.
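One way to see this difference on disk is to look at how a file appears in the working tree. The sketch below assumes the default, locked mode, in which git-annex replaces annexed files with symlinks pointing into .git/annex/objects, while files stored in Git remain regular files. The helper name storage_of is hypothetical and not part of DataLad or git-annex.

```shell
# Hypothetical helper (not a DataLad or git-annex command): guess where the
# content of a file lives, based on how the file appears in the working tree.
# Assumption: the dataset uses the default locked mode, where annexed files
# are symlinks into .git/annex/objects.
storage_of() {
    # readlink prints the symlink target; it fails for regular files,
    # in which case target stays empty.
    target=$(readlink "$1" 2>/dev/null || true)
    case "$target" in
        *.git/annex/objects/*) echo "git-annex (content-locked)" ;;
        *)                     echo "Git or untracked (not locked)" ;;
    esac
}

# Usage inside a dataset, for example:
#   storage_of notes.txt
```

For notes.txt in a dataset created with -c text2git, this should report that the file is not locked, which matches the observation that it can be edited directly.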
“So, first of all: If we hadn’t provided the
-c text2git argument, text files
would get content-locked, too?”. “Yes, indeed. However, there are also ways to
later change how file content is handled based on its type or size. This can be specified
in the dataset's .gitattributes file.
But there will be a lecture on that.”
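As a rough preview of what such a rule can look like: git-annex reads an attribute called annex.largefiles from .gitattributes to decide which files it should handle. The sketch below writes two illustrative rules into a scratch directory rather than a real dataset. The file patterns and the size threshold are assumptions made for this example; they are not the rules that the text2git configuration creates.

```shell
# Sketch only: write example rules to a scratch directory, not a real dataset.
demo_dir=$(mktemp -d)

cat > "$demo_dir/.gitattributes" <<'EOF'
# Files larger than 100 kB are handled by git-annex, smaller ones by Git ...
* annex.largefiles=(largerthan=100kb)
# ... but files ending in .txt are always stored in Git (illustrative pattern).
*.txt annex.largefiles=nothing
EOF

# Show the resulting file.
cat "$demo_dir/.gitattributes"
```

Later lines take precedence over earlier ones for the same attribute, which is why the general rule comes first and the exception for text files second.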
“Okay, well, second: Isn’t it much easier to just not bother with locking and
unlocking, and have everything ‘stored in Git’? Even if
datalad run takes care
of unlocking content, I do not see the point of git-annex”, you continue.
Here it gets tricky. To begin with the most important and most straightforward fact: It is not practical to store large files in Git, because Git would very quickly run into severe performance issues. Hosting sites for projects using Git, such as GitHub or GitLab, also do not allow files larger than a few dozen MB in size.
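To get a feeling for which files in a directory would be poor candidates for Git, one could list everything above a size threshold. This is a minimal sketch: the helper name list_large_files is hypothetical, and the default threshold of 50 MB is an assumption for illustration, not a limit defined by Git or by a particular hosting service.

```shell
# Hypothetical helper: list regular files above a size threshold (in kB).
# The default of 51200 kB (50 MB) is an illustrative assumption, not a limit
# defined by Git or by any particular hosting service.
list_large_files() {
    dir=${1:-.}
    limit_kb=${2:-51200}
    # Skip Git's internal directory. -type f ignores symlinks, so annexed
    # files in locked mode are not listed.
    find "$dir" -name .git -prune -o -type f -size "+${limit_kb}k" -print
}

# Usage, for example:
#   list_large_files .          # files above the default threshold
#   list_large_files . 1024     # files above 1 MB
```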
For now, we have solved the mystery of why text files can be modified
without unlocking, and this is a small
step forward among the vast number of questions that have piled up in our curious
minds. Essentially, git-annex protects your data from accidental modifications
and thus keeps it safe.
datalad run commands take care of this technical
complexity completely if
-o/--output is specified properly, and
datalad unlock commands can be used to unlock content “by hand” if
modifications are performed outside of a datalad run command.
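The two routes mentioned here can be sketched side by side. The helper names modify_with_run and modify_by_hand are hypothetical wrappers written for this illustration; the commands inside them (datalad run with -m and --output, datalad unlock, and datalad save) are the regular DataLad commands. The sketch assumes the default locked mode.

```shell
# Hypothetical helpers wrapping real DataLad commands.

# Route 1: let datalad run handle unlocking. Files named via --output are
# unlocked before the command executes, and the result is saved afterwards.
# $1: output file, $2: commit message, remaining arguments: the command
modify_with_run() {
    file=$1; message=$2; shift 2
    datalad run -m "$message" --output "$file" "$@"
}

# Route 2: do the same "by hand": unlock, modify, save.
# $1: file to modify, $2: commit message, remaining arguments: the command
modify_by_hand() {
    file=$1; message=$2; shift 2
    datalad unlock "$file" &&               # make the content writable
    "$@" &&                                 # run the modifying command
    datalad save -m "$message" "$file"      # save the change, locking it again
}
```

The first route additionally records the command itself in the dataset's history, while the second only records that the file changed.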
But there comes the second, tricky part: There are ways to get rid of locking and unlocking within git-annex, using so-called adjusted branches. Whether this functionality is available and worthwhile depends on the git-annex version one has installed, the git-annex version of the repository, and a use-case dependent comparison of the pros and cons. On Windows systems, this adjusted mode is even the only mode of operation. In later sections we will see how to use this feature. The next lecture, in any case, will guide us deeper into git-annex, and improve our understanding a little bit further.