1.3. Modify content

So far, we’ve only added new content to the dataset. And we have not done much to that content up to this point, to be honest. Let’s see what happens if we add content, and then modify it.

For this, in the root of DataLad-101, create a plain text file called notes.txt. It will contain all of the notes that you take throughout the course.

Let’s write a short summary of how to create a DataLad dataset from scratch:

“One can create a new dataset with ‘datalad create [--description] PATH’. The dataset is created empty.”

This is meant to be a note you would take in an educational course. You can write it to a file with an editor of your choice. The code below, however, wraps this note between the start and end markers of a here-document (heredoc). You can copy the full code snippet, from cat << EOT > notes.txt up to and including the EOT in the last line, into your terminal to write this note into notes.txt without any editor.

How does a here-document work?

The code snippet below writes lines of text into a file called notes.txt, which does not exist yet.

To do this, the content of the “document” is wrapped in between delimiting identifiers. Here, these identifiers are EOT (short for “end of text”), but naming is arbitrary as long as the two identifiers are identical. The first “EOT” identifies the start of the text stream, and the second “EOT” terminates the text stream.

The characters << redirect the text stream into “standard input” (stdin), the standard location that provides the input for a command. Thus, the text stream becomes the input for the cat command, which takes the input and writes it to “standard output” (stdout).

Lastly, the > character takes stdout and creates a new file notes.txt with stdout as its contents.

It might seem like a slightly convoluted way to create a text file with a note in it. But it allows you to write notes from the terminal, which lets this book provide commands you can execute with nothing other than your terminal. You are free to copy-paste the snippets with the here-documents, or find a workflow that suits you better. The only important thing is that you create and modify a .txt file over the course of the Basics part of this handbook.
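
The mechanics described above can be tried out in isolation. The following is a minimal sketch that works in a temporary directory, so your real notes.txt is not touched; the variable heredoc_demo and the file name example.txt are made up for illustration.

```shell
# Throwaway directory (hypothetical name), so nothing in DataLad-101 changes
heredoc_demo=$(mktemp -d)

# The delimiter name is arbitrary: END works just like EOT,
# as long as the opening and closing identifiers are identical
cat << END > "$heredoc_demo/example.txt"
first line
second line
END

# Without the > redirection, cat prints the text to stdout (the terminal)
cat << EOT
this goes to stdout
EOT

# Show what ended up in the file
cat "$heredoc_demo/example.txt"
```

The first command creates example.txt with two lines, while the second only prints its text, because nothing redirects stdout into a file.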

Running the command below will create notes.txt in the root of your DataLad-101 dataset:

Heredocs don’t work under non-Git-Bash Windows terminals

Heredocs rely on Unix-type redirection and multi-line commands, which are not supported on most native Windows terminals or the Anaconda prompt on Windows. If you are using an Anaconda prompt or a Windows terminal other than Git Bash, instead of executing heredocs, please open an editor, paste the text into it, and save it as notes.txt.

The relevant text in the snippet below would be:

One can create a new dataset with 'datalad create [--description] PATH'.
The dataset is created empty

If you are using Git Bash, however, heredocs will work just fine.

$ cat << EOT > notes.txt
One can create a new dataset with 'datalad create [--description] PATH'.
The dataset is created empty

EOT

Run datalad status to confirm that there is a new, untracked file:

$ datalad status
untracked: notes.txt (file)

Save the current state of this file in your dataset’s history. Because it is the only modification in the dataset, there is no need to specify a path.

$ datalad save -m "Add notes on datalad create"
add(ok): notes.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

But now, let’s see how changing tracked content works. Modify this file by adding another note. After all, you already know how to use datalad save, so write a short summary on that as well.

Again, the example below uses Unix commands (cat and redirection, this time with >> to append new content to the existing file) to accomplish this, but you can use any editor of your choice.

$ cat << EOT >> notes.txt
The command "datalad save [-m] PATH" saves the file (modifications) to
history.
Note to self: Always use informative, concise commit messages.

EOT
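
The difference between > and >> is easy to get wrong, so here is a small sketch that contrasts the two. It runs in a temporary directory and uses made-up names (append_demo, example.txt), so it does not affect your dataset.

```shell
# Throwaway directory (hypothetical name)
append_demo=$(mktemp -d)

# > creates the file, or overwrites it if it already exists
cat << EOT > "$append_demo/example.txt"
note one
EOT

# >> appends to the end and keeps the existing content
cat << EOT >> "$append_demo/example.txt"
note two
EOT

# Keep a copy of the state after appending, then show it
after_append=$(cat "$append_demo/example.txt")
echo "$after_append"

# Using > a second time discards everything written before
cat << EOT > "$append_demo/example.txt"
note three
EOT

after_overwrite=$(cat "$append_demo/example.txt")
echo "$after_overwrite"
```

This is why the snippet above uses >> for the second note: a single > would have replaced the first note instead of adding to it.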

Let’s check the dataset’s current state:

$ datalad status
 modified: notes.txt (file)

and save the file in DataLad:

$ datalad save -m "add note on datalad save"
add(ok): notes.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

Let’s take another look into our history to see the development of this file. We’re using git log -p -n 2 to see the last two commits and, within each commit, the difference to the previous state of the file.

$ git log -p -n 2
commit c4170cea37b9ad7b35f19167b2d20f09d33cd3db
Author: Elena Piscopia <elena@example.net>
Date:   Wed Dec 14 16:57:13 2022 +0100

    add note on datalad save

diff --git a/notes.txt b/notes.txt
index 3a7a1fe..0142412 100644
--- a/notes.txt
+++ b/notes.txt
@@ -1,3 +1,7 @@
 One can create a new dataset with 'datalad create [--description] PATH'.
 The dataset is created empty
 
+The command "datalad save [-m] PATH" saves the file (modifications) to
+history.
+Note to self: Always use informative, concise commit messages.
+

commit 4d5c65a1870650721a8083867860235e22ba283c
Author: Elena Piscopia <elena@example.net>
Date:   Wed Dec 14 16:57:13 2022 +0100

    Add notes on datalad create

diff --git a/notes.txt b/notes.txt
new file mode 100644

We can see that the history can not only show us the commit message attached to a commit, but also the precise change that occurred in the text file in the commit. Additions are marked with a +, and deletions would be shown with a leading -. From the dataset’s history, we can therefore also find out how the text file evolved over time. That’s quite neat, isn’t it?
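
To get a feeling for this notation outside of Git, the sketch below produces the same kind of output with the plain diff -u command on two small files. This only illustrates the format and is not what git log runs internally; the names diff_demo, old.txt, and new.txt are made up.

```shell
# Throwaway directory (hypothetical name) with an old and a new version of a file
diff_demo=$(mktemp -d)
printf 'line one\nline two\n' > "$diff_demo/old.txt"
printf 'line one\nline two\nline three\n' > "$diff_demo/new.txt"

# diff exits with status 1 when the files differ, which is expected here
changes=$(diff -u "$diff_demo/old.txt" "$diff_demo/new.txt" || true)

# Unchanged lines start with a space, added lines with +, removed lines with -
echo "$changes"
```

The line starting with @@ is the hunk header. In the git log output above, @@ -1,3 +1,7 @@ means that the hunk covers 3 lines starting at line 1 of the old version, and 7 lines starting at line 1 of the new version: the three existing lines plus the four added ones.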

git log has many more useful options

git log, like many other Git commands, has a good number of options, which you can discover by running git log --help. These options can help you find specific changes (e.g., -S finds commits that added or removed a specific word) or change how the git log output looks (e.g., --word-diff highlights individual word changes in the -p output).
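
If you want to try these two options without touching DataLad-101, the following sketch sets up a throwaway Git repository with two commits. It assumes that git is installed; the variable log_demo, the helper function g, and the example identity are made up for illustration.

```shell
# Throwaway repository (hypothetical name), separate from your dataset
log_demo=$(mktemp -d)
git init -q "$log_demo"

# Hypothetical helper: run git in the demo repository with a one-off identity,
# so that your own Git configuration is neither needed nor changed
g() { git -C "$log_demo" -c user.name="Example" -c user.email="example@example.net" "$@"; }

printf 'first note\n' > "$log_demo/notes.txt"
g add notes.txt
g commit -q -m "Add first note"

printf 'second note about saving\n' >> "$log_demo/notes.txt"
g add notes.txt
g commit -q -m "Add second note"

# -S lists only the commits in which the number of occurrences of the word changed
found=$(g log -S "saving" --format=%s)
echo "$found"

# --word-diff marks additions inline as {+...+} and removals as [-...-]
worddiff=$(g log -p --word-diff -n 1)
echo "$worddiff"
```

Here, -S reports only the second commit, because that is the one that introduced the word "saving".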