Why Git is described as a "content tracking system" -- Part 1


As the next in what might turn into a series on the Git version control system, this post explains what it means to describe Git as a "content tracking" system. Newcomers sometimes find the concept a bit difficult to wrap their heads around, so let's explain it by way of comparison and a couple of trivial Git examples.

Most people who have used a version control system (VCS) are used to a particular storage model: when a new file is added to the repository, the VCS stores that file's entire content, and from then on, as changes to the file are committed, it stores only the differences (or deltas) between successive versions.

Put another way, while a file's initial commit could be quite sizable, if one makes and commits only small changes to the file from then on, what gets stored from commit to commit is only the differences between successive versions of that file, which users tend to regard as very space-efficient.

But this model of VCS operation has an obvious consequence. Because the VCS stores the initial version plus (space-efficient) deltas between file versions, getting the current version of any file requires a calculation: take the initial version of the file and apply all of the stored deltas to it.

As a concrete example, if someone added a massive 100M file to such a VCS, then made and committed 100 minuscule changes to that file, what would end up in the VCS would be some representation of that original 100M file, plus 100 very small deltas corresponding to the successive changes made to that file over time.

So although such a VCS can be said to store deltas, fetching the current version of that 100M file would mean starting from its initial version and applying each of the 100 deltas to it, in the correct order, to generate the current version.
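To make the delta model concrete, here is a minimal Python sketch of a forward-delta store along the lines just described. It is a toy, not how any particular VCS is implemented: the names `make_delta`, `apply_delta`, and `DeltaStore` are hypothetical, and it assumes line-based text files, using the standard library's `difflib.SequenceMatcher` to find the changed line ranges. Each delta records only those ranges, and `checkout()` has to replay every delta up to the requested version.

```python
from difflib import SequenceMatcher

# Hypothetical delta format: (start, end, replacement_lines) edits against the
# previous version's lines. Unchanged lines are never stored.
Delta = list[tuple[int, int, list[str]]]


def make_delta(old: list[str], new: list[str]) -> Delta:
    """Record only the line ranges of `old` that changed, plus their new text."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: list[str], delta: Delta) -> list[str]:
    """Rebuild the next version by splicing each edit into the old lines."""
    result: list[str] = []
    pos = 0
    for start, end, replacement in delta:
        result.extend(old[pos:start])  # unchanged run before this edit
        result.extend(replacement)     # new or replacement lines
        pos = end                      # skip old lines that were replaced or deleted
    result.extend(old[pos:])           # unchanged tail
    return result


class DeltaStore:
    """Toy forward-delta store: first version in full, then one delta per commit."""

    def __init__(self, initial: list[str]) -> None:
        self.base = list(initial)
        self.deltas: list[Delta] = []
        self.last_checkout_cost = 0  # deltas replayed by the most recent checkout

    def checkout(self, version: int) -> list[str]:
        """Return version `version` (0 = initial) by replaying deltas in order."""
        to_apply = self.deltas[:version]
        lines = list(self.base)
        for delta in to_apply:
            lines = apply_delta(lines, delta)
        self.last_checkout_cost = len(to_apply)
        return lines

    def commit(self, new: list[str]) -> None:
        # Even committing needs the current version, which must be rebuilt first.
        current = self.checkout(len(self.deltas))
        self.deltas.append(make_delta(current, new))


# Demo: a 200-line file, then 50 commits that each change a single line.
text = [f"line {i}\n" for i in range(200)]
store = DeltaStore(text)
for n in range(50):
    text = text.copy()
    text[n] = f"edited {n}\n"
    store.commit(text)

latest = store.checkout(len(store.deltas))
print(latest == text, store.last_checkout_cost)  # prints: True 50
```

Each stored delta here is a single one-line edit, so storage stays small, but getting the latest version replays all 50 of them. In this sketch `commit()` also has to rebuild the latest version before it can compute the next delta, so the replay cost shows up on writes as well as reads.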

In other words, a VCS that behaved this way would be very space-efficient, but the tradeoff would be that simply checking out the latest version of a file could be computationally intensive. That's just the price you pay for a VCS with this model of operation.

Git, however, takes the opposite approach. Rather than storing deltas, Git really does store the entire file contents for every version of that file. Using our earlier example, if we committed an initial file of size 100M, then made a tiny change and committed that, our Git repository would now contain two 100M file representations, which would be almost (but not quite) identical.
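For contrast, here is an equally hypothetical toy model of the snapshot approach, again in Python. Each commit stores the file's full contents. To echo how Git names what it stores, the keys are computed the way Git computes a blob's object ID (SHA-1 over a `blob <size>` header, a NUL byte, and the contents); the `SnapshotStore` class itself is an assumption made for illustration, and the sketch leaves out the compression Git applies when it actually writes objects to disk.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Git-style blob ID: SHA-1 of a 'blob <size>' header, a NUL byte, then the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class SnapshotStore:
    """Toy snapshot store (hypothetical): every committed version is kept whole."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # blob ID -> full file contents
        self.history: list[str] = []         # blob ID of each commit, oldest first

    def commit(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content  # the whole file, not a diff against the last one
        self.history.append(oid)
        return oid

    def checkout(self, version: int) -> bytes:
        # A single lookup, however many versions came before; nothing to replay.
        return self.objects[self.history[version]]


# Demo mirroring the 100M example, scaled down to 1 MB.
v1 = b"x" * 1_000_000
v2 = v1[:-1] + b"y"  # a one-byte change
store = SnapshotStore()
store.commit(v1)
store.commit(v2)
print(len(store.objects), sum(len(c) for c in store.objects.values()))  # prints: 2 2000000
print(blob_id(b"hello\n"))  # same ID `git hash-object` reports for "hello\n"
```

Checking out any version is a single lookup rather than a replay of deltas; the cost is that two nearly identical versions occupy roughly twice the space.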

Most newcomers to Git are shocked to learn this and immediately complain about how wasteful it must be in terms of disk space. But this technique has its advantages, as you'll see in a couple of hands-on Git examples coming up in the imminent Part 2. So, as they say on "Sports Night," stick around ...

We know

We're aware of the time and budget pressures at most companies, normally accompanied by the plaintive cry from management of, "Yes, I know we need training on that topic, but I just can't afford to send my entire team away for three (or four or five) days to get it!" And that's where we come in.

Choices!

The main focus at Crashcourse is to offer a choice of intense, 1-day, hands-on courses on specific topics in Linux and open source. And given that we already have the laptops for the delivery of that training, the idea is to show up early, set up a classroom, then spend the day teaching exactly the topic you're interested in. No travel time, and no wasted classroom time.

Customization

If we don't already have a course that addresses the topic you're interested in, drop us a note and we'll see what we can do -- our content providers can almost certainly put together a course that's precisely what you're after.

The difference

While there are a variety of sources for Linux and open source training, we at Crashcourse are taking a slightly different approach. Our philosophy is simple: exactly the training you want, and no wasted time or travel to get it.