32.1.1.3 Concepts of Version Control

When a file is under version control, we say that it is registered in the version control system. The system has a repository which stores both the file's present state and its change history—enough to reconstruct the current version or any earlier version. The repository also contains other information, such as log entries that describe the changes made to each file.

A file checked out of a repository is called the work file. You edit the work file and make changes in it, as you would with an ordinary file. After you are done with a set of changes, you check in or commit the file; this records the changes in the repository, along with a log entry for those changes.

A copy of a file stored in a repository is called a revision. The history of a file is a sequence of revisions. Each revision is named by a revision ID. The format of the revision ID depends on the version control system; in the simplest case, it is just an integer.
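The relationship between a repository, a work file, revisions, and log entries can be sketched with a toy model. This is only an illustration in Python, not how any real version control system or VC itself stores data: the class ToyRepository and its method names are hypothetical, it keeps a full copy of every revision in memory, and it uses the simplest kind of revision ID, an integer.

```python
class ToyRepository:
    """Hypothetical in-memory repository for a single registered file."""

    def __init__(self):
        self._revisions = []  # full text of every revision, oldest first
        self._log = []        # log entry recorded with each revision

    def check_in(self, text, log_entry):
        """Record a new revision and return its revision ID (1, 2, 3, ...)."""
        self._revisions.append(text)
        self._log.append(log_entry)
        return len(self._revisions)

    def check_out(self, revision_id=None):
        """Return the text of a revision; the latest one by default."""
        if revision_id is None:
            revision_id = len(self._revisions)
        return self._revisions[revision_id - 1]

    def log(self, revision_id):
        """Return the log entry that was recorded with a revision."""
        return self._log[revision_id - 1]


repo = ToyRepository()
first = repo.check_in("hello\n", "Initial revision")

work_file = repo.check_out()   # the work file starts as a copy of the latest revision
work_file += "world\n"         # edit it as you would an ordinary file
second = repo.check_in(work_file, "Add second line")
```

Here any earlier revision can still be retrieved by its ID, for example with repo.check_out(first). Real systems usually store differences between revisions rather than complete copies.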

To go beyond these basic concepts, you will need to understand three aspects in which version control systems differ. They can be locking-based or merging-based; they can be file-based or changeset-based; and they can be centralized or decentralized. VC handles all these modes of operation, but it cannot hide the differences.

A version control system typically has some mechanism to coordinate between users who want to change the same file. There are two ways to do this: merging and locking.

In a version control system that uses merging, each user may check out and modify a work file at any time. The system lets you merge your work file, which may contain changes that have not been checked in, with the latest changes that others have checked into the repository.

Older version control systems use a locking scheme instead. Here, work files are normally read-only. To edit a file, you ask the version control system to make it writable for you by locking it; only one user can lock a given file at any given time. This procedure is analogous to, but different from, the locking that Emacs uses to detect simultaneous editing of ordinary files (see Interlocking). When you check in your changes, that unlocks the file, and the work file becomes read-only again. Other users may then lock the file to make their own changes.
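The locking rule described above can be modeled in a few lines. The following Python sketch is an assumption-laden illustration, not the mechanism of SCCS, RCS, or any other real system: the names LockError and LockedFile are invented here, and the read-only state of the work file is represented simply by whether the file is locked.

```python
class LockError(Exception):
    """Raised when a user tries to lock or check in a file they may not touch."""


class LockedFile:
    """Hypothetical model of one file under locking-based version control."""

    def __init__(self, text=""):
        self.text = text
        self.locked_by = None  # None means unlocked; work files are read-only

    def lock(self, user):
        """Make the file writable for USER; only one user may hold the lock."""
        if self.locked_by is not None and self.locked_by != user:
            raise LockError(f"already locked by {self.locked_by}")
        self.locked_by = user

    def check_in(self, user, new_text):
        """Record USER's changes and release the lock."""
        if self.locked_by != user:
            raise LockError(f"{user} does not hold the lock")
        self.text = new_text
        self.locked_by = None  # the work file becomes read-only again
```

With this model, a second user's call to lock fails with LockError until the first user has checked in, after which the file is free to be locked again.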

Both locking and merging systems can have problems when multiple users try to modify the same file at the same time. Locking systems have lock conflicts; a user may try to check a file out and be unable to because it is locked. In merging systems, merge conflicts happen when you check in a change to a file that conflicts with a change checked in by someone else after your checkout. Both kinds of conflict have to be resolved by human judgment and communication. Experience has shown that merging is superior to locking, both in convenience to developers and in minimizing the number and severity of conflicts that actually occur.
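To make the idea of a merge conflict concrete, here is a deliberately simplified three-way merge in Python. It is a sketch under strong assumptions: the function merge_lines is hypothetical, and it only handles edits that change lines in place, so all three versions must have the same number of lines. Real merge tools first align the versions with a difference algorithm so that inserted and deleted lines are handled too.

```python
def merge_lines(base, mine, theirs):
    """Merge two sets of in-place line edits made against a common BASE.

    Returns (merged, conflicts).  MERGED has None at each position that
    could not be merged automatically; CONFLICTS lists those positions.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:            # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:          # only the other user changed this line
            merged.append(t)
        elif t == b:          # only you changed this line
            merged.append(m)
        else:                 # both changed it differently: a merge conflict
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


base = ["one", "two", "three"]
clean = merge_lines(base, ["ONE", "two", "three"], ["one", "two", "THREE"])
clash = merge_lines(base, ["ONE", "two", "three"], ["uno", "two", "three"])
```

In the first call the two users changed different lines, so the merge succeeds without conflicts. In the second call both changed the first line in different ways, and that position is reported as a conflict to be resolved by a person.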

SCCS always uses locking. RCS is lock-based by default but can be told to operate in a merging style. CVS and Subversion are merge-based by default but can be told to operate in a locking mode. Distributed version control systems, such as GNU Arch, Git, and Mercurial, are exclusively merging-based.

VC mode supports both locking and merging version control. The terms “checkin” and “checkout” come from locking-based version control systems; newer version control systems have slightly different operations usually called “commit” and “update”, but VC hides the differences between them as much as possible.

On SCCS, RCS, CVS, and other early version control systems, version control operations are file-based: each file has its own comment and revision history separate from that of all other files in the system. Later systems, beginning with Subversion, are changeset-based: a checkin may include changes to several files, and the entire set of changes is treated as a unit by the system. Any comment associated with the change does not belong to a single file, but to the changeset itself.

Changeset-based version control is more flexible and powerful than file-based version control; usually, when a change to multiple files has to be reversed, it's good to be able to easily identify and remove all of it.
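The difference between the two models shows up in how the history is organized. The Python sketch below is illustrative only: the file names, revision IDs, comments, the Changeset class, and both helper functions are made up for the example and do not correspond to the storage format of any real system.

```python
from dataclasses import dataclass

# File-based model: one independent history per file.
# Each entry is (revision ID, comment).
file_histories = {
    "parser.c": [("1.1", "Initial revision"), ("1.2", "Rename helper")],
    "parser.h": [("1.1", "Initial revision"), ("1.2", "Rename helper")],
    "README": [("1.1", "Initial revision")],
}


@dataclass
class Changeset:
    """Changeset-based model: one record covers every file changed together."""
    revision_id: int
    comment: str   # belongs to the changeset, not to any single file
    files: tuple   # all files changed as a unit


changesets = [
    Changeset(1, "Initial revision", ("parser.c", "parser.h", "README")),
    Changeset(2, "Rename helper", ("parser.c", "parser.h")),
]


def files_in_changeset(history, revision_id):
    """Changeset-based: the files of a change are recorded explicitly."""
    for changeset in history:
        if changeset.revision_id == revision_id:
            return changeset.files
    raise KeyError(revision_id)


def guess_related_files(histories, comment):
    """File-based: the best you can do is guess, e.g. by matching comments."""
    return sorted(
        path
        for path, revisions in histories.items()
        if any(c == comment for _, c in revisions)
    )
```

In the changeset-based model, finding everything that must be reversed is a single lookup by revision ID. In the file-based model, the same question has to be answered by a heuristic such as comparing comments, which can miss files or match unrelated ones.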

Early version control systems were designed around a centralized model in which each project has only one repository used by all developers. SCCS, RCS, CVS, and Subversion share this kind of model. One of its drawbacks is that the repository is a choke point for reliability and efficiency.

GNU Arch pioneered the concept of decentralized version control, later implemented in Git, Mercurial, and Bazaar. A project may have several different repositories, and these systems support a sort of super-merge between repositories that tries to reconcile their change histories. At the limit, each developer has their own repository, and repository merges replace checkin/commit operations.

VC's job is to help you manage the traffic between your personal work files and a repository. Whether that repository is a single master or one of a network of peer repositories is not something VC has to care about. Thus, the difference between a centralized and a decentralized version control system is invisible to VC mode.