Copyright 2002,2003,2004 Alexander Taler

LibCVS internal architecture


This document describes the internals of an implementation of LibCVS in broad terms. A LibCVS implementation is not required to follow this architecture document, although doing so may save some effort or time. Familiarity with the LibCVS API description is assumed.

This document is currently based on the architecture of the Perl LibCVS.

Static Diagram

This diagram describes the major classes. More minor and support classes are described later.

The API classes are those described in the API document, and available to users of the library, including WorkingFile, FileBranch, etc. (Repository is also in the API document, but it has been split out for clarity.) The using relationship flows from them through the important Command class, and to the Client and Connection, which communicate with the server.

API Classes

Those described in the API documentation, excluding Repository. Each of them keeps a handle to the Repository in which it lives. This package includes at least these classes:


The Repository is a holding place for the IgnoreChecker and the Client. It is accessible to users of the API, but they have limited access to it and little interest in it. It is an appropriate place to add caches for data stored on the server.


The IgnoreChecker keeps a list of file patterns to ignore, and is queried by the APIClasses. It uses Command to access the global CVSROOT/cvsignore file. It also respects the ~/.cvsignore file, and any .cvsignore files in working directories.
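
The pattern matching described above can be sketched in a few lines of Python. This is only an illustration, not the Perl LibCVS code: the class layout and the method names add_patterns and is_ignored are assumptions, and fetching CVSROOT/cvsignore through Command, as well as the special "!" entry that resets the list, are left out.

```python
import fnmatch


class IgnoreChecker:
    """Hypothetical sketch: holds glob patterns and answers ignore queries."""

    def __init__(self, patterns=()):
        self._patterns = list(patterns)

    def add_patterns(self, text):
        # A cvsignore file holds whitespace-separated glob patterns.
        self._patterns.extend(text.split())

    def is_ignored(self, file_name):
        # fnmatchcase avoids platform-dependent case folding.
        return any(fnmatch.fnmatchcase(file_name, p) for p in self._patterns)


checker = IgnoreChecker()
checker.add_patterns("*.o *.bak\ncore")   # e.g. the contents of a .cvsignore file
print(checker.is_ignored("main.o"))       # True
print(checker.is_ignored("main.c"))       # False
```

In a fuller version the same add_patterns call would be fed, in turn, with the global file, ~/.cvsignore and the .cvsignore of each working directory.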


Command provides an encapsulation of the cvsclient protocol, taking care of the messy work of constructing and sending the cvsclient requests needed to perform a requested command. It is a very important class in terms of easing implementation of the API classes.

Each Command object is essentially single use. It is constructed with parameters describing the CVS command which should be run, it is sent to the server, and then results are read back. It is probably desirable to implement this class in different ways for different languages.

The methods of Command are:

Command new(string name, string[] options, FileOrDir[] args)
  • name is the name of the cvs command, such as "update" or "commit".
  • options are options to pass with the cvs command such as "-l", "-r1.1".
  • args are FileOrDirectory and WorkingFileOrDirectory objects, to run the command on. They must all be in the same repository.
issue() Send the command to the server. The provided information is converted into cvsclient requests, and sent to the repository to which the objects belong. Results are stored in the command object.
Response[] get_responses() Return the responses to the command as an array of Response objects.
string[] get_messages() Return the message responses to the command as an array of strings.
Response[] get_files() Return the file transmission responses to the command.

In a multi-threaded environment, Command should synchronize on the Client, because mixing up the requests of multiple Commands is incorrect.
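
The single-use life cycle and the synchronization on the Client might look roughly like the Python sketch below. It is heavily simplified and rests on assumptions: FakeClient is a stand-in invented for the example, the client is passed to the constructor directly instead of being found through the arguments' Repository, and the requests a real command needs (Root, Directory, Entry and so on) are omitted.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for Client: records requests instead of sending them."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sent = []

    def send(self, request):
        self.sent.append(request)

    def read_responses(self):
        return [("M", "ok")]          # canned (kind, text) pairs for the example


class Command:
    def __init__(self, client, name, options, args):
        self._client = client
        self._name = name
        self._options = list(options)
        self._args = list(args)
        self._responses = None

    def issue(self):
        if self._responses is not None:
            raise RuntimeError("a Command is single use")
        # Hold the Client lock for the whole exchange so that the requests
        # of different Commands are never interleaved.
        with self._client.lock:
            for value in self._options + self._args:
                self._client.send(("Argument", value))
            self._client.send((self._name, ""))
            self._responses = self._client.read_responses()

    def get_responses(self):
        return list(self._responses or [])

    def get_messages(self):
        return [text for kind, text in self.get_responses() if kind == "M"]


client = FakeClient()
command = Command(client, "update", ["-l"], ["README"])
command.issue()
print(command.get_messages())         # ['ok']
```

Taking the lock around both sending and reading keeps one command's responses from being read by another thread's Command.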


Admin provides access to CVS administrative files in a working directory, those in the "CVS" directory. It is needed by the Working* API classes.

Admin new(string dir) Construct a new Admin object for the named working directory.
string dirName() Name of the directory this is an admin object for.
DatumEntry[] entries() Files and directories listed in the Entries file.
DatumRoot root() Contents of the Root file.
DatumDirectoryName repository() Contents of the Repository file.
DatumTagSpec tag() Contents of the Tag file.
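
As an illustration of the Admin methods listed above, the following Python sketch reads the administrative files of a working directory. It is an assumption-laden simplification: it returns plain strings where the description above uses Datum objects, and the sample root, repository and entry values are invented for the demonstration.

```python
import tempfile
from pathlib import Path


class Admin:
    """Hypothetical sketch of Admin returning raw strings, not Datum objects."""

    def __init__(self, dir_name):
        self._dir = Path(dir_name)

    def dir_name(self):
        return str(self._dir)

    def _lines(self, name):
        # Admin files live in the "CVS" subdirectory; some of them are optional.
        path = self._dir / "CVS" / name
        return path.read_text().splitlines() if path.exists() else []

    def _first_line(self, name):
        lines = self._lines(name)
        return lines[0] if lines else None

    def entries(self):
        return [line for line in self._lines("Entries") if line]

    def root(self):
        return self._first_line("Root")

    def repository(self):
        return self._first_line("Repository")

    def tag(self):
        return self._first_line("Tag")


# Build a throwaway working directory with made-up contents and read it back.
with tempfile.TemporaryDirectory() as work_dir:
    cvs_dir = Path(work_dir) / "CVS"
    cvs_dir.mkdir()
    (cvs_dir / "Root").write_text(":pserver:anonymous@cvs.example.org:/cvsroot\n")
    (cvs_dir / "Repository").write_text("project/src\n")
    (cvs_dir / "Entries").write_text("/README/1.3/Mon Jan  6 10:00:00 2003//\nD/doc////\n")
    admin = Admin(work_dir)
    root, repository, tag = admin.root(), admin.repository(), admin.tag()
    entry_count = len(admin.entries())

print(root, repository, tag, entry_count)
```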


The client part of the cvsclient protocol. It is used for issuing requests and receiving responses. See the CVSClient docs for more details.


The connection used by the Client to communicate with the server. See the CVSClient docs for more details.


The cvsclient protocol implementation transmits many small chunks of information, which inspired the Datum (singular of Data) class to make this process easier. This class and its subclasses found their way into other parts of the internal architecture. They have been excluded from the exposed API to keep it less confusing.

Datum subclasses may have publicly accessible members, simply because that is easier than writing methods to access them. Following is a list and description of the various Datum classes.


The abstract super class of all the other Datum classes.

abstract Datum new(source) Construct a new Datum from the given source, which can be a network stream, a file, a string, or a Datum object.
string asString()
A string representation of this datum, suitable to be shown to the user.
void protocolPrint(sink)
Print the datum to the provided sink, probably a network connection to a server.
string asProtocolString()
A string representation of this datum, suitable to be sent on a cvsclient connection.
boolean equals(Datum other)
Check if two Data are equal.
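
One way to picture the abstract class is the Python sketch below. It is not the actual implementation: the snake_case method names, the default wire form (display form plus a newline), the equality rule and the TextDatum subclass are all assumptions made for the example.

```python
import io
from abc import ABC, abstractmethod


class Datum(ABC):
    @abstractmethod
    def as_string(self):
        """Representation suitable for showing to the user."""

    def as_protocol_string(self):
        # Assumption: by default the wire form is the display form plus a newline.
        return self.as_string() + "\n"

    def protocol_print(self, sink):
        # The sink only needs a write() method (a socket file, StringIO, ...).
        sink.write(self.as_protocol_string())

    def equals(self, other):
        # Assumption: same class and same wire form means equal.
        return (type(self) is type(other)
                and self.as_protocol_string() == other.as_protocol_string())


class TextDatum(Datum):
    """Hypothetical concrete subclass holding one line of text."""

    def __init__(self, source):
        # Accept a plain string or another Datum as the source.
        self._text = source.as_string() if isinstance(source, Datum) else str(source)

    def as_string(self):
        return self._text


sink = io.StringIO()
first = TextDatum("hello")
first.protocol_print(sink)
print(repr(sink.getvalue()))               # 'hello\n'
print(first.equals(TextDatum(first)))      # True
```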


A directory name as a datum. Useful for sending directory names to the server.



This represents an RCS-style entries line, from the CVS/Entries file or the protocol. It differs slightly from what is described in the CVS docs: TYPE has been added, and is "D" for a directory and empty for a file, and "CONFLICT" has been renamed "TIME".


string name()
The name of the file this entry refers to.
DatumTagSpec tag()
Return the tag spec, if there is one.
DatumRevisionNumber revision()
Revision number of this entry.
boolean isFile()
True if this entry is for a file.
boolean isDirectory()
True if this entry is for a directory.
time updatedTime()
The time this file was last made up-to-date by CVS. For files that are the result of a merge, no time is available, so 0 is returned.
time conflictTime()
The time conflict information was inserted, or 0 if conflicts were not inserted. (Need special handling for conflicts which have just been reported by the server, because it returns an Entries line with no time over the cvsclient protocol.)
boolean isConflict()
Return true if this entry is for a file with a conflict.
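
To make the entry fields concrete, here is a Python sketch that splits an Entries line into its parts. It assumes the slash-separated layout used by the CVS/Entries file (type, name, revision, time stamp with an optional "+conflict" part, options, tag spec); the Entry class and parse_entry function are hypothetical, and the times are kept as text, not converted to time values.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str        # "D" for a directory, "" for a file
    name: str
    revision: str
    updated: str     # time stamp text, or e.g. "Result of merge"
    conflict: str    # conflict time text, "" if none
    options: str
    tag_spec: str

    def is_directory(self):
        return self.kind == "D"

    def is_file(self):
        return self.kind == ""

    def is_conflict(self):
        return self.conflict != ""


def parse_entry(line):
    fields = line.rstrip("\n").split("/")
    if len(fields) != 6:
        raise ValueError("malformed entries line: " + line)
    kind, name, revision, stamp, options, tag_spec = fields
    # The conflict time, if any, follows a "+" in the time stamp field.
    updated, _, conflict = stamp.partition("+")
    return Entry(kind, name, revision, updated, conflict, options, tag_spec)


merged = parse_entry("/README/1.3/Result of merge+Mon Jan  6 10:00:00 2003/-kb/Trelease_1_01")
directory = parse_entry("D/doc////")
print(merged.is_file(), merged.is_conflict(), merged.tag_spec)   # True True Trelease_1_01
print(directory.is_directory(), directory.name)                  # True doc
```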


A datum containing the contents of a file.



A datum containing a UNIX-style filemode.



A file name as a datum. Useful for sending file names to the server.



A log message as a datum. Not just the message, but the other information included for each revision in cvs log output.

RevisionNumber revision()
The revision this is a log message for.
string text() 
The text of the log message.
string committer()
time date()
boolean isDead()
??? lines()
??? branches()


A datum containing a revision number. It can represent a regular revision number ("1.1"), a branch number ("1.1.2") or a magic branch number (""). In general this class should be primarily internal. Users of the library may need to print out values and create new ones from user-supplied information, but branch traversal should usually use the methods of FileRevision and FileBranch.

Some revision number methods are specific to branches or to revisions; these methods have been split off into subclasses. Alternatively, all methods could be rolled into a single class and modified to raise errors if they are called inappropriately. If this is done, an additional method boolean isBranch() would have to be added.

To describe the relationship among both revision and branch revision numbers, the terms ancestor and descendant are used. The ancestral relationship is transitive, so if a is an ancestor of b, and b is an ancestor of c, then it follows that a is an ancestor of c. A revision revision number is an ancestor of all revision numbers that follow it on its branch. A branch revision number is a descendant of its base revision, and an ancestor of all other revisions on that branch. Revisions which don't have an ancestral relationship are called incomparable. Descendant is the opposite of ancestor, so if a is an ancestor of b, then b is a descendant of a.

Thus "1.6" is an ancestor of "1.9", "1.6.2", "" and ""; "1.6.2" is an ancestor of "" and a descendant of "1.6"; and "" and "1.19" are incomparable.

Because of CVS's lazy branching scheme, branches that sprout from the same revision could be ancestors. The terms possible ancestor and possible descendant are used to describe this relationship. For example, "1.6.2" is a possible ancestor of "1.6.4".

It has the following data member:


and some methods:

RevisionNumber new(string number)
Construct a new revision number from a string such as "", or raise an error if it's malformed. It makes sense to construct a revision number from user-supplied input, for example.
enum compare(DatumNumber other)
Compare this revision number to another to determine their ancestral relationship. Possible return values are:
  • EQUAL: They are the same revision.
  • ANCESTOR: other is an ancestor of this revision.
  • DESCENDANT: other is a descendant of this revision.
  • POSSIBLE_ANCESTOR: other is possibly an ancestor of this revision.
  • POSSIBLE_DESCENDANT: other is possibly a descendant of this revision.
  • INCOMPARABLE: none of the above is true.
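
The ancestor rules and the return values of compare can be sketched in Python as follows. This is one reading of the definitions above, not the library's code. Assumptions: numbers with an odd count of components are branch numbers and those with an even count are revisions, magic branch numbers are rejected, and the "possible" relationship is applied only to sibling branch numbers themselves, not to the revisions on them.

```python
from enum import Enum


class Relation(Enum):
    EQUAL = 1
    ANCESTOR = 2
    DESCENDANT = 3
    POSSIBLE_ANCESTOR = 4
    POSSIBLE_DESCENDANT = 5
    INCOMPARABLE = 6


def parse(number):
    """Turn "1.6.2" into (1, 6, 2); raises ValueError if malformed."""
    parts = tuple(int(p) for p in number.split("."))
    if any(p <= 0 for p in parts):
        raise ValueError("unsupported revision number: " + number)
    return parts


def is_ancestor(a, b):
    """True if a is an ancestor of b."""
    n = len(a)
    if a == b or len(b) < n or a[:-1] != b[:n - 1]:
        return False
    if n % 2 == 1:
        # a is a branch: everything whose number extends it descends from it.
        return b[n - 1] == a[-1]
    # a is a revision: later revisions on its branch, and whatever sprouts
    # from a itself or from those later revisions, descend from it.
    return b[n - 1] >= a[-1]


def is_possible_ancestor(a, b):
    """True if a and b are branches off the same revision and a was numbered first."""
    return (len(a) == len(b) and len(a) >= 3 and len(a) % 2 == 1
            and a[:-1] == b[:-1] and a[-1] < b[-1])


def compare(this, other):
    a, b = parse(this), parse(other)
    if a == b:
        return Relation.EQUAL
    if is_ancestor(b, a):
        return Relation.ANCESTOR            # other is an ancestor of this
    if is_ancestor(a, b):
        return Relation.DESCENDANT          # other is a descendant of this
    if is_possible_ancestor(b, a):
        return Relation.POSSIBLE_ANCESTOR
    if is_possible_ancestor(a, b):
        return Relation.POSSIBLE_DESCENDANT
    return Relation.INCOMPARABLE


print(compare("1.9", "1.6").name)        # ANCESTOR
print(compare("1.6", "1.6.2").name)      # DESCENDANT
print(compare("1.6.4", "1.6.2").name)    # POSSIBLE_ANCESTOR
print(compare("1.19", "1.6.2.1").name)   # INCOMPARABLE
```

The transitivity of the relationship falls out of the prefix test: a number is compared against the leading components of the other, so a revision deep in a sub-branch is still recognised as a descendant of every revision and branch on the path leading to it.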


A branch revision number, with the following additional methods:

boolean isTrunk()
Return true if this is a trunk revision number (e.g. "1" or "2").
boolean isImportBranch()
Return true if this is an import branch. It is assumed this is equivalent to having an odd branch revision number.
DatumRevisionNumber base()
Returns the revision number for the base of this branch revision.
DatumRevisionNumber firstRevision()
Returns the first revision number committed to this branch. This is not the same as the base. It's the branch number with a ".1" appended.
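
The branch-specific methods amount to simple manipulation of the dotted number, as this Python sketch shows. The free functions and their names are assumptions made for the example. Two further choices are not stated above and are also assumptions: a trunk number is never reported as an import branch, and asking for the base of a trunk number raises an error.

```python
def _parts(number):
    return [int(p) for p in number.split(".")]


def _join(parts):
    return ".".join(str(p) for p in parts)


def is_trunk(branch):
    return len(_parts(branch)) == 1


def is_import_branch(branch):
    parts = _parts(branch)
    # Odd final component, e.g. "1.1.1"; trunk numbers are excluded (assumption).
    return len(parts) >= 3 and parts[-1] % 2 == 1


def base(branch):
    parts = _parts(branch)
    if len(parts) == 1:
        raise ValueError("a trunk number has no base revision")
    return _join(parts[:-1])


def first_revision(branch):
    return _join(_parts(branch) + [1])


print(base("1.6.2"))               # 1.6
print(first_revision("1.6.2"))     # 1.6.2.1
print(is_import_branch("1.1.1"))   # True
print(is_trunk("1"))               # True
```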


A revision revision number, with the following additional methods:

DatumBranchNumber branch()
Returns the revision number of the branch of this revision.
DatumNumber predecessor()
Return the revision number that immediately precedes this one, its youngest ancestor.
DatumNumber successor()
Return the revision number that immediately follows this one, its eldest descendant. At a branch point a revision may have several immediate descendants. This method will return a revision on the same branch, essentially incrementing the revision number by one, e.g. the successor of "1.2" is "1.3", not "". As a consequence a.equals(a.predecessor().successor()) is not necessarily true, although a.equals(a.successor().predecessor()) is true.
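
The asymmetry between predecessor and successor is easiest to see in code. The Python sketch below uses hypothetical free functions on dotted strings. It assumes that the predecessor of the first revision on a branch is the branch's base revision, and that asking for the predecessor of the first revision on a trunk raises an error.

```python
def _parts(number):
    return [int(p) for p in number.split(".")]


def _join(parts):
    return ".".join(str(p) for p in parts)


def branch(revision):
    return _join(_parts(revision)[:-1])


def successor(revision):
    parts = _parts(revision)
    # Always stays on the same branch: just increment the last component.
    return _join(parts[:-1] + [parts[-1] + 1])


def predecessor(revision):
    parts = _parts(revision)
    if parts[-1] > 1:
        return _join(parts[:-1] + [parts[-1] - 1])
    if len(parts) > 2:
        # First revision on a branch: step back to the branch's base revision.
        return _join(parts[:-2])
    raise ValueError("no predecessor for " + revision)


print(branch("1.6.2.1"))                   # 1.6.2
print(predecessor("1.6.2.1"))              # 1.6
print(successor(predecessor("1.6.2.1")))   # 1.7
print(predecessor(successor("1.6.2.1")))   # 1.6.2.1
```

The third line of output shows why predecessor followed by successor need not return the starting point: stepping back from the first revision on a branch lands on the parent branch, and successor never leaves the branch it starts on.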


A pair of pathnames, as used in the cvsclient protocol.



A datum containing a CVSROOT specification, for contacting a repository.



String to be sent to the server.



A CVS TagSpec. It's a concatenation of a tag type and a value, e.g. "Trelease_1_01" or "D2002.". LibCVS uses the following four tag types:

This usage deviates from that in CVS, which is: (This should be validated for accuracy.)

enum type()
The type of tag spec.
string value()
The value of the tag spec.
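
Since a tag spec is a one-character type followed by a value, splitting it is straightforward. The Python sketch below uses a hypothetical TagSpec tuple and parse_tag_spec function, and deliberately does not interpret or validate the type character.

```python
from typing import NamedTuple


class TagSpec(NamedTuple):
    type: str     # the single leading type character, left uninterpreted here
    value: str    # everything after the type character


def parse_tag_spec(spec):
    if len(spec) < 2:
        raise ValueError("tag spec needs a type character and a value: " + repr(spec))
    return TagSpec(spec[0], spec[1:])


spec = parse_tag_spec("Trelease_1_01")
print(spec.type, spec.value)   # T release_1_01
```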


A time as a string in CVS format.


Other Concerns


Caching various bits of data can improve performance immensely. Things to cache are listed here.


The cvsclient protocol is specified in such a way that connections can be used for multiple commands. Thus a Connection can be kept open inside a Client, which is stored in a Repository. To make this scheme work, only one Repository should exist for each repository/server that communication will be done with, and Command should synchronize on Client in order to prevent multiple threads interfering in the protocol.

One thing to look out for is servers which cannot support multiple commands over a single connection due to broken loginfo scripts. This can be tested for when the connection is opened, so that connections are preserved only if multiple commands can be issued.
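
The rule of one Repository per repository/server could be enforced with a small registry, sketched here in Python. All names are hypothetical, the connection itself is only a placeholder, and the test for servers that cannot handle multiple commands is reduced to a flag that a real implementation would set when the connection is opened.

```python
import threading


class Client:
    def __init__(self, root):
        self.root = root
        self.lock = threading.Lock()    # Commands synchronize on this lock
        self.connection = None          # placeholder: opened on first use
        self.reusable = True            # would be set by a probe when opening


class Repository:
    _instances = {}
    _instances_lock = threading.Lock()

    def __init__(self, root):
        self.root = root
        self.client = Client(root)

    @classmethod
    def for_root(cls, root):
        # Hand out the same Repository (and so the same Client and open
        # Connection) to every caller that names the same root.
        with cls._instances_lock:
            if root not in cls._instances:
                cls._instances[root] = cls(root)
            return cls._instances[root]


first = Repository.for_root(":pserver:anonymous@cvs.example.org:/cvsroot")
second = Repository.for_root(":pserver:anonymous@cvs.example.org:/cvsroot")
print(first is second)                  # True
print(first.client is second.client)    # True
```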