The Hackerlab at regexps.com

Project Tree Inventories


Caution: Steep Learning Curve: The concepts and commands introduced in this chapter are likely to be unfamiliar to you, even if you have used other revision control systems. They're really quite simple once you get over the initial learning hurdle -- and after that they're very useful.


The Name-based inventory Concept

In a project tree, some of the files and directories are "part of the source" -- they are of interest to arch. Other files and directories may be scratch files, editor back-up files, and temporary or intermediate files generated by programs. Those other files should be ignored or treated specially by most arch commands.

This chapter discusses how arch recognizes which files to pay attention to, and which to ignore.


The inventory Command


The command tla inventory --names --source is used to print a list of source files as determined by the naming conventions. It has many options, including options to print other kinds of file lists (such as a list of all editor backup files, or a list of all files which are not source).

Let's suppose that after some editing, our source tree looks like this:

        % ls
        hw.c            hw.c.~1~        main.c          {arch}

The file hw.c.~1~ is an editor backup file. tla knows that and omits that file from the source inventory:

        % tla inventory --names --source
        ./hw.c
        ./main.c

tla can give you other lists besides lists of source:

        % tla inventory --names --backups
        ./hw.c.~1~


The arch Naming Conventions


This section describes the default naming conventions used by arch to pick out source files from other kinds of files. A later chapter describes how to customize these conventions for a particular tree (see Customizing the inventory Naming Conventions).

The naming conventions are based on several categories of files:


        . and ..                These are simply ignored by arch

        excluded                Excluded files are normally omitted
                                from a listing, but if the `--all'
                                flag is passed to `inventory', 
                                then these files are put into 
                                one of the categories below and
                                included in the listing.

        source                  These are apparent source files

        precious                These are non-source files that 
                                should not be automatically deleted

        junk                    These are non-source files that
                                may be automatically deleted

        backups                 These are non-source files that
                                may be automatically deleted, but
                                any program that deletes them should
                                treat them as editor backup files
                                (e.g., keep the oldest and newest
                                of them)

        unrecognized            These are files that arch doesn't
                                know how to classify -- they fit
                                none of the naming conventions, or
                                they have names that appear to
                                be "suspicious".


The algorithm for classifying files by name has several rules. For each file name, each of these rules is checked in the order listed here until the first rule is reached that classifies the file.

Exclude Dot Files: The special files . and .. are always excluded from inventory listings.

Non-portable Names are Unrecognized: File names containing whitespace, non-printing characters, or a "globbing character" are always classified as unrecognized. The globbing characters are:

        ? [ ] * \
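
The portability rule can be sketched in a few lines of Python. This is only an illustration, not arch's implementation: the function name is_non_portable is hypothetical, and Python's notions of "whitespace" and "printable" are assumed to be close enough to what arch checks.

```python
# Globbing characters listed in the rule above.
GLOB_CHARS = set("?[]*\\")

def is_non_portable(name: str) -> bool:
    """Return True if name contains whitespace, a non-printing
    character, or one of the globbing characters."""
    for ch in name:
        # Assumption: Python's isspace()/isprintable() approximate
        # arch's own character tests.
        if ch.isspace() or not ch.isprintable() or ch in GLOB_CHARS:
            return True
    return False
```

Under these assumptions, a name such as "my file.c" or "what?.c" would be classified as unrecognized before any later rule is consulted, while "hw.c" and "{arch}" pass on to the next test.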

Excluded File Test: If the --all flag is not given to inventory, file names matching the pattern for excluded files are dropped from the listing. If the name of a directory is excluded, the entire contents of that directory are skipped. By default, the pattern for excluded files matches control files created by arch itself:

        ^(.arch-ids|\{arch\})$

Junk File Test: All file names reaching this step that begin with two commas (,,) are classified as junk. Temporary files created by arch itself begin with two commas. In addition, any file name matching the junk pattern is classified as junk. By default, that pattern matches any name beginning with (at least) one comma:

        ^,.*$

Incidentally, that default pattern gives rise to a handy trick. If you need to create a scratch file in a source tree, give it a name that begins with a single comma.

Backup File Test: By default, a backup file is any file that reaches this step and matches one of the patterns:

        ^.*(~|\.~[0-9]+~)$
        ^.*(\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$
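
To see which names the backup patterns catch, here is a small Python sketch. It assumes that the alternatives in the second pattern are meant to be grouped in one pair of parentheses, and that Python's re module treats these particular expressions the same way as the POSIX regexps used by arch. The helper name looks_like_backup is made up for this example.

```python
import re

BACKUP_PATTERNS = [
    # Names ending in "~" or in a numbered suffix such as ".~1~".
    re.compile(r"^.*(~|\.~[0-9]+~)$"),
    # Assumption: the alternatives form a single parenthesized group.
    re.compile(r"^.*(\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$"),
]

def looks_like_backup(name: str) -> bool:
    """Return True if name matches either default backup pattern."""
    return any(p.search(name) for p in BACKUP_PATTERNS)
```

With these patterns, hw.c.~1~, hw.c~, and main.c.rej all count as backups, while hw.c does not.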

Precious File Test: By default, a precious file is any file that reaches this step and matches one of the patterns:

        ^\+.*$
        ^(\.gdbinit|\.#ckpts-lock)$
        ^(=build\.*|=install\.*)$
        ^(CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$

Suspicious File Test (Unrecognized): Some file names reaching this step are explicitly treated as unrecognized on the presumption that they should probably not be present in a source tree. By default, names ending with any of these extensions are treated as unrecognized:

        .o
        .a
        .so
        .core

In addition, the file name core is (by default) treated as unrecognized.

Source File Test: Files reaching this step are compared to the pattern for source files. The default pattern is shown below. Note that this pattern overlaps the pattern for excluded files given above. If the --all flag is given to inventory, the excluded pattern isn't used, and files that would match it instead "fall through" to later steps of this algorithm.

        ^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|\.arch-project-tree)$

In other words, by default, the arch control files and directories are source (if not excluded). Names beginning with a letter, a digit, an underscore, or an equals sign are also source.

Unrecognized Files: Any left-over file name reaching this step is treated as unrecognized.
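
The whole sequence of rules can be summarized as a short Python sketch. It is a reading aid under stated assumptions, not the code arch runs: the function classify, its include_excluded parameter (standing in for the --all flag), and the "ignored" label for . and .. are all invented here. The sketch also assumes that the second backup pattern groups its alternatives in parentheses, that Python's re module handles these expressions like arch's POSIX regexps, and that each name is a single path component. It does not model skipping the contents of excluded directories.

```python
import re

GLOB_CHARS = set("?[]*\\")

# Default patterns, copied from the rules above.
EXCLUDED = re.compile(r"^(.arch-ids|\{arch\})$")
JUNK = re.compile(r"^,.*$")
BACKUP = [
    re.compile(r"^.*(~|\.~[0-9]+~)$"),
    # Assumption: alternatives grouped in one pair of parentheses.
    re.compile(r"^.*(\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$"),
]
PRECIOUS = [
    re.compile(r"^\+.*$"),
    re.compile(r"^(\.gdbinit|\.#ckpts-lock)$"),
    re.compile(r"^(=build\.*|=install\.*)$"),
    re.compile(r"^(CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$"),
]
SUSPICIOUS_SUFFIXES = (".o", ".a", ".so", ".core")
SOURCE = re.compile(
    r"^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|\.arch-project-tree)$"
)

def classify(name, include_excluded=False):
    """Classify one file name; the first rule that applies wins."""
    if name in (".", ".."):
        return "ignored"
    if any(c.isspace() or not c.isprintable() or c in GLOB_CHARS
           for c in name):
        return "unrecognized"
    # With include_excluded=True (modeling --all) this test is skipped
    # and the name falls through to the later rules.
    if not include_excluded and EXCLUDED.search(name):
        return "excluded"
    if name.startswith(",,") or JUNK.search(name):
        return "junk"
    if any(p.search(name) for p in BACKUP):
        return "backups"
    if any(p.search(name) for p in PRECIOUS):
        return "precious"
    if name == "core" or name.endswith(SUSPICIOUS_SUFFIXES):
        return "unrecognized"
    if SOURCE.search(name):
        return "source"
    return "unrecognized"
```

For example, classify("hw.c") returns "source" and classify("hw.c.~1~") returns "backups". The name {arch} is "excluded" by default, but classify("{arch}", include_excluded=True) returns "source", which mirrors the "fall through" behaviour described for the --all flag.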


Naming Conventions Illustrated


Using our example, we can illustrate some of the naming conventions.

Recall that our project tree looks like this:

        % ls
        hw.c            hw.c.~1~        main.c          {arch}

So the ordinary source listing is:

        % tla inventory --names --source
        ./hw.c
        ./main.c

And the list of all source files (with none excluded) is:

        % tla inventory --names --source --all
        ./hw.c
        ./main.c
        ./{arch}/.arch-project-tree
        ./{arch}/=tagging-method

We can include directories in this listing:

        % tla inventory --names --source --all --both
        ./hw.c
        ./main.c
        ./{arch}
        ./{arch}/.arch-project-tree
        ./{arch}/=tagging-method
        ./{arch}/hello-world
        ./{arch}/hello-world/hello-world--mainline
        [... output trimmed ...]

We can also look at some lists of non-source files:

        % tla inventory --names --backups
        ./hw.c.~1~

The inventory command has many options that you may wish to explore.


Customizing the Naming Conventions


You can alter the patterns used by inventory to classify files. This is explained in a later chapter (see Customizing the inventory Naming Conventions).


Why is It Like This -- inventory Naming Conventions


Many systems provide naming conventions for recognizing source files, but users new to arch often wonder why arch needs so many categories of files. Recall that arch has the categories:

        excluded
        source
        precious
        junk
        backups
        unrecognized

A rationale for each category is explained here:

excluded is provided simply to keep inventory listings brief in the very common case that arch control files are of no particular interest. This is similar to the treatment of "dot files" by ls, and the --all flag to inventory is similar to the -a flag to ls.

source is provided simply so that arch can reliably distinguish source files from others. For example, when comparing two source trees, arch compares only the files in the category source.

precious files are those that arch should make an effort to preserve. For example, if arch needs to make a copy of a project tree for you, it copies the precious files along with the source. Suppose, for example, that you are taking notes while working on source. You don't want your file of notes to be mistaken for source, but you also don't want it to be lost. A useful trick is to give the file a precious name (e.g. +notes).

junk: Often when working on a project tree, it's convenient to create "throw-away" files. You might want to compile a quick test program or save, for the moment, the output of some command. When enough of these throw-away files have accumulated, it's handy to be able to get rid of them all at once, without having to carefully identify which files to toss and which to keep. junk names are perfect for this. When you create one of these throw-away files, give it a name like ,foo. Later, you can feel confident and safe issuing commands like:


        % rm ,*

        % find . -name ',*' | xargs rm

        % tla inventory --junk | xargs rm


From arch's perspective, junk files have two important properties. First, when copying a tree, the junk files are not copied. Second, it is considered safe for arch to overwrite a junk file. In practice, arch will only ever actually overwrite a junk file if that junk file has a name that begins with two commas (,,).
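
If you would rather review junk files before removing them, something like the following Python sketch could list them first. It is roughly comparable to the find command shown earlier, it only reports paths and deletes nothing, and the function name find_junk is an invention for this example. It uses the default junk pattern and does not know about any customized conventions in your tree.

```python
import os
import re

# Default junk pattern: any name beginning with at least one comma.
JUNK = re.compile(r"^,.*$")

def find_junk(root):
    """Return a sorted list of paths under root whose base name
    matches the default junk pattern. Nothing is deleted."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if JUNK.search(name):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling find_junk(".") from the root of a project tree gives a list you can inspect before deciding what to remove.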

backups: Editor backup files and the backup files created by programs like patch often deserve special treatment. For example, if your editor creates "numbered backups", those are almost junk files, but rather than deleting all of them, you might want to delete only some of them.

For arch, what is important is that when copying a tree, backup files should not also be copied. For users, what is hopefully most useful is that using the trick:

        % tla inventory --junk | xargs rm

will not delete backup files.
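
As an illustration of deleting only some numbered backups, the Python sketch below picks out the backups that are neither the oldest nor the newest for each file. It assumes the .~N~ numbering style used in this chapter's examples and that a lower number means an older backup; the function name backups_to_delete is hypothetical, and the function only returns names without removing anything.

```python
import re
from collections import defaultdict

# Numbered backup names such as "hw.c.~1~".
NUMBERED = re.compile(r"^(?P<base>.*)\.~(?P<num>[0-9]+)~$")

def backups_to_delete(names):
    """Return the numbered backups that are neither the oldest nor
    the newest backup of their base file."""
    groups = defaultdict(list)
    for name in names:
        m = NUMBERED.match(name)
        if m:
            groups[m.group("base")].append((int(m.group("num")), name))
    doomed = []
    for versions in groups.values():
        # Assumption: a lower number means an older backup.
        versions.sort()
        # Keep the first (oldest) and last (newest) entries.
        doomed.extend(name for _, name in versions[1:-1])
    return sorted(doomed)
```

Given hw.c.~1~, hw.c.~2~, hw.c.~3~, and hw.c.~10~, this sketch selects only hw.c.~2~ and hw.c.~3~.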

unrecognized: The appearance in a source tree of a file that doesn't fit any known pattern (or that has a suspicious name) most likely indicates that something has gone wrong. Rather than silently ignoring such files or treating them as precious or junk, arch explicitly flags these exceptions in order to be able to give warnings to users.

Overall, adopting file naming conventions is a discipline that many programmers may not be accustomed to, but it's one I strongly recommend. It's easy to stick to these conventions, and tools like inventory and tree-lint (introduced later) help you keep your source from getting out of control.

arch Meets hello-world: A Tutorial Introduction to The arch Revision Control System