6 Choosing Files and Names for tar

Certain options to tar enable you to specify a name for your archive. Other options let you decide which files to include or exclude from the archive, based on when or whether files were modified, whether the file names do or don’t match specified patterns, or whether files are in specified directories.

This chapter discusses these options in detail.



6.1 Choosing and Naming Archive Files

By default, tar uses an archive file name that was compiled in when it was built on the system; usually this name refers to some physical tape drive on the machine. However, the person who installed tar on the system may not have set the default to a value that is meaningful to most users. As a result, you will usually want to tell tar where to find (or create) the archive. The ‘--file=archive-name’ (‘-f archive-name’) option lets you specify the file to use as the archive instead of the default archive location.

--file=archive-name
-f archive-name

Name the archive to create or operate on. Use in conjunction with any operation.

For example, in this tar command,

$ tar -cvf collection.tar blues folk jazz

‘collection.tar’ is the name of the archive. It must directly follow the ‘-f’ option, since whatever directly follows ‘-f’ will end up naming the archive. If you neglect to specify an archive name, you may end up overwriting a file in the working directory with the archive you create, since tar will use this file’s name for the archive name. For instance, ‘tar -cvf blues folk jazz’ would write the archive into the file ‘blues’.

An archive can be saved as a file in the file system, sent through a pipe or over a network, or written to an I/O device such as a tape, floppy disk, or CD write drive.

If you do not name the archive, tar uses the value of the environment variable TAPE as the file name for the archive. If that is not available, tar uses a default, compiled-in archive name, usually that for tape unit zero (i.e., ‘/dev/tu00’).
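
A minimal sketch of the TAPE fallback, assuming GNU tar and a POSIX shell; the directory ‘blues’ and the archive name ‘collection.tar’ are hypothetical. Because no ‘-f’ option is given, tar takes the archive name from TAPE:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical sample directory.
mkdir blues
echo 'crossroad' > blues/track1

# No -f option: tar falls back to the archive named by $TAPE.
TAPE=collection.tar tar -c blues

# The archive was written to ./collection.tar.
tar -tf collection.tar
```

An explicit ‘-f’ option takes precedence over TAPE.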

If you use ‘-’ as an archive-name, tar reads the archive from standard input (when listing or extracting files), or writes it to standard output (when creating an archive). If you use ‘-’ as an archive-name when modifying an archive, tar reads the original archive from its standard input and writes the entire new archive to its standard output.

The following example is a convenient way of copying a directory hierarchy from ‘sourcedir’ to ‘targetdir’.

$ (cd sourcedir; tar -cf - .) | (cd targetdir; tar -xpf -)

The ‘-C’ option lets you avoid using subshells:

$ tar -C sourcedir -cf - . | tar -C targetdir -xpf -

In both examples above, the leftmost tar invocation archives the contents of ‘sourcedir’ to the standard output, while the rightmost one reads this archive from its standard input and extracts it. The ‘-p’ option tells the rightmost tar to restore the permissions of the extracted files.
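
The pipeline can be tried in a throwaway directory. This sketch assumes GNU tar, a POSIX shell and diff; the tree under ‘sourcedir’ is hypothetical. It copies the tree and then checks that both hierarchies match:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical source tree with a subdirectory and an executable script.
mkdir -p sourcedir/sub targetdir
echo 'hello' > sourcedir/sub/greeting.txt
printf '#!/bin/sh\necho hi\n' > sourcedir/run.sh
chmod 755 sourcedir/run.sh

# Left tar writes the archive to standard output (-f -); right tar reads
# it from standard input and extracts it, keeping permissions (-p).
tar -C sourcedir -cf - . | tar -C targetdir -xpf -

# diff prints nothing and succeeds when the two trees are identical.
diff -r sourcedir targetdir && echo 'copy OK'
```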

To specify an archive file on a device attached to a remote machine, use the following:

--file=hostname:/dev/file-name

tar will set up the remote connection, if possible, and prompt you for a username and password. If you use ‘--file=@hostname:/dev/file-name’, tar will attempt to set up the remote connection using your username as the username on the remote machine.

If the archive file name includes a colon (‘:’), then it is assumed to be a file on another machine. If the archive file is ‘user@host:file’, then file is used on the host host. The remote host is accessed using the rsh program, with a username of user. If the username is omitted (along with the ‘@’ sign), then your user name will be used. (This is the normal rsh behavior.) It is necessary for the remote machine, in addition to permitting your rsh access, to have the ‘rmt’ program installed (this command is included in the GNU tar distribution and by default is installed under ‘prefix/libexec/rmt’, where prefix means your installation prefix). If you need to use a file whose name includes a colon, then the remote tape drive behavior can be inhibited by using the ‘--force-local’ option.

When the archive is being created to ‘/dev/null’, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature.



6.2 Selecting Archive Members

File Name arguments specify which files in the file system tar operates on, when creating or adding to an archive, or which archive members tar operates on, when reading or deleting from an archive. See section The Five Advanced tar Operations.

To specify file names, you can include them as the last arguments on the command line, as follows:

tar operation [option1 option2 …] [file name-1 file name-2 …]

If a file name begins with a dash (‘-’), precede it with the ‘--add-file’ option to prevent it from being treated as an option.
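
For instance, assuming GNU tar and a POSIX shell (the file name ‘--my-file’ is hypothetical), a file whose name looks like an option can be archived as follows:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical file whose name would otherwise be parsed as an option.
touch -- --my-file

# --add-file marks its argument as a file name, not an option.
tar -cf dash.tar --add-file=--my-file
tar -tf dash.tar
```

Writing the name as ‘./--my-file’ also keeps it from being parsed as an option, but the member is then stored as ‘./--my-file’.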

By default GNU tar attempts to unquote each file or member name, replacing escape sequences according to the following table:

Escape   Replaced with
\a       Audible bell (ASCII 7)
\b       Backspace (ASCII 8)
\f       Form feed (ASCII 12)
\n       New line (ASCII 10)
\r       Carriage return (ASCII 13)
\t       Horizontal tabulation (ASCII 9)
\v       Vertical tabulation (ASCII 11)
\?       ASCII 127
\N       ASCII N (N should be an octal number of up to 3 digits)

A backslash followed by any other symbol is retained.

This default behavior is controlled by the following command line options:

--unquote

Enable unquoting input file or member names (default).

--no-unquote

Disable unquoting input file or member names.

If you specify a directory name as a file name argument, all the files in that directory are operated on by tar.

If you do not specify files, tar’s behavior depends on the operation mode, as described below:

When tar is invoked with ‘--create’ (‘-c’), tar will stop immediately, reporting the following:

$ tar cf a.tar
tar: Cowardly refusing to create an empty archive
Try 'tar --help' or 'tar --usage' for more information.

If you specify either ‘--list’ (‘-t’) or ‘--extract’ (‘--get’, ‘-x’), tar operates on all the archive members in the archive.

If run with the ‘--diff’ option, tar compares the archive with the contents of the current working directory.

If you specify any other operation, tar does nothing.
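
A quick way to observe the two most common cases, assuming GNU tar and a POSIX shell (the ‘music’ directory is hypothetical):

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical sample archive.
mkdir music
touch music/blues music/jazz
tar -cf music.tar music

# --create with no file names: tar refuses and exits with a non-zero status.
if tar -cf empty.tar 2>/dev/null; then
  echo 'unexpected: an empty archive was created'
else
  echo 'create refused'
fi

# --list with no member names: every member of the archive is listed.
tar -tf music.tar
```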

By default, tar takes file names from the command line. However, there are other ways to specify file or member names, or to modify the manner in which tar selects the files or members upon which to operate. In general, these methods work both for specifying the names of files and archive members.



6.3 Reading Names from a File

Instead of giving the names of files or archive members on the command line, you can put the names into a file, and then use the ‘--files-from=file-of-names’ (‘-T file-of-names’) option to tar. Give the name of the file which contains the list of files to include as the argument to ‘--files-from’. In the list, the file names should be separated by newlines. You will frequently use this option when you have generated the list of files to archive with the find utility.

--files-from=file-name
-T file-name

Get names to extract or create from file file-name.

If you give a single dash as a file name for ‘--files-from’, (i.e., you specify either --files-from=- or -T -), then the file names are read from standard input.

Unless you are running tar with ‘--create’, you cannot use both --files-from=- and --file=- (-f -) in the same command.

Any number of ‘-T’ options can be given in the command line.

The following example shows how to use find to generate a list of files smaller than 400 blocks in length(15) and put that list into a file called ‘small-files’. You can then use the ‘-T’ option to tar to specify the files from that file, ‘small-files’, to create the archive ‘little.tgz’. (The ‘-z’ option to tar compresses the archive with gzip; see section Creating and Reading Compressed Archives for more information.)

$ find . -size -400 -print > small-files
$ tar -c -v -z -T small-files -f little.tgz

By default, each line read from the file list is first stripped of any leading and trailing whitespace. If the resulting string begins with the ‘-’ character, it is considered a tar option and is processed accordingly(16). Only a subset of GNU tar options is allowed for use in file lists. For a list of such options, see Position-Sensitive Options.

For example, a common use of this feature is to change to another directory by specifying the ‘-C’ option:

$ cat list
-C/etc
passwd
hosts
-C/lib
libc.a
$ tar -c -f foo.tar --files-from list

In this example, tar will first switch to ‘/etc’ directory and add files ‘passwd’ and ‘hosts’ to the archive. Then it will change to ‘/lib’ directory and will archive the file ‘libc.a’. Thus, the resulting archive ‘foo.tar’ will contain:

$ tar tf foo.tar
passwd
hosts
libc.a

Note that any options used in the file list remain in effect for the rest of the command line. For example, using the same ‘list’ file as above, the following command

$ tar -c -f foo.tar --files-from list libcurses.a

will look for file ‘libcurses.a’ in the directory ‘/lib’, because it was used with the last ‘-C’ option (see section Position-Sensitive Options).
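
The following sketch reproduces this example without touching the real ‘/etc’ and ‘/lib’: it creates hypothetical stand-in directories in a scratch location and writes absolute ‘-C’ paths into the list. It assumes a GNU tar recent enough to accept options in file lists, as described above, and a POSIX shell:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical stand-ins for /etc and /lib, so no system files are touched.
mkdir etc lib
touch etc/passwd etc/hosts lib/libc.a lib/libcurses.a

# File list mixing a position-sensitive option (-C) with file names.
# printf '%s\n' writes each argument on its own line.
printf '%s\n' "-C$tmp/etc" passwd hosts "-C$tmp/lib" libc.a > list

# libcurses.a is looked up in $tmp/lib: the last -C read from the list
# is still in effect when tar reaches the command-line name.
tar -c -f foo.tar --files-from list libcurses.a
tar -tf foo.tar
```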

If such option handling is undesirable, use the ‘--verbatim-files-from’ option. When this option is in effect, each line read from the file list is treated as a file name. In particular, this means that no whitespace trimming is performed.

The ‘--verbatim-files-from’ option affects all ‘-T’ options that follow it in the command line. The default behavior can be restored using the ‘--no-verbatim-files-from’ option.

To disable option handling for a single file name, use the ‘--add-file’ option, e.g.: --add-file=--my-file.
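
A sketch of ‘--verbatim-files-from’, assuming GNU tar and a POSIX shell; the file ‘-dash-file’ and the list ‘names’ are hypothetical. Without the option, the line ‘-dash-file’ would be read as an option rather than as a name:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical file whose name starts with a dash, listed in a file list.
touch -- -dash-file
printf '%s\n' '-dash-file' > names

# With --verbatim-files-from, each line is a file name, never an option.
tar -cf verbatim.tar --verbatim-files-from -T names
tar -tf verbatim.tar
```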

You can use any of the options permitted in file lists in the file list file, including the ‘--files-from’ option itself. This allows you to include the contents of one file list in another. Note, however, that options that control file list processing, such as ‘--verbatim-files-from’ or ‘--null’, won’t affect the file they appear in. They will affect the next ‘--files-from’ option, if there is one.



6.3.1 NUL-Terminated File Names

The ‘--null’ option causes ‘--files-from=file-of-names’ (‘-T file-of-names’) to read file names terminated by a NUL instead of a newline, so files whose names contain newlines can be archived using ‘--files-from’.

--null

Read NUL-terminated file names instead of newline-terminated ones.

--no-null

Undo the effect of any previous ‘--null’ option.

The ‘--null’ option is just like the one in GNU xargs and cpio, and is useful with the ‘-print0’ predicate of GNU find. In tar, ‘--null’ also disables special handling for file names that begin with dash (similar to ‘--verbatim-files-from’ option).

This example shows how to use find to generate a list of files larger than 800 blocks in length and put that list into a file called ‘long-files’. The ‘-print0’ option to find is just like ‘-print’, except that it separates file names with a NUL rather than with a newline. You can then run tar with both the ‘--null’ and ‘-T’ options to specify that tar gets the files from that file, ‘long-files’, to create the archive ‘big.tar’. The ‘--null’ option to tar will cause tar to recognize the NUL separator between files.

$ find . -size +800 -print0 > long-files
$ tar -c -v --null --files-from=long-files --file=big.tar

The ‘--no-null’ option can be used if you need to read both NUL-terminated and newline-terminated files on the same command line. For example, if ‘flist’ is a newline-terminated file, then the following command can be used to combine it with the above command:

$ find . -size +800 -print0 |
  tar -c -f big.tar --null -T - --no-null -T flist
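
To see why NUL termination matters, the following sketch archives a file whose name contains a newline and lists it with the ‘c’ quoting style. It assumes GNU tar, a find that supports ‘-print0’, and a POSIX shell; the names are hypothetical:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical file name containing an embedded newline.
mkdir src
touch "src/$(printf 'a\nb')"

# find ends each name with NUL; --null makes -T split on NUL too,
# so the newline stays part of the name.
find src -type f -print0 | tar -c -f nl.tar --null -T -

# The c quoting style shows the embedded newline as \n.
tar -tf nl.tar --quoting-style=c
```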

This example uses short options for typographic reasons, to avoid very long lines.

GNU tar tries to detect NUL-terminated file lists automatically, so in many cases it is safe to use them even without the ‘--null’ option. In this case tar prints a warning and continues reading the file as if ‘--null’ had been given:

$ find . -size +800 -print0 | tar -c -f big.tar -T -
tar: -: file name read contains nul character

The NUL terminator, however, remains in effect only for this particular file; any following ‘-T’ options will assume newline termination. Of course, NUL autodetection applies to these subsequent ‘-T’ options as well.



6.4 Excluding Some Files

To avoid operating on files whose names match a particular pattern, use the ‘--exclude’ or ‘--exclude-from’ options.

--exclude=pattern

Causes tar to ignore files that match the pattern.

The ‘--exclude=pattern’ option prevents any file or member whose name matches the shell wildcard (pattern) from being operated on. For example, to create an archive with all the contents of the directory ‘src’ except for files whose names end in ‘.o’, use the command ‘tar -cf src.tar --exclude='*.o' src’.

You may give multiple ‘--exclude’ options.
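
A small sketch, assuming GNU tar and a POSIX shell, with a hypothetical ‘src’ tree. The pattern is quoted so that the shell passes ‘*.o’ to tar unexpanded:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical source tree with objects at two levels.
mkdir -p src/lib
touch src/main.c src/main.o src/lib/util.c src/lib/util.o

# '*.o' is quoted, so tar (not the shell) matches it against every name,
# including names in subdirectories such as src/lib/util.o.
tar -cf src.tar --exclude='*.o' src
tar -tf src.tar
```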

--exclude-from=file
-X file

Causes tar to ignore files that match the patterns listed in file.

Use the ‘--exclude-from’ option to read a list of patterns, one per line, from file; tar will ignore files matching those patterns. Thus if tar is called as ‘tar -c -X foo .’ and the file ‘foo’ contains a single line ‘*.o’, no files whose names end in ‘.o’ will be added to the archive.

Notice that lines from file are read verbatim. A frequent error is leaving some extra whitespace after a pattern, which is difficult to spot in text editors.

However, empty lines are OK.
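
The same kind of exclusion can be driven from a pattern file. This sketch assumes GNU tar and a POSIX shell; the directory ‘proj’ and the pattern file ‘foo’ are hypothetical:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical project with an object file and an editor backup.
mkdir proj
touch proj/prog.c proj/prog.o 'proj/notes~'

# One pattern per line; the empty line in the middle is ignored.
printf '%s\n' '*.o' '' '*~' > foo

tar -c -f proj.tar -X foo proj
tar -tf proj.tar
```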

When archiving directories that are under some version control system (VCS), it is often convenient to read exclusion patterns from this VCS’ ignore files (e.g. ‘.cvsignore’, ‘.gitignore’, etc.). The following options provide this possibility:

--exclude-vcs-ignores

Before archiving a directory, see if it contains any of the following files: ‘.cvsignore’, ‘.gitignore’, ‘.bzrignore’, or ‘.hgignore’. If so, read ignore patterns from these files.

The patterns are treated much as the corresponding VCS would treat them, i.e.:

.cvsignore

Contains shell-style globbing patterns that apply only to the directory where this file resides. No comments are allowed in the file. Empty lines are ignored.

.gitignore

Contains shell-style globbing patterns. Applies to the directory where ‘.gitignore’ is located and all its subdirectories.

Any line beginning with a ‘#’ is a comment. Backslash escapes the comment character.

.bzrignore

Contains shell globbing patterns and regular expressions (if prefixed with ‘RE:’)(17). Patterns affect the directory and all its subdirectories.

Any line beginning with a ‘#’ is a comment.

.hgignore

Contains POSIX regular expressions(18). The line ‘syntax: glob’ switches to shell globbing patterns. The line ‘syntax: regexp’ switches back. Comments begin with a ‘#’. Patterns affect the directory and all its subdirectories.

--exclude-ignore=file

Before dumping a directory, tar checks if it contains file. If so, exclusion patterns are read from this file. The patterns affect only the directory itself.

--exclude-ignore-recursive=file

Same as ‘--exclude-ignore’, except that the patterns read affect both the directory where file resides and all its subdirectories.

--exclude-vcs

Exclude files and directories used by following version control systems: ‘CVS’, ‘RCS’, ‘SCCS’, ‘SVN’, ‘Arch’, ‘Bazaar’, ‘Mercurial’, and ‘Darcs’.

As of version 1.35, the following files are excluded:

--exclude-backups

Exclude backup and lock files. This option causes exclusion of files that match the following shell globbing patterns:

.#*
*~
#*#

When creating an archive, the ‘--exclude-caches’ option family causes tar to exclude all directories that contain a cache directory tag. A cache directory tag is a short file with the well-known name ‘CACHEDIR.TAG’ and having a standard header specified in http://www.brynosaurus.com/cachedir/spec.html. Various applications write cache directory tags into directories they use to hold regenerable, non-precious data, so that such data can be more easily excluded from backups.

There are three ‘exclude-caches’ options, each with different exclusion semantics:

--exclude-caches

Do not archive the contents of the directory, but archive the directory itself and the ‘CACHEDIR.TAG’ file.

--exclude-caches-under

Do not archive the contents of the directory, nor the ‘CACHEDIR.TAG’ file; archive only the directory itself.

--exclude-caches-all

Omit directories containing ‘CACHEDIR.TAG’ file entirely.

Another option family, ‘--exclude-tag’, provides a generalization of this concept. It takes a single argument, a file name to look for. Any directory that contains this file will be excluded from the dump. Similarly to ‘exclude-caches’, there are three options in this option family:

--exclude-tag=file

Do not dump the contents of the directory, but dump the directory itself and the file.

--exclude-tag-under=file

Do not dump the contents of the directory, nor the file; archive only the directory itself.

--exclude-tag-all=file

Omit directories containing file file entirely.

Multiple ‘--exclude-tag*’ options can be given.

For example, given this directory:

$ find dir
dir
dir/blues
dir/jazz
dir/folk
dir/folk/tagfile
dir/folk/sanjuan
dir/folk/trote

The ‘--exclude-tag’ option will produce the following:

$ tar -cf archive.tar --exclude-tag=tagfile -v dir
dir/
dir/blues
dir/jazz
dir/folk/
tar: dir/folk/: contains a cache directory tag tagfile;
  contents not dumped
dir/folk/tagfile

Both the ‘dir/folk’ directory and its tagfile are preserved in the archive; however, the rest of the files in this directory are not.

Now, using the ‘--exclude-tag-under’ option will exclude ‘tagfile’ from the dump, while still preserving the directory itself, as shown in this example:

$ tar -cf archive.tar --exclude-tag-under=tagfile -v dir
dir/
dir/blues
dir/jazz
dir/folk/
tar: dir/folk/: contains a cache directory tag tagfile;
  contents not dumped

Finally, using ‘--exclude-tag-all’ omits the ‘dir/folk’ directory entirely:

$ tar -cf archive.tar --exclude-tag-all=tagfile -v dir
dir/
dir/blues
dir/jazz
tar: dir/folk/: contains a cache directory tag tagfile;
  directory not dumped
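
The three runs above can be reproduced with the following sketch, which assumes GNU tar and a POSIX shell and rebuilds the example tree (with empty hypothetical files) in a scratch directory. The notices about the tag go to standard error:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Rebuild the example tree from hypothetical empty files.
mkdir -p dir/folk
touch dir/blues dir/jazz dir/folk/tagfile dir/folk/sanjuan dir/folk/trote

# One archive per option; tar reports each tagged directory on stderr.
tar -cf tag.tar       --exclude-tag=tagfile       dir
tar -cf tag-under.tar --exclude-tag-under=tagfile dir
tar -cf tag-all.tar   --exclude-tag-all=tagfile   dir

for a in tag.tar tag-under.tar tag-all.tar; do
  echo "== $a"
  tar -tf "$a"
done
```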


Problems with Using the exclude Options

Some users find ‘exclude’ options confusing. Here are some common pitfalls:



6.5 Wildcards Patterns and Matching

Globbing is the operation by which wildcard characters, ‘*’ or ‘?’ for example, are replaced and expanded into all existing files matching the given pattern. GNU tar can use wildcard patterns for matching (or globbing) archive members when extracting from or listing an archive. Wildcard patterns are also used for verifying volume labels of tar archives. This section explains the wildcard syntax used by tar.

A pattern should be written according to shell syntax, using wildcard characters to effect globbing. Most characters in the pattern stand for themselves in the matched string, and case is significant: ‘a’ will match only ‘a’, and not ‘A’. The character ‘?’ in the pattern matches any single character in the matched string. The character ‘*’ in the pattern matches zero, one, or more single characters in the matched string. The character ‘\’ says to take the following character of the pattern literally; it is useful when one needs to match the ‘?’, ‘*’, ‘[’ or ‘\’ characters, themselves.

The character ‘[’, up to the matching ‘]’, introduces a character class. A character class is a list of acceptable characters for the next single character of the matched string. For example, ‘[abcde]’ would match any of the first five letters of the alphabet. Note that within a character class, all of the “special characters” listed above other than ‘\’ lose their special meaning; for example, ‘[-\\[*?]]’ would match any of the characters, ‘-’, ‘\’, ‘[’, ‘*’, ‘?’, or ‘]’. (Due to parsing constraints, the characters ‘-’ and ‘]’ must either come first or last in a character class.)

If the first character of the class after the opening ‘[’ is ‘!’ or ‘^’, then the meaning of the class is reversed. Rather than listing characters to match, it lists those characters which are forbidden as the next single character of the matched string.

Other characters of the class stand for themselves. The special construction ‘[a-e]’, using a hyphen between two letters, is meant to represent all characters between a and e, inclusive.

Periods (‘.’) or forward slashes (‘/’) are not considered special for wildcard matches. However, if a pattern completely matches a directory prefix of a matched string, then it matches the full matched string: thus, excluding a directory also excludes all the files beneath it.



Controlling Pattern-Matching

For the purposes of this section, we call exclusion members all member names obtained while processing ‘--exclude’ and ‘--exclude-from’ options, and inclusion members those member names that were given in the command line or read from the file specified with ‘--files-from’ option.

These two lists of members are used in the following operations: ‘--diff’, ‘--extract’, ‘--list’, ‘--update’.

There are no inclusion members in create mode (‘--create’ and ‘--append’), since in this mode the names obtained from the command line refer to files, not archive members.

By default, inclusion members are compared with archive members literally (19) and exclusion members are treated as globbing patterns. For example:

$ tar tf foo.tar
a.c
b.c
a.txt
[remarks]
# Member names are used verbatim:
$ tar -xf foo.tar -v '[remarks]'
[remarks]
# Exclude member names are globbed:
$ tar -xf foo.tar -v --exclude '*.c'
a.txt
[remarks]

This behavior can be altered by using the following options:

--wildcards

Treat all member names as wildcards.

--no-wildcards

Treat all member names as literal strings.

Thus, to extract files whose names end in ‘.c’, you can use:

$ tar -xf foo.tar -v --wildcards '*.c'
a.c
b.c

Notice the quoting of the pattern, which prevents the shell from interpreting it.

The effect of the ‘--wildcards’ option is canceled by ‘--no-wildcards’. This can be used to pass some command line arguments verbatim and others as globbing patterns. For example, the following invocation:

$ tar -xf foo.tar --wildcards '*.txt' --no-wildcards '[remarks]'

instructs tar to extract from ‘foo.tar’ all files whose names end in ‘.txt’ and the file named ‘[remarks]’.
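
A sketch of the default literal matching and of ‘--wildcards’ and ‘--no-wildcards’, assuming GNU tar and a POSIX shell; it rebuilds the ‘foo.tar’ archive used above from hypothetical empty files and lists members instead of extracting them:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical empty members matching the listing of foo.tar above.
touch a.c b.c a.txt '[remarks]'
tar -cf foo.tar a.c b.c a.txt '[remarks]'

# Default: inclusion members are literal, so '[remarks]' names one member.
tar -tf foo.tar '[remarks]'

# --wildcards: the following names are globbing patterns.
tar -tf foo.tar --wildcards '*.c'

# '*.txt' is a pattern, while '[remarks]' is literal again.
tar -tf foo.tar --wildcards '*.txt' --no-wildcards '[remarks]'
```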

Normally, a pattern matches a name if an initial subsequence of the name’s components matches the pattern, where ‘*’, ‘?’, and ‘[...]’ are the usual shell wildcards, ‘\’ escapes wildcards, and wildcards can match ‘/’.

Other than optionally stripping leading ‘/’ from names (see section Absolute File Names), patterns and names are used as-is. For example, trailing ‘/’ is not trimmed from a user-specified name before deciding whether to exclude it.

However, this matching procedure can be altered by the options listed below. These options accumulate. For example:

--ignore-case --exclude='makefile' --no-ignore-case --exclude='readme'

ignores case when excluding ‘makefile’, but not when excluding ‘readme’.

--anchored
--no-anchored

If anchored, a pattern must match an initial subsequence of the name’s components. Otherwise, the pattern can match any subsequence. Default is ‘--no-anchored’ for exclusion members and ‘--anchored’ for inclusion members.

--ignore-case
--no-ignore-case

When ignoring case, upper-case patterns match lower-case names and vice versa. When not ignoring case (the default), matching is case-sensitive.

--wildcards-match-slash
--no-wildcards-match-slash

When wildcards match slash (the default for exclusion members), a wildcard like ‘*’ in the pattern can match a ‘/’ in the name. Otherwise, ‘/’ is matched only by ‘/’.
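
These options can be tried on a small hypothetical tree. This sketch assumes GNU tar and a POSIX shell. Because exclusion patterns are unanchored by default, ‘makefile’ also matches ‘src/sub/makefile’; a preceding ‘--ignore-case’ extends the match to ‘src/Makefile’:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical tree with two makefiles that differ only in case.
mkdir -p src/sub
touch src/Makefile src/sub/makefile src/README

# Case-sensitive (default): 'makefile' matches src/sub/makefile only.
tar -cf plain.tar --exclude='makefile' src

# --ignore-case affects the --exclude that follows it,
# so src/Makefile is excluded as well.
tar -cf nocase.tar --ignore-case --exclude='makefile' src

echo '== plain.tar'; tar -tf plain.tar
echo '== nocase.tar'; tar -tf nocase.tar
```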

The ‘--recursion’ and ‘--no-recursion’ options (see section Descending into Directories) also affect how member patterns are interpreted. If recursion is in effect, a pattern matches a name if it matches any of the name’s parent directories.

The following table summarizes pattern-matching default values:

Members     Default settings
Inclusion   --no-wildcards --anchored --no-wildcards-match-slash
Exclusion   --wildcards --no-anchored --wildcards-match-slash


6.6 Quoting Member Names

When displaying member names, tar takes care to avoid ambiguities caused by certain characters. This is called name quoting. The characters in question are:

The exact way in which tar quotes these characters depends on the quoting style. The default quoting style, called escape (see below), uses backslash notation to represent control characters and backslash.

GNU tar offers seven distinct quoting styles, which can be selected using ‘--quoting-style’ option:

--quoting-style=style

Sets quoting style. Valid values for style argument are: literal, shell, shell-always, c, escape, locale, clocale.

These styles are described in detail below. To illustrate their effect, we will use an imaginary tar archive ‘arch.tar’ containing the following members:

# 1. Contains horizontal tabulation character.
a       tab
# 2. Contains newline character
a
newline
# 3. Contains a space
a space
# 4. Contains double quotes
a"double"quote
# 5. Contains single quotes
a'single'quote
# 6. Contains a backslash character:
a\backslash

Here is how the usual ls command would have listed them, if they had existed in the current working directory:

$ ls
a\ttab
a\nnewline
a\ space
a"double"quote
a'single'quote
a\\backslash

Quoting styles:

literal

No quoting, display each character as is:

$ tar tf arch.tar --quoting-style=literal
./
./a space
./a'single'quote
./a"double"quote
./a\backslash
./a     tab
./a
newline

shell

Display characters the same way the Bourne shell does: control characters, except ‘\t’ and ‘\n’, are printed using backslash escapes, ‘\t’ and ‘\n’ are printed as is, and a single quote is printed as ‘\'’. If a name contains any quoted characters, it is enclosed in single quotes. In particular, if a name contains single quotes, it is printed as several single-quoted strings:

$ tar tf arch.tar --quoting-style=shell
./
'./a space'
'./a'\''single'\''quote'
'./a"double"quote'
'./a\backslash'
'./a    tab'
'./a
newline'

shell-always

Same as ‘shell’, but the names are always enclosed in single quotes:

$ tar tf arch.tar --quoting-style=shell-always
'./'
'./a space'
'./a'\''single'\''quote'
'./a"double"quote'
'./a\backslash'
'./a    tab'
'./a
newline'

c

Use the notation of the C programming language. All names are enclosed in double quotes. Control characters are quoted using backslash notations, double quotes are represented as ‘\"’, backslash characters are represented as ‘\\’. Single quotes and spaces are not quoted:

$ tar tf arch.tar --quoting-style=c
"./"
"./a space"
"./a'single'quote"
"./a\"double\"quote"
"./a\\backslash"
"./a\ttab"
"./a\nnewline"

escape

Control characters are printed using backslash notation, and a backslash as ‘\\’. This is the default quoting style, unless it was changed when the package was configured.

$ tar tf arch.tar --quoting-style=escape
./
./a space
./a'single'quote
./a"double"quote
./a\\backslash
./a\ttab
./a\nnewline
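
The escape and c styles can be checked against a real archive. This sketch assumes GNU tar and a POSIX shell and creates two hypothetical members whose names contain a tab and a newline, like members 1 and 2 of the imaginary ‘arch.tar’ above (the ‘./’ prefix is omitted here):

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical names containing control characters: a tab and a newline.
tab_name=$(printf 'a\ttab')
nl_name=$(printf 'a\nnewline')
mkdir src
touch "src/$tab_name" "src/$nl_name"
tar -cf arch.tar -C src "$tab_name" "$nl_name"

# escape: backslash notation, names not enclosed in quotes.
tar -tf arch.tar --quoting-style=escape

# c: the same escapes, with every name in double quotes.
tar -tf arch.tar --quoting-style=c
```
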

locale

Control characters, single quotes and backslashes are printed using backslash notation. All names are quoted using left and right quotation marks appropriate to the current locale. If the locale does not define quotation marks, ‘'’ is used as both the left and the right quotation mark. Any occurrences of the right quotation mark in a name are escaped with ‘\’.

For example:

$ tar tf arch.tar --quoting-style=locale
'./'
'./a space'
'./a\'single\'quote'
'./a"double"quote'
'./a\\backslash'
'./a\ttab'
'./a\nnewline'

clocale

Same as ‘locale’, but ‘"’ is used for both left and right quotation marks, if not provided by the currently selected locale:

$ tar tf arch.tar --quoting-style=clocale
"./"
"./a space"
"./a'single'quote"
"./a\"double\"quote"
"./a\\backslash"
"./a\ttab"
"./a\nnewline"

You can specify which characters should be quoted in addition to those implied by the current quoting style:

--quote-chars=string

Always quote characters from string, even if the selected quoting style would not quote them.

For example, using ‘escape’ quoting (compare with the usual escape listing above):

$ tar tf arch.tar --quoting-style=escape --quote-chars=' "'
./
./a\ space
./a'single'quote
./a\"double\"quote
./a\\backslash
./a\ttab
./a\nnewline

To disable quoting of such additional characters, use the following option:

--no-quote-chars=string

Remove characters listed in string from the list of quoted characters set by the previous ‘--quote-chars’ option.

This option is particularly useful if you have added ‘--quote-chars’ to your TAR_OPTIONS (see TAR_OPTIONS) and wish to disable it for the current invocation.

Note that ‘--no-quote-chars’ does not disable those characters that are quoted by default in the selected quoting style.
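
A sketch combining ‘--quote-chars’ and ‘--no-quote-chars’, assuming GNU tar and a POSIX shell; the two hypothetical member names mirror members 3 and 4 of ‘arch.tar’, without the ‘./’ prefix:

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical names containing a space and double quotes.
mkdir src
touch 'src/a space' 'src/a"double"quote'
tar -cf arch.tar -C src 'a space' 'a"double"quote'

# Quote spaces and double quotes in addition to the escape style's defaults.
tar -tf arch.tar --quoting-style=escape --quote-chars=' "'

# --no-quote-chars drops '"' again; only the space is still quoted.
tar -tf arch.tar --quoting-style=escape --quote-chars=' "' --no-quote-chars='"'
```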



6.7 Modifying File and Member Names

Tar archives contain detailed information about files stored in them and full file names are part of that information. When storing a file to an archive, its file name is recorded in it, along with the actual file contents. When restoring from an archive, a file is created on disk with exactly the same name as that stored in the archive. In the majority of cases this is the desired behavior of a file archiver. However, there are some cases when it is not.

First of all, it is often unsafe to extract archive members with absolute file names or those that begin with a ‘../’. GNU tar takes special precautions when extracting such names and provides a special option for handling them, which is described in Absolute File Names.

Secondly, you may wish to extract file names without some leading directory components, or with otherwise modified names. In other cases it is desirable to store files under differing names in the archive.

GNU tar provides several options for these needs.

--strip-components=number

Strip given number of leading components from file names before extraction.

For example, suppose you have archived the whole ‘/usr’ hierarchy to a tar archive named ‘usr.tar’. Among other files, this archive contains ‘usr/include/stdlib.h’, which you wish to extract to the current working directory. To do so, you type:

$ tar -xf usr.tar --strip=2 usr/include/stdlib.h

The option ‘--strip=2’ instructs tar to strip the two leading components (‘usr/’ and ‘include/’) off the file name.

If you add the ‘--verbose’ (‘-v’) option to the invocation above, you will note that the verbose listing still contains the full file name, with the two removed components still in place. This can be inconvenient, so tar provides a special option for altering this behavior:

--show-transformed-names

Display file or member names with all requested transformations applied.

For example:

$ tar -xf usr.tar -v --strip=2 usr/include/stdlib.h
usr/include/stdlib.h
$ tar -xf usr.tar -v --strip=2 --show-transformed usr/include/stdlib.h
stdlib.h

Notice that in both cases the file ‘stdlib.h’ is extracted to the current working directory; ‘--show-transformed-names’ affects only the way its name is displayed.

This option is especially useful for verifying whether the invocation will have the desired effect. Thus, before running

$ tar -x --strip=n

it is often advisable to run

$ tar -t -v --show-transformed --strip=n

to make sure the command will produce the intended results.
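As a concrete sketch of this check (again assuming GNU tar and a hypothetical stand-in archive), the script below lists the same member with and without ‘--show-transformed-names’. It omits ‘-v’ so that only member names are printed, which makes the two results easy to compare:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p usr/include
: > usr/include/stdlib.h          # empty stand-in file
tar -cf usr.tar usr

# Name as stored in the archive (the transformation is not shown).
stored=$(tar -tf usr.tar --strip-components=2 usr/include/stdlib.h)
# Name as it would be written on extraction.
shown=$(tar -tf usr.tar --strip-components=2 --show-transformed-names \
        usr/include/stdlib.h)

echo "$stored"   # usr/include/stdlib.h
echo "$shown"    # stdlib.h
```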

In case you need to apply more complex modifications to the file name, GNU tar provides a general-purpose transformation option:

--transform=expression
--xform=expression

Modify file names using supplied expression.

The expression is a sed-like replace expression of the form:

s/regexp/replace/[flags]

where regexp is a regular expression and replace is the replacement for each part of the file name that matches regexp. Both regexp and replace are described in detail in The ‘s’ Command in GNU sed.

Any delimiter can be used in lieu of ‘/’, the only requirement being that it be used consistently throughout the expression. For example, the following two expressions are equivalent:

s/one/two/
s,one,two,

Changing delimiters is often useful when the regexp contains slashes. For example, it is more convenient to write s,/,-, than s/\//-/.
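Here is a short sketch, assuming GNU tar and a hypothetical one-member archive, that shows the two notations are interchangeable. It also uses the ‘g’ flag, described below, so that every slash in ‘a/b/c.txt’ is replaced:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p a/b
: > a/b/c.txt
tar -cf t.tar a/b/c.txt           # the archive holds only this member

# The same expression with two delimiters; '/' must be escaped in the first.
slashes=$(tar -tf t.tar --show-transformed-names --transform='s/\//-/g')
commas=$(tar -tf t.tar --show-transformed-names --transform='s,/,-,g')

echo "$slashes"   # a-b-c.txt
echo "$commas"    # a-b-c.txt
```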

As in sed, you can give several replace expressions, separated by a semicolon.

Supported flags are:

g

Apply the replacement to all matches to the regexp, not just the first.

i

Use case-insensitive matching.

x

regexp is an extended regular expression (see Extended regular expressions in GNU sed).

number

Only replace the numberth match of the regexp.

Note: the POSIX standard does not specify what should happen when you mix the ‘g’ and number modifiers. GNU tar follows the GNU sed implementation in this regard, so the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on.

In addition, several transformation scope flags are supported, which control the files to which transformations apply. These are:

r

Apply transformation to regular archive members.

R

Do not apply transformation to regular archive members.

s

Apply transformation to symbolic link targets.

S

Do not apply transformation to symbolic link targets.

h

Apply transformation to hard link targets.

H

Do not apply transformation to hard link targets.

The default is ‘rsh’, which means to apply transformations to both archive members and targets of symbolic and hard links.

Default scope flags can also be changed using a ‘flags=’ statement in the transform expression. The flags set this way remain in force until the next ‘flags=’ statement or the end of the expression, whichever occurs first. For example:

  --transform 'flags=S;s|^|/usr/local/|'

Here are several examples of ‘--transform’ usage:

  1. Extract ‘usr/’ hierarchy into ‘usr/local/’:
    $ tar --transform='s,usr/,usr/local/,' -x -f arch.tar
    
  2. Strip two leading directory components (equivalent to ‘--strip-components=2’):
    $ tar --transform='s,/*[^/]*/[^/]*/,,' -x -f arch.tar
    
  3. Convert each file name to lower case:
    $ tar --transform 's/.*/\L&/' -x -f arch.tar
    
  4. Prepend ‘/prefix/’ to each file name:
    $ tar --transform 's,^,/prefix/,' -x -f arch.tar
    
  5. Archive the ‘/lib’ directory, prepending ‘/usr/local’ to each archive member:
    $ tar --transform 's,^,/usr/local/,S' -c -f arch.tar /lib
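You can try examples 1 and 3 safely in a scratch directory. This sketch assumes GNU tar. The tree under ‘src’, the file ‘README.TXT’, and the target directory ‘dest’ are all hypothetical:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical source tree.
mkdir -p src/usr/bin dest
echo hello > src/usr/bin/tool
: > src/README.TXT
tar -cf arch.tar -C src usr README.TXT

# Example 1: extract the 'usr/' hierarchy under 'usr/local/'.
tar --transform='s,usr/,usr/local/,' -x -f arch.tar -C dest usr

# Example 3: preview the lower-cased name without extracting it.
lower=$(tar --transform 's/.*/\L&/' --show-transformed-names \
        -t -f arch.tar README.TXT)

ls dest/usr/local/bin   # tool
echo "$lower"           # readme.txt
```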
    

Notice the use of flags in the last example. The ‘/lib’ directory often contains many symbolic links to files within it. It may look, for example, like this:

$ ls -l
drwxr-xr-x root/root       0 2008-07-08 16:20 /lib/
-rwxr-xr-x root/root 1250840 2008-05-25 07:44 /lib/libc-2.3.2.so
lrwxrwxrwx root/root       0 2008-06-24 17:12 /lib/libc.so.6 -> libc-2.3.2.so
...

Using the expression ‘s,^,/usr/local/,’ would mean adding ‘/usr/local’ to both regular archive members and to link targets. In this case, ‘/lib/libc.so.6’ would become:

  /usr/local/lib/libc.so.6 -> /usr/local/libc-2.3.2.so

This is definitely not desired. To avoid this, the ‘S’ flag is used, which excludes symbolic link targets from file name transformations. The result is:

$ tar --transform 's,^,/usr/local/,S' -c -v -f arch.tar \
       --show-transformed /lib
drwxr-xr-x root/root       0 2008-07-08 16:20 /usr/local/lib/
-rwxr-xr-x root/root 1250840 2008-05-25 07:44 /usr/local/lib/libc-2.3.2.so
lrwxrwxrwx root/root       0 2008-06-24 17:12 /usr/local/lib/libc.so.6 \
 -> libc-2.3.2.so
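You can reproduce the effect of ‘S’ with a small hypothetical library directory. This sketch assumes GNU tar. It uses the relative prefix ‘usr/local/’ rather than ‘/usr/local/’ so that leading-slash handling does not enter the picture, and it prints only the symbolic link target recorded in each archive:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical library directory with a relative symlink, as in /lib above.
mkdir lib
: > lib/libdemo-1.0.so
ln -s libdemo-1.0.so lib/libdemo.so

# Same prefix, without and with the 'S' scope flag.
tar -cf plain.tar  --transform='s,^,usr/local/,'  lib
tar -cf with_S.tar --transform='s,^,usr/local/,S' lib

# Keep the text after ' -> ' in the verbose listing: the stored link target.
plain=$(tar -tvf plain.tar | sed -n 's/.* -> //p')
with_s=$(tar -tvf with_S.tar | sed -n 's/.* -> //p')

echo "$plain"    # usr/local/libdemo-1.0.so
echo "$with_s"   # libdemo-1.0.so
```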

Unlike ‘--strip-components’, ‘--transform’ can be used in any GNU tar operation mode. For example, the following command adds files to the archive while replacing the leading ‘usr/’ component with ‘var/’:

$ tar -cf arch.tar --transform='s,^usr/,var/,' /

To test the effect of ‘--transform’, we suggest using the ‘--show-transformed-names’ option:

$ tar -cf arch.tar --transform='s,^usr/,var/,' \
       --verbose --show-transformed-names /

If both ‘--strip-components’ and ‘--transform’ are used together, then ‘--transform’ is applied first, and the required number of components is then stripped from its result.
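Here is a sketch of this ordering, assuming GNU tar and a hypothetical member ‘a/b/c.txt’. If the component were stripped first, the expression would no longer match and the result would be ‘b/c.txt’. Because ‘--transform’ runs first, the result is ‘y/b/c.txt’:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p a/b
: > a/b/c.txt
tar -cf t.tar a/b/c.txt           # the archive holds only this member

# --transform yields x/y/b/c.txt; stripping one component then yields y/b/c.txt.
name=$(tar -tf t.tar --show-transformed-names \
       --strip-components=1 --transform='s,^a/,x/y/,')
echo "$name"   # y/b/c.txt
```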

You can use as many ‘--transform’ options in a single command line as you want. The specified expressions will then be applied in order of their appearance. For example, the following two invocations are equivalent:

$ tar -cf arch.tar --transform='s,/usr/var,/var,' \
                        --transform='s,/usr/local,/usr,'
$ tar -cf arch.tar \
               --transform='s,/usr/var,/var,;s,/usr/local,/usr,'
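The following sketch assumes GNU tar and a hypothetical member ‘a.txt’. It shows that the expressions run in order, with the second one seeing the output of the first, and that the two spellings are equivalent:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

: > a.txt
tar -cf t.tar a.txt

# Two options: 'a' becomes 'b', then the second expression turns 'b' into 'c'.
two=$(tar -tf t.tar --show-transformed-names \
      --transform='s,^a,b,' --transform='s,^b,c,')
# One option holding two expressions separated by ';'.
one=$(tar -tf t.tar --show-transformed-names --transform='s,^a,b,;s,^b,c,')

echo "$two"   # c.txt
echo "$one"   # c.txt
```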


6.8 Operating Only on New Files

The ‘--after-date=date’ (‘--newer=date’, ‘-N date’) option causes tar to only work on files whose data modification or status change times are newer than the date given. If date starts with ‘/’ or ‘.’, it is taken to be a file name; the data modification time of that file is used as the date. If you use this option when creating or appending to an archive, the archive will only include new files. If you use ‘--after-date’ when extracting an archive, tar will only extract files newer than the date you specify.

If you want tar to make the date comparison based only on modification of the file’s data (rather than status changes), then use the ‘--newer-mtime=date’ option.

You may use these options with any operation. Note that these options differ from the ‘--update’ (‘-u’) operation in that they allow you to specify a particular date against which tar can compare when deciding whether or not to archive the files.

--after-date=date
--newer=date
-N date

Only store files newer than date.

Acts on files only if their data modification or status change times are later than date. Use in conjunction with any operation.

If date starts with ‘/’ or ‘.’, it is taken to be a file name; the data modification time of that file is used as the date.

--newer-mtime=date

Act like ‘--after-date’, but look only at data modification times.
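Here is a minimal sketch of the file-name form of date. It assumes GNU tar and GNU ‘touch’ (for ‘-d’). The files ‘old.txt’ and ‘new.txt’ and the reference file ‘stamp’ are hypothetical. ‘stamp’ is placed outside the archived directory and named by an absolute path, so its name starts with ‘/’:

```shell
set -eu
work=$(mktemp -d)                    # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
touch -d '5 days ago' data/old.txt   # GNU touch: create with an old mtime
: > data/new.txt                     # mtime is "now"
touch -d '2 days ago' "$work/stamp"  # reference file outside data/

# The argument starts with '/', so tar uses that file's mtime as the date.
tar -cf "$work/new.tar" --newer-mtime "$work/stamp" -C data .

listing=$(tar -tf "$work/new.tar")
echo "$listing"    # contains ./new.txt but not ./old.txt
```

The listing also contains ‘./’, whose modification time is recent because files were just created in it.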

These options limit tar to operate only on files which have been modified after the date specified. A file’s status is considered to have changed if its contents have been modified, or if its owner, permissions, and so forth, have been changed. (For more information on how to specify a date, see Date input formats; remember that the entire date argument must be quoted if it contains any spaces.)

Gurus would say that ‘--after-date’ tests both the data modification time (mtime, the time the contents of the file were last modified) and the status change time (ctime, the time the file’s status was last changed: owner, permissions, etc.) fields, while ‘--newer-mtime’ tests only the mtime field.

To be precise, ‘--after-date’ checks both mtime and ctime and processes the file if either one is more recent than date, while ‘--newer-mtime’ checks only mtime and disregards ctime. Neither option uses atime (the last time the contents of the file were looked at).

Date specifiers can have embedded spaces. Because of this, you may need to quote date arguments to keep the shell from parsing them as separate arguments. For example, the following command will add to the archive all the files modified less than two days ago:

$ tar -cf foo.tar --newer-mtime '2 days ago' .
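You can see the difference between ‘--newer-mtime’ and ‘--after-date’ with a file whose mtime is old but whose ctime is recent. Setting a timestamp with ‘touch -d’ produces exactly that, because changing a file's timestamps updates its ctime. This sketch assumes GNU tar and GNU ‘touch’. The file names are hypothetical:

```shell
set -eu
work=$(mktemp -d)                    # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
touch -d '5 days ago' data/old.txt   # old mtime; its ctime is "now"
: > data/new.txt

tar -cf mtime.tar --newer-mtime '2 days ago' -C data .
tar -cf after.tar --after-date  '2 days ago' -C data .

mtime_list=$(tar -tf mtime.tar)
after_list=$(tar -tf after.tar)
echo "$mtime_list"   # old.txt is missing: only mtime is compared
echo "$after_list"   # old.txt is present: its ctime is recent
```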

When any of these options is used with the option ‘--verbose’ (see section The ‘--verbose’ Option) GNU tar converts the specified date back to a textual form and compares that with the one given with the option. If the two forms differ, tar prints both forms in a message, to help the user check that the right date is being used. For example:

$ tar -c -f archive.tar --after-date='10 days ago' .
tar: Option --after-date: Treating date '10 days ago' as 2006-06-11 13:19:37.232434

Please Note:--after-date’ and ‘--newer-mtime’ should not be used for incremental backups. See section Using tar to Perform Incremental Dumps, for proper way of creating incremental backups.



6.9 Descending into Directories

Usually, tar will recursively explore all directories (either those given on the command line or through the ‘--files-from’ option) for the various files they contain. However, you may not always want tar to act this way.

The ‘--no-recursion’ option inhibits tar’s recursive descent into specified directories. If you specify ‘--no-recursion’, you can use the find (see find in GNU Find Manual) utility for hunting through levels of directories to construct a list of file names which you could then pass to tar. find allows you to be more selective when choosing which files to archive; see Reading Names from a File, for more information on using find with tar.

--no-recursion

Prevents tar from recursively descending directories.

--recursion

Requires tar to recursively descend directories. This is the default.

When you use ‘--no-recursion’, GNU tar grabs directory entries themselves, but does not descend into them. Many people use find to locate the files they want to back up, and since tar usually descends into directories recursively, they have to use the ‘-not -type d’ test in their find invocation (see Type test in Finding Files), as they usually do not want all the files in a directory. They then use the ‘--files-from’ option to archive the files located via find.

The problem when restoring files archived in this manner is that the directories themselves are not in the archive, so the ‘--same-permissions’ (‘--preserve-permissions’, ‘-p’) option does not affect them, even though users may well want it to. Specifying ‘--no-recursion’ is a way to tell tar to grab only the directory entries given to it, adding no new files on its own. To summarize, if you use find to create a list of files to be stored in an archive, use it as follows:

$ find dir tests | \
  tar -cf archive --no-recursion -T -
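Here is a minimal sketch of this pattern, assuming GNU tar, a POSIX ‘find’, and a hypothetical directory ‘dir’. Every directory and file reaches tar exactly once through the list, so the archive holds four entries:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dir/sub
: > dir/a.txt
: > dir/sub/b.txt

# find prints each directory and file once; --no-recursion stops tar from
# adding directory contents on its own, so nothing is stored twice.
find dir | tar -cf archive.tar --no-recursion -T -

count=$(tar -tf archive.tar | wc -l | tr -d ' ')
echo "$count"   # 4 (dir/, dir/a.txt, dir/sub/, dir/sub/b.txt)
```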

The ‘--no-recursion’ option also applies when extracting: it causes tar to extract only the matched directory entries, not the files under those directories.

The ‘--no-recursion’ option also affects how globbing patterns are interpreted (see section Controlling Pattern-Matching).

The ‘--no-recursion’ and ‘--recursion’ options apply to later options and operands, and can be overridden by later occurrences of ‘--no-recursion’ and ‘--recursion’. For example:

$ tar -cf jams.tar --no-recursion grape --recursion grape/concord

creates an archive with one entry for ‘grape’, and the recursive contents of ‘grape/concord’, but no entries under ‘grape’ other than ‘grape/concord’.
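You can check this in a scratch directory. The sketch assumes a GNU tar recent enough to treat ‘--recursion’ and ‘--no-recursion’ positionally. The directory ‘grape/other’ is a hypothetical sibling that should not be archived:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p grape/concord grape/other
: > grape/concord/juice
: > grape/other/skin

tar -cf jams.tar --no-recursion grape --recursion grape/concord

listing=$(tar -tf jams.tar)
echo "$listing"
# grape/
# grape/concord/
# grape/concord/juice
```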



6.10 Crossing File System Boundaries

tar normally crosses file system boundaries automatically in order to archive files which are part of a directory tree. You can change this behavior by running tar and specifying ‘--one-file-system’. This option only affects files that are archived because they are in a directory that is being archived; tar will still archive files explicitly named on the command line or through ‘--files-from’, regardless of where they reside.

--one-file-system

Prevents tar from crossing file system boundaries when archiving. Use in conjunction with any write operation.

The ‘--one-file-system’ option causes tar to modify its normal behavior in archiving the contents of directories. If a file in a directory is not on the same file system as the directory itself, then tar will not archive that file. If the file is a directory itself, tar will not archive anything beneath it; in other words, tar will not cross mount points.

This option is useful for making full or incremental archival backups of a file system. If this option is used in conjunction with ‘--verbose’ (‘-v’), files that are excluded are mentioned by name on the standard error.



6.10.1 Changing the Working Directory

To change the working directory in the middle of a list of file names, either on the command line or in a file specified using ‘--files-from’ (‘-T’), use ‘--directory’ (‘-C’). This will change the working directory to the specified directory after that point in the list.

--directory=directory
-C directory

Changes the working directory in the middle of a command line.

For example,

$ tar -c -f jams.tar grape prune -C food cherry

will place the files ‘grape’ and ‘prune’ from the current directory into the archive ‘jams.tar’, followed by the file ‘cherry’ from the directory ‘food’. This option is especially useful when you have several widely separated files that you want to store in the same archive.

Note that the file ‘cherry’ is recorded in the archive under the precise name ‘cherry’, notfood/cherry’. Thus, the archive will contain three files that all appear to have come from the same directory; if the archive is extracted with plain ‘tar --extract’, all three files will be written in the current directory.

Contrast this with the command,

$ tar -c -f jams.tar grape prune -C food red/cherry

which records the third file in the archive under the name ‘red/cherry’ so that, if the archive is extracted using ‘tar --extract’, the third file will be written in a subdirectory named ‘red’.
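You can compare both invocations in a scratch directory. This sketch assumes GNU tar. The files are hypothetical stand-ins named after the example:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p food/red
echo grape > grape
echo prune > prune
echo cherry > food/cherry
echo red-cherry > food/red/cherry

tar -c -f jams.tar  grape prune -C food cherry
tar -c -f jams2.tar grape prune -C food red/cherry

# Join the member names onto one line for easy comparison.
names1=$(tar -tf jams.tar | paste -sd ' ' -)
names2=$(tar -tf jams2.tar | paste -sd ' ' -)
echo "$names1"   # grape prune cherry
echo "$names2"   # grape prune red/cherry
```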

You can use the ‘--directory’ option to make the archive independent of the original name of the directory holding the files. The following command places the files ‘/etc/passwd’, ‘/etc/hosts’, and ‘/lib/libc.a’ into the archive ‘foo.tar’:

$ tar -c -f foo.tar -C /etc passwd hosts -C /lib libc.a

However, the names of the archive members will be exactly what they were on the command line: ‘passwd’, ‘hosts’, and ‘libc.a’. They will not appear to be related by file name to the original directories where those files were located.

Note that ‘--directory’ options are interpreted consecutively. If ‘--directory’ specifies a relative file name, it is interpreted relative to the then current directory, which might not be the same as the original current working directory of tar, due to a previous ‘--directory’ option.
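Here is a sketch of this chaining, assuming GNU tar and hypothetical directories ‘outer’ and ‘outer/inner’. The second ‘-C inner’ is resolved relative to ‘outer’, and the member contents confirm this:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p outer/inner
echo 'from outer' > outer/x
echo 'from inner' > outer/inner/y

# '-C inner' is interpreted relative to 'outer', i.e. as outer/inner.
tar -c -f t.tar -C outer x -C inner y

names=$(tar -tf t.tar | paste -sd ' ' -)
y_content=$(tar -xOf t.tar y)     # -O: extract member 'y' to standard output
echo "$names"       # x y
echo "$y_content"   # from inner
```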

When using ‘--files-from’ (see section Reading Names from a File), you can put various tar options (including ‘-C’) in the file list. Notice, however, that in this case the option and its argument may not be separated by whitespace. If you use a short option, its argument must either follow the option letter immediately, without any intervening whitespace, or occupy the next line. If you use a long option, separate it from its argument with an equal sign.

For instance, the file list for the above example will be:

-C/etc
passwd
hosts
--directory=/lib
libc.a

To use it, you would invoke tar as follows:

$ tar -c -f foo.tar --files-from list

The interpretation of options in file lists is disabled by ‘--verbatim-files-from’ and ‘--null’ options.
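You can reproduce the same setup without touching the real ‘/etc’ and ‘/lib’. This sketch assumes GNU tar and replaces them with hypothetical directories under a scratch directory. The file list uses both accepted spellings: ‘-C’ joined to its argument, and ‘--directory=’:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical stand-ins for /etc and /lib.
mkdir etc lib
: > etc/passwd
: > etc/hosts
: > lib/libc.a

# One entry per line; each option is joined to its argument.
printf '%s\n' "-C$work/etc" passwd hosts "--directory=$work/lib" libc.a > list

tar -c -f foo.tar --files-from list
names=$(tar -tf foo.tar | paste -sd ' ' -)
echo "$names"   # passwd hosts libc.a
```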



6.10.2 Absolute File Names

By default, GNU tar drops a leading ‘/’ on input or output, and complains about file names containing a ‘..’ component. There is an option that turns off this behavior:

--absolute-names
-P

Do not strip leading slashes from file names, and permit file names containing a ‘..’ file name component.

When tar extracts archive members from an archive, it strips any leading slashes (‘/’) from the member name. This causes absolute member names in the archive to be treated as relative file names. This allows you to have such members extracted wherever you want, instead of being restricted to extracting the member in the exact directory named in the archive. For example, if the archive member has the name ‘/etc/passwd’, tar will extract it as if the name were really ‘etc/passwd’.

File names containing ‘..’ can cause problems when extracting, so tar normally warns you about such files when creating an archive, and rejects attempts to extract such files.

Other tar programs do not do this. As a result, if you create an archive whose member names start with a slash, they will be difficult for other people with a non-GNU tar program to use. Therefore, GNU tar also strips leading slashes from member names when putting members into the archive. For example, if you ask tar to add the file ‘/bin/ls’ to an archive, it will do so, but the member name will be ‘bin/ls’.

Symbolic links containing ‘..’ or leading ‘/’ can also cause problems when extracting, so tar normally extracts them last; it may create empty files as placeholders during extraction.

If you use the ‘--absolute-names’ (‘-P’) option, tar will do none of these transformations.

To archive or extract files relative to the root directory, specify the ‘--absolute-names’ (‘-P’) option.

Normally, tar acts on files relative to the working directory—ignoring superior directory names when archiving, and ignoring leading slashes when extracting.

When you specify ‘--absolute-names’ (‘-P’), tar stores file names including all superior directory names, and preserves leading slashes. If you only invoked tar from the root directory you would never need the ‘--absolute-names’ option, but using this option may be more convenient than switching to root.
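The difference is easy to observe with a file named by an absolute path. This sketch assumes GNU tar and relies on ‘mktemp -d’ returning an absolute directory name. The file ‘data/f’ is hypothetical:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
: > data/f
abs="$work/data/f"                # absolute path, since mktemp -d returned one

tar -c -f rel.tar "$abs" 2>/dev/null   # default: leading '/' removed (notice hidden)
tar -c -P -f abs.tar "$abs"            # -P: name stored exactly as given

rel_name=$(tar -tf rel.tar)
abs_name=$(tar -t -P -f abs.tar)
echo "$rel_name"   # the same path without its leading '/'
echo "$abs_name"   # the path exactly as given
```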

--absolute-names

Preserves full file names (including superior directory names) when archiving and extracting files.

tar prints out a message about removing the ‘/’ from file names. This message appears once per GNU tar invocation. It conveys something you ought to know; ignoring what it means can lead to very serious surprises later.

Some people, nevertheless, do not want to see this message. If you really want to play dangerously, you can of course redirect tar's standard error to the sink. For example, under sh:

$ tar -c -f archive.tar /home 2> /dev/null

Another solution, both nicer and simpler, would be to change to the ‘/’ directory first, and then avoid absolute notation. For example:

$ tar -c -f archive.tar -C / home
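Here is a sketch comparing the two approaches, assuming GNU tar. A hypothetical directory ‘root’ inside a scratch directory stands in for ‘/’. Only the absolute form produces the message on standard error:

```shell
set -eu
work=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical stand-in for '/' containing a 'home' directory.
mkdir -p root/home/alice
: > root/home/alice/notes

# Absolute notation: tar removes the leading '/' and says so on stderr.
msg_abs=$(tar -c -f abs.tar "$work/root/home" 2>&1)
# Change directory first, then name 'home' relatively: nothing to remove.
msg_rel=$(tar -c -f rel.tar -C "$work/root" home 2>&1)

echo "absolute: ${msg_abs:-<no message>}"
echo "with -C:  ${msg_rel:-<no message>}"
first=$(tar -tf rel.tar | sed -n 1p)
echo "$first"   # home/
```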

See section Integrity, for some of the security-related implications of using this option.



This document was generated on August 23, 2023 using texi2html 5.0.