GNU tar 1.35: 4.4 Options Used by --extract

4.4 Options Used by ‘--extract’

The previous chapter showed how to use ‘--extract’ to extract an archive into the file system. Various options cause tar to extract more information than just file contents, such as the owner, the permissions, the modification date, and so forth. This section presents options to be used with ‘--extract’ when certain special considerations arise. You may review the information presented in How to Extract Members from an Archive for more basic information about the ‘--extract’ operation.

4.4.1 Options to Help Read Archives

Normally, tar will request data in full record increments from an archive storage device. If the device cannot return a full record, tar will report an error. However, some devices do not always return full records, or do not require the last record of an archive to be padded out to the next record boundary. To keep reading until you obtain a full record, or to accept an incomplete record if it contains an end-of-archive marker, specify the ‘--read-full-records’ (‘-B’) option in conjunction with the ‘--extract’ or ‘--list’ operations. See section Blocking.

The ‘--read-full-records’ (‘-B’) option is turned on by default when tar reads an archive from standard input, or from a remote machine. This is because on BSD Unix systems, attempting to read a pipe returns however much happens to be in the pipe, even if it is less than was requested. If this option were not enabled, tar would fail as soon as it read an incomplete record from the pipe.

If you’re not sure of the blocking factor of an archive, you can read the archive by specifying ‘--read-full-records’ (‘-B’) and ‘--blocking-factor=512-size’ (‘-b 512-size’), where 512-size is a blocking factor (counted in 512-byte blocks) larger than the one the archive uses. This lets you avoid having to determine the blocking factor of an archive. See section The Blocking Factor of an Archive.
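A minimal sketch of this advice, assuming GNU tar and a POSIX shell with ‘mktemp’; the directory and archive names are made up for illustration. The archive is written with a blocking factor of 4 and read back with ‘-B’ and a larger factor of 64:

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is touched.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p src/data out
printf 'hello\n' > src/data/hello.txt

# Write the archive with records of 4 x 512-byte blocks.
tar -c -b 4 -f small-records.tar -C src data

# Read it back with -B and a blocking factor (64) larger than the archive's.
tar -x -B -b 64 -f small-records.tar -C out
cat out/data/hello.txt
```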

Reading Full Records

‘--read-full-records’
‘-B’: Use in conjunction with ‘--extract’ (‘--get’, ‘-x’) to read an archive which contains incomplete records, or one which has a blocking factor less than the one specified.

Ignoring Blocks of Zeros

Normally, tar stops reading when it encounters a block of zeros between file entries (which usually indicates the end of the archive). ‘--ignore-zeros’ (‘-i’) allows tar to read the whole of an archive that contains a block of zeros before the end (i.e., a damaged archive, or one that was created by concatenating several archives together). This option also suppresses warnings about missing or incomplete zero blocks at the end of the archive. These warnings can be turned back on, if need be, using the ‘--warning=alone-zero-block --warning=missing-zero-blocks’ options (see section Controlling Warning Messages).

The ‘--ignore-zeros’ (‘-i’) option is turned off by default because many versions of tar write garbage after the end-of-archive entry, since that part of the media is never supposed to be read. GNU tar does not write after the end of an archive, but seeks to maintain compatibility among archiving utilities.

‘--ignore-zeros’
‘-i’: To ignore blocks of zeros (i.e., end-of-archive entries) which may be encountered while reading an archive. Use in conjunction with ‘--extract’ or ‘--list’.
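As a sketch (assuming GNU tar; the file names are hypothetical), two archives joined with ‘cat’ show the difference: a plain listing stops at the first archive's end-of-archive blocks, while ‘-i’ reads on to the members of the second archive.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

printf 'first\n' > one.txt
printf 'second\n' > two.txt

# Two separate archives, each ending with its own end-of-archive zero blocks.
tar -cf part1.tar one.txt
tar -cf part2.tar two.txt

# Plain concatenation leaves part1's zero blocks in the middle of the result.
cat part1.tar part2.tar > combined.tar

# Without -i, tar stops at the first zero blocks and lists only one.txt.
tar -tf combined.tar > plain-list.txt

# With -i (--ignore-zeros), tar reads past them and lists both members.
tar -t -i -f combined.tar > full-list.txt
cat plain-list.txt full-list.txt
```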

4.4.2 Changing How tar Writes Files

Options Controlling the Overwriting of Existing Files

When extracting files, if tar discovers that the extracted file already exists, it normally replaces the file by removing it before extracting it, to prevent confusion in the presence of hard or symbolic links. (If the existing file is a symbolic link, it is removed, not followed.) However, if a directory cannot be removed because it is nonempty, tar normally overwrites its metadata (ownership, permission, etc.). The ‘--overwrite-dir’ option enables this default behavior. To be more cautious and preserve the metadata of such a directory, use the ‘--no-overwrite-dir’ option.

To be even more cautious and prevent existing files from being replaced, use the ‘--keep-old-files’ (‘-k’) option. It causes tar to refuse to replace or update a file that already exists, i.e., a file with the same name as an archive member prevents extraction of that archive member. Instead, it reports an error. For example:

$ ls
blues
$ tar -x -k -f archive.tar
tar: blues: Cannot open: File exists
tar: Exiting with failure status due to previous errors

If you wish to preserve old files untouched, but don’t want tar to treat them as errors, use the ‘--skip-old-files’ option. This option causes tar to silently skip extracting over existing files.

To be more aggressive about altering existing files, use the ‘--overwrite’ option. It causes tar to overwrite existing files and to follow existing symbolic links when extracting.

Some people argue that GNU tar should not hesitate to overwrite files with other files when extracting. When extracting a tar archive, they expect to see a faithful copy of the state of the file system when the archive was created. It is debatable whether this would always be proper behavior. For example, suppose one has an archive in which ‘usr/local’ is a link to ‘usr/local2’. Since then, maybe the site removed the link and renamed the whole hierarchy from ‘/usr/local2’ to ‘/usr/local’. Such things happen all the time. It would hardly be welcome for GNU tar to remove the whole hierarchy just to make room for the link to be reinstated (unless it also simultaneously restored the full ‘/usr/local2’, of course!). GNU tar is indeed able to remove a whole hierarchy to reestablish a symbolic link, for example, but only if ‘--recursive-unlink’ is specified to allow this behavior. In any case, single files are silently removed.

Finally, the ‘--unlink-first’ (‘-U’) option can improve performance in some cases by causing tar to remove files unconditionally before extracting them.

Overwrite Old Files

‘--overwrite’

Overwrite existing files and directory metadata when extracting files from an archive.

This causes tar to write extracted files into the file system without regard to the files already on the system; i.e., files with the same names as archive members are overwritten when the archive is extracted. It also causes tar to extract the ownership, permissions, and time stamps onto any preexisting files or directories. If the name of an archive member corresponds to an existing symbolic link, the file pointed to by the symbolic link will be overwritten instead of the symbolic link itself (if this is possible). Moreover, special devices, empty directories and even symbolic links are automatically removed if they are in the way of extraction.

Be careful when using the ‘--overwrite’ option, particularly when combined with the ‘--absolute-names’ (‘-P’) option, as this combination can change the contents, ownership or permissions of any file on your system. Also, many systems do not take kindly to overwriting files that are currently being executed.

‘--overwrite-dir’

Overwrite the metadata of directories when extracting files from an archive, but remove other files before extracting.

Keep Old Files

GNU tar provides two options to control its actions when it is about to extract a file that already exists on disk.

‘--keep-old-files’

‘-k’

Do not replace existing files with archive members. When such a file is encountered, tar issues an error message. Upon end of extraction, tar exits with code 2 (see exit status).

‘--skip-old-files’

Do not replace existing files with archive members, but do not treat that as an error. Such files are silently skipped and do not affect tar exit status.

Additional verbosity can be obtained using ‘--warning=existing-file’ together with that option (see section Controlling Warning Messages).
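The following sketch contrasts the two options on a file named ‘blues’, as in the earlier example. It assumes GNU tar behaving as described above: ‘-k’ reports an error and exits with status 2, while ‘--skip-old-files’ leaves the file alone and exits with status 0.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

printf 'archived\n' > blues
tar -cf archive.tar blues
printf 'local edit\n' > blues    # the copy on disk now differs from the archive

# -k: refuse to replace 'blues' and report an error.
if tar -x -k -f archive.tar 2>/dev/null; then
  keep_status=0
else
  keep_status=$?
fi

# --skip-old-files: leave 'blues' alone without treating it as an error.
if tar -x --skip-old-files -f archive.tar; then
  skip_status=0
else
  skip_status=$?
fi

echo "keep-old-files: $keep_status, skip-old-files: $skip_status"
cat blues
```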

Keep Newer Files

‘--keep-newer-files’: Do not replace existing files that are newer than their archive copies. This option is meaningless with ‘--list’ (‘-t’).
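A small sketch, assuming GNU tar and POSIX ‘touch -t’; the file name is hypothetical. The archived copy is back-dated to the year 2000, so the rewritten file on disk is newer and survives extraction:

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

printf 'old\n' > notes.txt
touch -t 200001010000 notes.txt   # archived mtime: 1 Jan 2000
tar -cf archive.tar notes.txt

printf 'new\n' > notes.txt        # rewriting the file sets its mtime to "now"

# The copy on disk is newer than the archive member, so it is kept.
# tar may warn about the newer file; the warning is hidden here.
tar -x --keep-newer-files -f archive.tar 2>/dev/null
cat notes.txt
```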

Unlink First

‘--unlink-first’
‘-U’: Remove files before extracting over them. This can make tar run a bit faster if you know in advance that the extracted files all need to be removed. Normally this option slows tar down slightly, so it is disabled by default.

Recursive Unlink

‘--recursive-unlink’: When this option is specified, try removing files and directory hierarchies before extracting over them. This is a dangerous option!

If you specify the ‘--recursive-unlink’ option, tar removes anything that keeps you from extracting a file as far as current permissions will allow it. This could include removal of the contents of a full directory hierarchy.
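The sketch below (GNU tar assumed, names hypothetical) builds the situation this option is meant for: the archive holds a regular file, but a non-empty directory of the same name is in the way on disk. A plain extraction is expected to fail there; with ‘--recursive-unlink’ the directory tree is removed and the file is extracted.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir stage
printf 'plain file\n' > stage/thing
tar -cf archive.tar -C stage thing   # the archive holds 'thing' as a regular file

# On disk, 'thing' is instead a non-empty directory.
mkdir -p thing/sub
printf 'x\n' > thing/sub/file

# Without --recursive-unlink, tar cannot remove a non-empty directory,
# so this extraction is expected to fail and leave 'thing' in place.
tar -x -f archive.tar 2>/dev/null || echo "plain extraction failed, as expected"

# With --recursive-unlink, tar removes the whole 'thing' hierarchy first.
tar -x --recursive-unlink -f archive.tar
cat thing
```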

Setting Data Modification Times

Normally, tar sets the data modification times of extracted files to the corresponding times recorded for the files in the archive, but limits the permissions of extracted files by the current umask setting.

To set the data modification times of extracted files to the time when the files were extracted, use the ‘--touch’ (‘-m’) option in conjunction with ‘--extract’ (‘--get’, ‘-x’).

‘--touch’
‘-m’: Sets the data modification time of extracted archive members to the time they were extracted, not the time recorded for them in the archive. Use in conjunction with ‘--extract’ (‘--get’, ‘-x’).
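To see the difference, the sketch below extracts the same back-dated member twice, once normally and once with ‘-m’, and compares each result against a reference file dated 2010 using ‘find -newer’. GNU tar and a POSIX shell are assumed; all names are illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir out-default out-touch
printf 'data\n' > report.txt
touch -t 200001010000 report.txt   # archived mtime: 1 Jan 2000
tar -cf archive.tar report.txt
touch -t 201001010000 stamp        # reference file dated 1 Jan 2010

# Default: the archived mtime (2000) is restored.
tar -x -f archive.tar -C out-default

# -m: the mtime becomes the extraction time instead.
tar -x -m -f archive.tar -C out-touch

# Only files newer than 'stamp' are printed.
find out-default/report.txt out-touch/report.txt -newer stamp
```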

Setting Access Permissions

To set the modes (access permissions) of extracted files to those recorded for those files in the archive, use ‘--same-permissions’ in conjunction with the ‘--extract’ (‘--get’, ‘-x’) operation.

‘--preserve-permissions’
‘--same-permissions’
‘-p’: Set modes of extracted archive members to those recorded in the archive, instead of current umask settings. Use in conjunction with ‘--extract’ (‘--get’, ‘-x’).
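A minimal sketch, assuming GNU tar; the file name is hypothetical. A file stored with mode 644 is extracted under a restrictive umask of 077, and ‘-p’ keeps the archived mode. When tar runs as the superuser it already applies the archived modes by default, so the umask effect without ‘-p’ is only visible for ordinary users.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir out
printf 'shared\n' > shared.txt
chmod 644 shared.txt
tar -cf archive.tar shared.txt   # the archive records mode 644

# A umask of 077 would otherwise strip the group and other bits.
umask 077
tar -x -p -f archive.tar -C out
ls -l out/shared.txt
```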

Directory Modification Times and Permissions

After successfully extracting a file member, GNU tar normally restores its permissions and modification times, as described in the previous sections. This cannot be done for directories, because after extracting a directory tar will almost certainly extract files into that directory, and this will cause the directory modification time to be updated. Moreover, restoring that directory's permissions may not permit file creation within it. Thus, restoring directory permissions and modification times must be delayed at least until all files have been extracted into that directory. GNU tar restores directories using the following approach.

The extracted directories are created with the mode specified in the archive, as modified by the umask of the user, which gives sufficient permissions to allow file creation. The meta-information about the directory is recorded in a temporary list of directories. When preparing to extract the next archive member, GNU tar checks whether the directory prefix of this file contains the remembered directory. If it does not, the program assumes that all files have been extracted into that directory, restores its modification time and permissions, and removes its entry from the internal list. This approach allows tar to restore directory meta-information correctly in the majority of cases, while keeping memory requirements sufficiently small. It relies on the fact that most tar archives use a predefined member order: first the directory, then all the files and subdirectories in that directory.

However, this is not always true. The most important exception is incremental archives (see section Using tar to Perform Incremental Dumps). The member order in an incremental archive is reversed: first all directory members are stored, followed by other (non-directory) members. So, when extracting from incremental archives, GNU tar alters the above procedure: it remembers all restored directories, and restores their meta-data only after the entire archive has been processed. Notice that you do not need to specify any special options for that, as GNU tar automatically detects archives in incremental format.

There may be cases when such processing is required for normal archives too. Consider the following example:

$ tar --no-recursion -cvf archive \
    foo foo/file1 bar bar/file foo/file2
foo/
foo/file1
bar/
bar/file
foo/file2

During normal operation, after encountering ‘bar’, GNU tar will assume that all files from the directory ‘foo’ have already been extracted and will therefore restore its timestamp and permission bits. However, extracting ‘foo/file2’ afterwards updates the directory's modification time again, so the restored timestamp is lost.

To correctly restore directory meta-information in such cases, use the ‘--delay-directory-restore’ command line option:

‘--delay-directory-restore’: Delays restoring of the modification times and permissions of extracted directories until the end of extraction. This way, correct meta-information is restored even if the archive has unusual member ordering.
‘--no-delay-directory-restore’: Cancel the effect of the previous ‘--delay-directory-restore’. Use this option if you have used ‘--delay-directory-restore’ in TAR_OPTIONS variable (see TAR_OPTIONS) and wish to temporarily disable it.
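The sketch below reproduces the example archive above and extracts it with ‘--delay-directory-restore’; afterwards ‘foo’ is expected to carry its archived (year 2000) modification time rather than the extraction time. It assumes GNU tar and a POSIX shell; the ‘src’ and ‘out’ directories and the ‘stamp’ reference file are illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p src/foo src/bar out
printf '1\n' > src/foo/file1
printf '2\n' > src/foo/file2
printf 'b\n' > src/bar/file
touch -t 200001010000 src/foo src/bar   # old directory mtimes to be restored
touch -t 201001010000 stamp             # reference file dated 1 Jan 2010

# Same unusual member order as in the text: foo/file2 comes after bar.
(cd src && tar --no-recursion -cf ../archive \
    foo foo/file1 bar bar/file foo/file2)

# Directory metadata is restored only after the whole archive is read.
tar -x --delay-directory-restore -f archive -C out

# Prints nothing if out/foo kept its archived (2000) mtime.
find out/foo -prune -newer stamp
```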

Writing to Standard Output

To write the extracted files to the standard output, instead of creating the files on the file system, use ‘--to-stdout’ (‘-O’) in conjunction with ‘--extract’ (‘--get’, ‘-x’). This option is useful if you are extracting files to send them through a pipe, and do not need to preserve them in the file system. If you extract multiple members, they appear on standard output concatenated, in the order they are found in the archive.

‘--to-stdout’
‘-O’: Writes files to the standard output. Use only in conjunction with ‘--extract’ (‘--get’, ‘-x’). When this option is used, instead of creating the files specified, tar writes the contents of the files extracted to its standard output. This may be useful if you are only extracting the files in order to send them through a pipe. This option is meaningless with ‘--list’ (‘-t’).

This can be useful, for example, if you have a tar archive containing a big file and don’t want to store the file on disk before processing it. You can use a command like this:

tar -xOzf foo.tgz bigfile | process

or even like this if you want to process the concatenation of the files:

tar -xOzf foo.tgz bigfile1 bigfile2 | process
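A self-contained variant of these commands, assuming GNU tar. It uses an uncompressed archive so that gzip is not required (add ‘-z’ for a ‘.tgz’ file), and ‘wc -l’ stands in for the hypothetical ‘process’ command. The originals are deleted after archiving to show that ‘-O’ creates nothing on disk:

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

printf 'alpha\n' > bigfile1
printf 'beta\n' > bigfile2
tar -cf foo.tar bigfile1 bigfile2
rm bigfile1 bigfile2

# Member contents go to standard output, concatenated in archive order.
tar -xOf foo.tar bigfile1 bigfile2 | wc -l

# Capture the concatenated output to inspect its order.
combined=$(tar -xOf foo.tar bigfile1 bigfile2)
printf '%s\n' "$combined"
```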

However, ‘--to-command’ may be more convenient for use with multiple files. See the next section.

Writing to an External Program

You can instruct tar to send the contents of each extracted file to the standard input of an external program:

‘--to-command=command’

Extract files and pipe their contents to the standard input of command. When this option is used, instead of creating the files specified, tar invokes command and pipes the contents of the files to its standard input. The command may contain command line arguments (see Running External Commands, for more detail).

Notice that command is executed once for each regular file extracted. Non-regular files (directories, etc.) are ignored when this option is used.

The command can obtain the information about the file it processes from the following environment variables:

TAR_FILETYPE

Type of the file. It is a single letter with the following meaning:

f	Regular file
d	Directory
l	Symbolic link
h	Hard link
b	Block device
c	Character device

Currently only regular files are supported.

TAR_MODE

File mode, an octal number.

TAR_FILENAME

The name of the file.

TAR_REALNAME

Name of the file as stored in the archive.

TAR_UNAME

Name of the file owner.

TAR_GNAME

Name of the group that owns the file.

TAR_ATIME

Time of last access. It is a decimal number, representing seconds since the Epoch. If the archive provides times with nanosecond precision, the nanoseconds are appended to the timestamp after a decimal point.

TAR_MTIME

Time of last modification.

TAR_CTIME

Time of last status change.

TAR_SIZE

Size of the file.

TAR_UID

UID of the file owner.

TAR_GID

GID of the file owner.

Additionally, the following variables contain information about tar mode and the archive being processed:

TAR_VERSION: GNU tar version number.
TAR_ARCHIVE: The name of the archive tar is processing.
TAR_BLOCKING_FACTOR: Current blocking factor (see section Blocking).
TAR_VOLUME: Ordinal number of the volume tar is processing.
TAR_FORMAT: Format of the archive being processed. See section Controlling the Archive Format, for a complete list of archive format names.

These variables are defined prior to executing the command, so you can pass them as arguments, if you prefer. For example, if the command proc takes the member name and size as its arguments, then you could do:

$ tar -x -f archive.tar \
       --to-command='proc $TAR_FILENAME $TAR_SIZE'

Notice the single quotes: they prevent the variable names from being expanded by the shell when invoking tar.
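A runnable sketch of the same idea, assuming GNU tar, which hands the command string to /bin/sh so that shell syntax is allowed inside it. The hypothetical ‘proc’ is replaced by a ‘printf’ that logs each member's name, size, and contents (read from standard input with ‘cat’):

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

printf 'one\n' > a.txt
printf 'three\n' > b.txt
tar -cf archive.tar a.txt b.txt
rm a.txt b.txt

# The command runs once per regular file, with the member's contents on its
# standard input; single quotes keep $TAR_FILENAME, $TAR_SIZE and $(cat)
# from being expanded by the invoking shell.
tar -x -f archive.tar \
    --to-command='printf "%s %s %s\n" "$TAR_FILENAME" "$TAR_SIZE" "$(cat)"' \
    > members.log

cat members.log
```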

If command exits with a non-zero status, tar will print an error message similar to the following:

tar: 2345: Child returned status 1

Here, ‘2345’ is the PID of the finished process.

If this behavior is not wanted, use ‘--ignore-command-error’:

‘--ignore-command-error’: Ignore exit codes of subprocesses. Notice that if the program exits on signal or otherwise terminates abnormally, the error message will be printed even if this option is used.
‘--no-ignore-command-error’: Cancel the effect of any previous ‘--ignore-command-error’ option. This option is useful if you have set ‘--ignore-command-error’ in TAR_OPTIONS (see TAR_OPTIONS) and wish to temporarily cancel it.
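A sketch of the difference, assuming GNU tar; the command deliberately reads its input and then exits with status 1. Without the option, tar is expected to report the child's status and exit with a non-zero status; with ‘--ignore-command-error’ the failure is ignored.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

printf 'x\n' > a.txt
tar -cf archive.tar a.txt

# Default: the failing command makes tar itself fail.
if tar -x -f archive.tar --to-command='cat >/dev/null; exit 1' 2>/dev/null; then
  plain_status=0
else
  plain_status=$?
fi

# With --ignore-command-error, the child's exit code is ignored.
if tar -x -f archive.tar --ignore-command-error \
       --to-command='cat >/dev/null; exit 1'; then
  ignored_status=0
else
  ignored_status=$?
fi

echo "default: $plain_status, ignored: $ignored_status"
```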

Removing Files

‘--remove-files’: Remove files after adding them to the archive.
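Note that ‘--remove-files’ takes effect when files are added to an archive, e.g. with ‘--create’. A minimal sketch, assuming GNU tar; the ‘logs’ directory is hypothetical:

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir logs
printf 'entry\n' > logs/day1.log

# Each file is deleted from disk once it has been added to logs.tar.
tar -c --remove-files -f logs.tar logs

# The archive still holds the data.
tar -tf logs.tar
```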

4.4.3 Coping with Scarce Resources

Starting File

‘--starting-file=name’
‘-K name’: Starts an operation in the middle of an archive. Use in conjunction with ‘--extract’ (‘--get’, ‘-x’) or ‘--list’ (‘-t’).

If a previous attempt to extract files failed due to lack of disk space, you can use ‘--starting-file=name’ (‘-K name’) to start extracting only after member name of the archive. This assumes, of course, that there is now free space, or that you are now extracting into a different file system. (You could also choose to suspend tar, remove unnecessary files from the file system, and then resume the same tar operation. In this case, ‘--starting-file’ is not necessary.) See also Asking for Confirmation During Operations, and Excluding Some Files.

Same Order

‘--same-order’
‘--preserve-order’
‘-s’: To process large lists of file names on machines with small amounts of memory. Use in conjunction with ‘--compare’ (‘--diff’, ‘-d’), ‘--list’ (‘-t’) or ‘--extract’ (‘--get’, ‘-x’).

The ‘--same-order’ (‘--preserve-order’, ‘-s’) option tells tar that the list of file names to be listed or extracted is sorted in the same order as the files in the archive. This allows a large list of names to be used, even on a small machine that would not otherwise be able to hold all the names in memory at the same time. Such a sorted list can easily be created by running ‘tar -t’ on the archive and editing its output.
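A minimal sketch of that workflow, assuming GNU tar: the name list comes from ‘tar -t’, so it is already in archive order, and one member is dropped from it before extracting with ‘-s’ and ‘--files-from’ (‘-T’). All names are hypothetical.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir out
for f in a.txt b.txt c.txt; do printf '%s\n' "$f" > "$f"; done
tar -cf archive.tar a.txt b.txt c.txt

# The listing is in archive order; remove the member we do not want.
tar -tf archive.tar | grep -v '^b\.txt$' > names.txt

# -s tells tar the name list follows the archive order.
(cd out && tar -x -s -f ../archive.tar -T ../names.txt)
ls out
```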

This option is probably never needed on modern computer systems.

This document was generated on August 23, 2023 using texi2html 5.0.
