10 Reliability and Security

The tar command reads and writes files as any other application does, and is subject to the usual caveats about reliability and security. This section contains some commonsense advice on the topic.

10.1 Reliability

Ideally, when tar is creating an archive, it reads from a file system that is not being modified, and encounters no errors or inconsistencies while reading and writing. If this is the case, the archive should faithfully reflect what was read. Similarly, when extracting from an archive, tar ideally encounters no errors and the extracted files faithfully reflect what was in the archive.

However, when reading or writing real-world file systems, several things can go wrong; these include permissions problems, corruption of data, and race conditions.

10.1.1 Permissions Problems

If tar encounters errors while reading or writing files, it normally reports an error and exits with nonzero status. The work it does may therefore be incomplete. For example, when creating an archive, if tar cannot read a file then it cannot copy the file into the archive.
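
As an illustration of that policy (report each failure, keep going, and signal failure at the end), here is a minimal Python sketch. It uses Python’s standard ‘tarfile’ module rather than GNU tar, and the helper name ‘create_archive’ is hypothetical.

```python
import sys
import tarfile


def create_archive(archive_path, paths):
    """Write paths into a new archive and return the ones that could not be read."""
    failed = []
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            try:
                tar.add(path)
            except OSError as err:
                # Report the problem and continue, remembering the failure.
                print(f"{path}: {err.strerror}", file=sys.stderr)
                failed.append(path)
    return failed
```

A caller would treat a non-empty result as failure, for example by exiting with a nonzero status, so that an incomplete archive is never mistaken for a complete one. In this sketch an error inside a directory also stops the rest of that directory from being added.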

10.1.2 Data Corruption and Repair

If an archive becomes corrupted by an I/O error, this may corrupt the data in an extracted file. Worse, it may corrupt the file’s metadata, which may cause later parts of the archive to be misinterpreted. A tar-format archive contains a checksum for each member’s header that will most likely detect errors in the metadata, but it will not detect errors in the data.
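
In the ustar header layout, the checksum occupies bytes 148 through 155 of each 512-byte header block and holds, in octal, the unsigned sum of all header bytes with the checksum field itself counted as eight spaces. The Python sketch below (standard library only; ‘header_checksum_ok’ and ‘demo_corruption’ are hypothetical names) recomputes that sum and shows why data errors go unnoticed: flipping a byte of file contents leaves the header valid, while flipping a byte of the name does not.

```python
import io
import tarfile

BLOCK = 512


def header_checksum_ok(header):
    """Check the checksum stored in one 512-byte tar header block."""
    if len(header) != BLOCK:
        raise ValueError("a tar header block is exactly 512 bytes")
    # The stored value is octal ASCII, terminated by NUL and/or space.
    digits = header[148:156].split(b"\0", 1)[0].strip()
    try:
        stored = int(digits, 8)
    except ValueError:
        return False
    # Sum every byte, counting the checksum field itself as eight spaces.
    computed = sum(header[:148]) + 8 * ord(" ") + sum(header[156:])
    return stored == computed


def demo_corruption():
    """Return (header ok after data corruption, header ok after name corruption)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        data = b"hello"
        info = tarfile.TarInfo("a.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    raw = bytearray(buf.getvalue())
    raw[BLOCK] ^= 0xFF  # corrupt the first byte of the file's data
    after_data = header_checksum_ok(bytes(raw[:BLOCK]))
    raw[0] ^= 0x01  # corrupt the first byte of the name field
    after_name = header_checksum_ok(bytes(raw[:BLOCK]))
    return after_data, after_name
```

Here ‘demo_corruption()’ returns ‘(True, False)’: the damaged data passes the header check, and only the damaged name is caught.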

If data corruption is a concern, you can compute and check your own checksums of an archive by using other programs, such as cksum.
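
For instance, you could record a SHA-256 digest when the archive is written and compare it before extracting. The Python sketch below uses the standard ‘hashlib’ module; the function names ‘file_digest’ and ‘archive_is_intact’ are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file as a hex string, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, recorded_hex):
    """Compare an archive's current digest with one recorded when it was made."""
    return file_digest(path) == recorded_hex.strip().lower()
```

A digest like this detects accidental corruption; it does not protect against someone who can rewrite both the archive and the recorded digest, so store the digest separately.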

When attempting to recover from a read error or data corruption in an archive, you may need to skip past the questionable data and read the rest of the archive. This requires some expertise in the archive format and in other software tools.
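
One way to find where readable data resumes is to scan the damaged archive in 512-byte steps and list every block that parses as a valid header. This Python sketch relies on ‘tarfile.TarInfo.frombuf’, which raises ‘tarfile.HeaderError’ for blocks with a bad checksum or other invalid fields. ‘find_headers’ is a hypothetical name, and the approach assumes the damage did not shift the archive off its 512-byte alignment.

```python
import tarfile

BLOCK = 512


def find_headers(data):
    """Yield (offset, member name) for each 512-byte block that parses as a header."""
    for offset in range(0, len(data) - BLOCK + 1, BLOCK):
        block = bytes(data[offset:offset + BLOCK])
        try:
            info = tarfile.TarInfo.frombuf(block, "utf-8", "surrogateescape")
        except (tarfile.HeaderError, ValueError):
            continue  # bad checksum, all-zero block, or ordinary file data
        yield offset, info.name
```

Treat the offsets as candidates only, since a data block can occasionally look like a valid header. The archive bytes from a chosen offset onward can then be handed to a tar reader.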

10.1.3 Race conditions

If some other process is modifying the file system while tar is reading or writing files, the result may well be inconsistent due to race conditions. For example, if another process creates some files in a directory while tar is creating an archive containing the directory’s files, tar may see some of the files but not others, or it may see a file that is in the process of being created. The resulting archive may not be a snapshot of the file system at any point in time. If an application such as a database system depends on an accurate snapshot, restoring from the tar archive of a live file system may therefore break that consistency and may break the application. The simplest way to avoid the consistency issues is to avoid making other changes to the file system while tar is reading it or writing it.
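
Such races cannot always be prevented, but they can often be noticed. A minimal Python sketch of one approach, assuming POSIX-style ‘stat’ fields: record a file’s identity, size, modification time and status-change time before reading it, compare them afterward, and treat any difference as a sign that the copy may be inconsistent. ‘read_stable’ and ‘ChangedWhileReading’ are hypothetical names, and a change that leaves all of these fields untouched would go unnoticed.

```python
import os


class ChangedWhileReading(Exception):
    """Raised when a file seems to have changed while it was being read."""


def _signature(st):
    # Fields that normally change when a file is written, truncated or replaced.
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def read_stable(path):
    """Return the file's contents, or raise ChangedWhileReading."""
    with open(path, "rb") as f:
        before = os.fstat(f.fileno())
        data = f.read()
    after = os.stat(path)  # stat by name also notices a replaced file
    if _signature(before) != _signature(after):
        raise ChangedWhileReading(path)
    return data
```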

When creating an archive, several options are available to avoid race conditions. Some hosts have a way of snapshotting a file system, or of temporarily suspending all changes to a file system, by (say) suspending the only virtual machine that can modify a file system; if you use these facilities and have tar -c read from a snapshot when creating an archive, you can avoid inconsistency problems. More drastically, before starting tar you could suspend or shut down all processes other than tar that have access to the file system, or you could unmount the file system and then mount it read-only.

When extracting from an archive, one approach to avoid race conditions is to create a directory that no other process can write to, and extract into that.
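
In Python, for example, ‘tempfile.mkdtemp’ creates a new directory that is readable, writable and searchable only by the creating user. The sketch below extracts into such a directory using Python’s standard ‘tarfile’ module, not GNU tar; ‘extract_privately’ is a hypothetical name, and the ‘filter="data"’ argument is passed only on Python versions that support extraction filters.

```python
import tarfile
import tempfile

# Extraction filters exist in Python 3.12+ and in some security releases of
# older versions; "data" refuses members that would land outside the destination.
_EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def extract_privately(archive_path, parent_dir):
    """Extract archive_path into a new directory only the current user can access."""
    # A private directory addresses races; it is not a defense against hostile archives.
    dest = tempfile.mkdtemp(prefix="extract-", dir=parent_dir)  # mode 0700
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest, **_EXTRACT_KWARGS)
    return dest
```

The parent directory should itself be one that untrusted users cannot modify.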

10.2 Security

In some cases tar may be used in an adversarial situation, where an untrusted user is attempting to gain information about or modify otherwise-inaccessible files. Dealing with untrusted data (that is, data generated by an untrusted user) typically requires extra care, because even the smallest mistake in the use of tar is more likely to be exploited by an adversary than to be triggered by an accidental race condition.

10.2.1 Privacy

Standard privacy concerns apply when using tar. For example, suppose you are archiving your home directory into a file ‘/archive/myhome.tar’. Any secret information in your home directory, such as your SSH secret keys, is copied faithfully into the archive. Therefore, if your home directory contains any file that should not be read by some other user, the archive itself should not be readable by that user. And even if the archive’s data are inaccessible to untrusted users, its metadata (such as size or last-modified date) may reveal some information about your home directory; if the metadata are intended to be private, the archive’s parent directory should also be inaccessible to untrusted users.

One precaution is to create ‘/archive’ so that it is not accessible to any user, unless that user also has permission to access all the files in your home directory.
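
A minimal Python sketch of these precautions, using the standard ‘tarfile’ module rather than GNU tar: the archive directory gets mode 0700 and the archive file is created with mode 0600, so neither its data nor its metadata is visible to other users. ‘create_private_archive’ and the paths in the usage note are illustrative.

```python
import os
import tarfile


def create_private_archive(archive_dir, archive_name, source):
    """Archive `source` as archive_dir/archive_name, accessible only to the owner."""
    os.makedirs(archive_dir, mode=0o700, exist_ok=True)
    # The mode given to makedirs may be reduced by the umask, and an existing
    # directory is left alone, so set the directory's mode explicitly.
    os.chmod(archive_dir, 0o700)
    path = os.path.join(archive_dir, archive_name)
    # O_EXCL refuses an existing file or a symlink planted at `path`;
    # mode 0600 keeps the archive private from the moment it exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f, tarfile.open(fileobj=f, mode="w") as tar:
        tar.add(source, arcname=os.path.basename(os.path.normpath(source)))
    return path
```

For example, ‘create_private_archive("/archive", "myhome.tar", os.path.expanduser("~"))’ would write ‘/archive/myhome.tar’ with those permissions, assuming you are allowed to create ‘/archive’.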

Similarly, when extracting from an archive, take care that the permissions of the extracted files are not more generous than what you want. Even if the archive itself is readable only by you, files extracted from it have their own permissions, which may differ.
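
If the files were extracted with modes taken from the archive, one way to tighten them afterward is to remove all group and other permission bits. The Python sketch below does that; ‘restrict_tree’ is a hypothetical helper, it assumes nobody else can modify the tree while it runs, and it skips symbolic links because changing a link’s mode with ‘os.chmod’ would change the link’s target instead.

```python
import os
import stat


def restrict_tree(root):
    """Remove group and other permissions from root and everything below it."""

    def restrict(path):
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            return  # os.chmod would follow the link and change its target
        os.chmod(path, stat.S_IMODE(st.st_mode) & ~0o077)

    restrict(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            restrict(os.path.join(dirpath, name))
```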

10.2.2 Integrity

When creating archives, take care that they are not writable by an untrusted user; otherwise, that user could modify the archive, and when you later extract from it you will get incorrect data.
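
Before extracting, a script could check that the archive is a regular file owned by you or by root and not writable by group or others. This Python sketch is illustrative (‘archive_is_protected’ is a hypothetical name); it checks only the file itself, so the directories above it must be protected too.

```python
import os
import stat


def archive_is_protected(path):
    """True if path is a regular file owned by us or root and not group/other-writable."""
    st = os.lstat(path)  # do not follow a symlink planted in place of the archive
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid not in (0, os.getuid()):
        return False
    return not st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
```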

When tar extracts from an archive, by default it writes into files relative to the working directory. If the archive was generated by an untrusted user, that user therefore can write into any file under the working directory. If the working directory contains a symbolic link to another directory, the untrusted user can also write into any file under the referenced directory. When extracting from an untrusted archive, it is therefore good practice to create an empty directory and run tar in that directory.

When extracting from two or more untrusted archives, each one should be extracted independently, into different empty directories. Otherwise, the first archive could create a symbolic link into an area outside the working directory, and the second one could follow the link and overwrite data that is not under the working directory. For example, when restoring from a series of incremental dumps, the archives should have been created by a trusted process, as otherwise the incremental restores might alter data outside the working directory.
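
The Python sketch below applies this advice with the standard ‘tarfile’ module, not GNU tar. Each archive is checked first and then extracted into its own newly created private directory; members with absolute names, ‘..’ components, links or special types are refused. Refusing every link is stricter than necessary but rules out the link-following attack described above. ‘extract_each’ and this rejection policy are assumptions for illustration.

```python
import os
import tarfile
import tempfile

# Use Python's "data" extraction filter as an extra layer where it exists.
_EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def _problem(member):
    """Say why a member is unsafe to extract, or return None."""
    if member.name.startswith("/") or os.path.isabs(member.name):
        return "absolute name"
    if ".." in member.name.split("/"):
        return "'..' component"
    if member.issym() or member.islnk():
        return "link"  # stricter than needed, but defeats link-following
    if not (member.isfile() or member.isdir()):
        return "special file"
    return None


def extract_each(archive_paths, parent_dir):
    """Extract each untrusted archive into its own new, private, empty directory."""
    destinations = {}
    for archive_path in archive_paths:
        with tarfile.open(archive_path) as tar:
            members = tar.getmembers()
            problems = [(m.name, _problem(m)) for m in members if _problem(m)]
            if problems:
                raise ValueError(f"{archive_path}: refusing {problems}")
            dest = tempfile.mkdtemp(prefix="untrusted-", dir=parent_dir)
            for member in members:
                # Do not apply modes, owners or times taken from the archive.
                tar.extract(member, dest, set_attrs=False, **_EXTRACT_KWARGS)
        destinations[archive_path] = dest
    return destinations
```

Passing ‘set_attrs=False’ leaves the ownership, modes and timestamps recorded in the archive unapplied, so the extracted files get ordinary permissions for the invoking user.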

If you use the ‘--absolute-names’ (‘-P’) option when extracting, tar respects any file names in the archive, even file names that begin with ‘/’ or contain ‘..’. As this lets the archive overwrite any file in your system that you can write, the ‘--absolute-names’ (‘-P’) option should be used only for trusted archives.

Conversely, with the ‘--keep-old-files’ (‘-k’) and ‘--skip-old-files’ options, tar refuses to replace existing files when extracting. The difference between the two options is that the former treats existing files as errors whereas the latter just silently ignores them.

Finally, with the ‘--no-overwrite-dir’ option, tar refuses to replace the permissions or ownership of already-existing directories. These options may help when extracting from untrusted archives.
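
The behavior of these options can be modeled for a single regular file in a few lines of Python. This is a sketch, not GNU tar’s implementation: ‘place_file’ is a hypothetical helper whose ‘existing’ argument chooses between refusing with an error (compare ‘--keep-old-files’), silently skipping (compare ‘--skip-old-files’) and replacing. In the spirit of ‘--no-overwrite-dir’, it never changes the permissions or ownership of directories that already exist.

```python
import os

POLICIES = ("error", "skip", "replace")


def place_file(path, data, existing="error"):
    """Write one extracted regular file; return True if it was written.

    existing="error":   refuse with FileExistsError (compare --keep-old-files)
    existing="skip":    silently keep the old file  (compare --skip-old-files)
    existing="replace": remove the old file and write the new one
    """
    if existing not in POLICIES:
        raise ValueError(f"existing must be one of {POLICIES}")
    parent = os.path.dirname(path)
    if parent:
        # Creates only missing directories; directories that already exist keep
        # their permissions and ownership (compare --no-overwrite-dir).
        os.makedirs(parent, mode=0o700, exist_ok=True)
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        fd = os.open(path, flags, 0o600)
    except FileExistsError:
        if existing == "error":
            raise
        if existing == "skip":
            return False
        os.unlink(path)  # removes a symlink itself rather than following it
        fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True
```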

10.2.3 Dealing with Live Untrusted Data

Extra care is required when creating from or extracting into a file system that is accessible to untrusted users. For example, superusers who invoke tar must be wary about its actions being hijacked by an adversary who is reading or writing the file system at the same time that tar is operating.

When creating an archive from a live file system, tar is vulnerable to denial-of-service attacks. For example, an adversarial user could create the illusion of an indefinitely deep directory hierarchy ‘d/e/f/g/...’ by creating directories one step ahead of tar, or the illusion of an indefinitely long file by creating a sparse file but arranging for blocks to be allocated just before tar reads them. There is no easy way for tar to distinguish these scenarios from legitimate uses, so you may need to monitor tar, just as you’d need to monitor any other system service, to detect such attacks.
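
Monitoring can be complemented by explicit budgets. The Python sketch below walks a tree and stops once it goes deeper, or accounts for more bytes, than configured limits; ‘walk_with_limits’ is hypothetical and its default limits are arbitrary placeholders. Because sizes are sampled when each file is visited, a file that grows after that point is not caught, so this bounds the walk rather than the later reading.

```python
import os
import stat


class LimitExceeded(Exception):
    """Raised when a walk goes deeper or totals more bytes than allowed."""


def walk_with_limits(root, max_depth=64, max_bytes=10 * 2**30):
    """Yield regular files under root, stopping if a budget is exceeded."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth > max_depth:
            raise LimitExceeded(f"{dirpath}: deeper than {max_depth} levels")
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue
            total += st.st_size  # apparent size, as sampled right now
            if total > max_bytes:
                raise LimitExceeded(f"more than {max_bytes} bytes under {root}")
            yield path
```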

While a superuser is extracting from an archive into a live file system, an untrusted user might replace a directory with a symbolic link, in hopes that tar will follow the symbolic link and extract data into files that the untrusted user does not have access to. Even if the archive was generated by the superuser, it may contain a file such as ‘d/etc/passwd’ that the untrusted user earlier created in order to break in; if the untrusted user replaces the directory ‘d/etc’ with a symbolic link to ‘/etc’ while tar is running, tar will overwrite ‘/etc/passwd’. This attack can be prevented by extracting into a directory that is inaccessible to untrusted users.
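
Extracting into an inaccessible directory is the simple fix. As an additional defense, an extractor can open each path component relative to its parent directory’s file descriptor while refusing to follow symbolic links, so that replacing ‘d/etc’ with a link makes the write fail instead of redirecting it. This Python sketch assumes a POSIX system (such as Linux) where ‘os.O_NOFOLLOW’, ‘os.O_DIRECTORY’ and the ‘dir_fd’ parameters are available; ‘create_beneath’ is a hypothetical helper, not part of tar.

```python
import os


def create_beneath(root, relpath, data):
    """Create root/relpath without following symlinks in any component."""
    parts = [p for p in relpath.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        raise ValueError(f"unsafe relative path: {relpath!r}")
    dir_flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    fd = os.open(root, dir_flags)
    try:
        for part in parts[:-1]:
            try:
                os.mkdir(part, 0o700, dir_fd=fd)
            except FileExistsError:
                pass  # existing entry; the open below rejects it if it is a symlink
            next_fd = os.open(part, dir_flags, dir_fd=fd)
            os.close(fd)
            fd = next_fd
        # O_EXCL also refuses a symlink (even a dangling one) as the last component.
        file_flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW
        out = os.open(parts[-1], file_flags, 0o600, dir_fd=fd)
        with os.fdopen(out, "wb") as f:
            f.write(data)
    finally:
        os.close(fd)
```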

Similar attacks via symbolic links are also possible when creating an archive, if the untrusted user can modify an ancestor of a top-level argument of tar. For example, an untrusted user who can modify ‘/home/eve’ can hijack a running instance of ‘tar -cf - /home/eve/Documents/yesterday’ by replacing ‘/home/eve/Documents’ with a symbolic link to some other location. Attacks like these can be prevented by making sure that untrusted users cannot modify any files that are top-level arguments to tar, or any ancestor directories of these files.
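
A wrapper could verify this precondition before invoking tar. The Python sketch below lists each path, from the argument up to ‘/’, that is a symbolic link, is owned by someone other than root or the invoking user, or is writable by group or others. Treating only those two users as trusted is an assumption, and world-writable sticky directories such as ‘/tmp’ are flagged conservatively. ‘untrusted_ancestors’ is a hypothetical name, and the check only reports problems; it cannot stop a change made after it runs.

```python
import os
import stat


def untrusted_ancestors(path, trusted_uids=None):
    """Return (path, reason) pairs for path and its ancestors that others could alter."""
    if trusted_uids is None:
        trusted_uids = {0, os.getuid()}
    problems = []
    current = os.path.abspath(path)
    while True:
        st = os.lstat(current)
        if stat.S_ISLNK(st.st_mode):
            problems.append((current, "symbolic link"))
        else:
            if st.st_uid not in trusted_uids:
                problems.append((current, f"owned by uid {st.st_uid}"))
            # Sticky world-writable directories such as /tmp are flagged too.
            if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
                problems.append((current, "writable by group or others"))
        parent = os.path.dirname(current)
        if parent == current:  # reached the root directory
            return problems
        current = parent
```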

10.2.4 Security Rules of Thumb

This section briefly summarizes rules of thumb for avoiding security pitfalls.

This document was generated on August 23, 2023 using texi2html 5.0.