5 Performing Backups and Restoring Files

GNU tar is distributed along with scripts for performing backups and restores. These scripts may well suit your needs, but they are not the only scripts or methods available for doing backups and restores. You may prefer to write your own, or to use a more sophisticated package dedicated to that purpose.

Some users are enthusiastic about Amanda (The Advanced Maryland Automatic Network Disk Archiver), a backup system developed by James da Silva ‘jds@cs.umd.edu’ and available on many Unix systems. This is free software, and it is available from http://www.amanda.org.

This chapter documents both the provided shell scripts and tar options which are more specific to usage as a backup tool.

To back up a file system means to create archives that contain all the files in that file system. Those archives can then be used to restore any or all of those files (for instance if a disk crashes or a file is accidentally deleted). File system backups are also called dumps.


5.1 Using tar to Perform Full Dumps

Full dumps should only be made when no other people or programs are modifying files in the file system. If files are modified while tar is making the backup, they may not be stored properly in the archive, in which case you won’t be able to restore them if you have to. (Files not being modified are written with no trouble, and do not corrupt the entire archive.)

You will want to use the ‘--label=archive-label’ (‘-V archive-label’) option to give the archive a volume label, so you can tell what this archive is even if the label falls off the tape, or anything like that.

Unless the file system you are dumping is guaranteed to fit on one volume, you will need to use the ‘--multi-volume’ (‘-M’) option. Make sure you have enough tapes on hand to complete the backup.

If you want to dump each file system separately, you will need to use the ‘--one-file-system’ option to prevent tar from crossing file system boundaries when storing (sub)directories.

The ‘--incremental’ (‘-G’) (see section Using tar to Perform Incremental Dumps) option is not needed, since this is a complete copy of everything in the file system, and a full restore from this backup would only be done onto a completely empty disk.

Unless you are in a hurry, and trust the tar program (and your tapes), it is a good idea to use the ‘--verify’ (‘-W’) option, to make sure your files really made it onto the dump properly. This will also detect cases where the file was modified while (or just after) it was being archived. Not all media (notably cartridge tapes) are capable of being verified, unfortunately.
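
The options discussed in this section can be combined in a single invocation. The following is a minimal sketch, assuming GNU tar is installed as tar; the scratch directory, the archive name ‘full.tar’ and the label ‘demo-full-dump’ are made up for illustration:

```shell
#!/bin/sh
# Sketch: a labeled, verified full dump of a small scratch tree.
set -eu

# Scratch area standing in for a real file system (illustrative only;
# remove it by hand when you are done).
work=$(mktemp -d)
mkdir -p "$work/src/docs"
echo "hello" > "$work/src/docs/readme"

cd "$work"
tar --create \
    --file=full.tar \
    --label="demo-full-dump" \
    --one-file-system \
    --verify \
    src

# Print the volume label stored in the archive.
label=$(tar --test-label --file=full.tar)
echo "label: $label"
```

The archive here is an ordinary file, which can always be re-read for verification; on tape, whether ‘--verify’ works depends on the medium, as noted above.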


5.2 Using tar to Perform Incremental Dumps

An incremental backup is a special form of GNU tar archive that stores additional metadata so that the exact state of the file system can be restored when extracting the archive.

GNU tar currently offers two options for handling incremental backups: ‘--listed-incremental=snapshot-file’ (‘-g snapshot-file’) and ‘--incremental’ (‘-G’).

The option ‘--listed-incremental’ instructs tar to operate on an incremental archive with additional metadata stored in a standalone file, called a snapshot file. The purpose of this file is to help determine which files have been changed, added or deleted since the last backup, so that the next incremental backup will contain only modified files. The name of the snapshot file is given as an argument to the option:

--listed-incremental=file
-g file

Handle incremental backups with snapshot data in file.

To create an incremental backup, you would use ‘--listed-incremental’ together with ‘--create’ (see section How to Create Archives). For example:

$ tar --create \
           --file=archive.1.tar \
           --listed-incremental=/var/log/usr.snar \
           /usr

This will create in ‘archive.1.tar’ an incremental backup of the ‘/usr’ file system, storing additional metadata in the file ‘/var/log/usr.snar’. If this file does not exist, it will be created. The created archive will then be a level 0 backup; please see the next section for more on backup levels.

Otherwise, if the file ‘/var/log/usr.snar’ exists, it determines which files are modified. In this case only these files will be stored in the archive. Suppose, for example, that after running the above command, you delete file ‘/usr/doc/old’ and create directory ‘/usr/local/db’ with the following contents:

$ ls /usr/local/db
/usr/local/db/data
/usr/local/db/index

Some time later you create another incremental backup. You will then see:

$ tar --create \
           --file=archive.2.tar \
           --listed-incremental=/var/log/usr.snar \
           /usr
tar: usr/local/db: Directory is new
usr/local/db/
usr/local/db/data
usr/local/db/index

The created archive ‘archive.2.tar’ will contain only these three members. This archive is called a level 1 backup. Notice that ‘/var/log/usr.snar’ will be updated with the new data, so if you plan to create more ‘level 1’ backups, it is necessary to create a working copy of the snapshot file before running tar. The above example will then be modified as follows:

$ cp /var/log/usr.snar /var/log/usr.snar-1
$ tar --create \
           --file=archive.2.tar \
           --listed-incremental=/var/log/usr.snar-1 \
           /usr
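
The whole sequence can be tried safely on a scratch tree. The following is a minimal sketch, assuming GNU tar; the directory ‘usr’ is created under a temporary directory and merely stands in for the real ‘/usr’, and the snapshot file names are made up for illustration:

```shell
#!/bin/sh
# Sketch: a level 0 dump followed by a level 1 dump taken against a
# working copy of the snapshot file.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p usr/doc
echo "old"  > usr/doc/old
echo "keep" > usr/doc/keep

# Level 0: usr.snar does not exist yet, so everything is dumped.
tar --create --file=archive.1.tar --listed-incremental=usr.snar usr

# Incremental dumps depend on time stamps, so let the clock advance
# before changing the tree.
sleep 1
rm usr/doc/old
mkdir -p usr/local/db
echo "data" > usr/local/db/data

# Level 1: use a copy, so that usr.snar keeps describing level 0.
cp usr.snar usr.snar-1
tar --create --file=archive.2.tar --listed-incremental=usr.snar-1 usr

# Show what the level 1 archive contains.
tar --list --file=archive.2.tar
```

Because ‘usr.snar’ itself is never modified after the first run, later level 1 dumps can start from a fresh copy of it in the same way.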

You can force ‘level 0’ backups either by removing the snapshot file before running tar, or by supplying the ‘--level=0’ option, e.g.:

$ tar --create \
           --file=archive.2.tar \
           --listed-incremental=/var/log/usr.snar-0 \
           --level=0 \
           /usr

Incremental dumps depend crucially on time stamps, so the results are unreliable if you modify a file’s time stamps during dumping (e.g., with the ‘--atime-preserve=replace’ option), or if you set the clock backwards.

Metadata stored in snapshot files include device numbers, which are supposed to be non-volatile values. However, NFS devices have undependable values when an automounter is involved. This can lead to a great deal of spurious redumping in incremental dumps, so it is of little use to compare two NFS device numbers over time. The solution currently implemented is to consider all NFS devices as being equal when comparing directories; this is fairly crude, but there does not seem to be a better way to go.

Apart from NFS, there are a number of cases where relying on device numbers can cause spurious redumping of unmodified files. For example, this occurs when archiving LVM snapshot volumes. To avoid this, use the ‘--no-check-device’ option:

--no-check-device

Do not rely on device numbers when preparing a list of changed files for an incremental dump.

--check-device

Use device numbers when preparing a list of changed files for an incremental dump. This is the default behavior. The purpose of this option is to undo the effect of the ‘--no-check-device’ if it was given in TAR_OPTIONS environment variable (see TAR_OPTIONS).
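
The interplay of the two options can be sketched as follows. This example assumes GNU tar and uses a scratch directory; the archive and snapshot file names are made up for illustration:

```shell
#!/bin/sh
# Sketch: make --no-check-device the default through TAR_OPTIONS,
# then undo it for a single run with --check-device.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p data
echo "payload" > data/file

# Every tar invocation started from this shell now ignores device numbers.
TAR_OPTIONS=--no-check-device
export TAR_OPTIONS

tar --create --file=level0.tar --listed-incremental=data.snar data

# For this one run, compare device numbers again despite TAR_OPTIONS.
cp data.snar data.snar-1
tar --create --check-device \
    --file=level1.tar --listed-incremental=data.snar-1 data
```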

There is also another way to cope with changing device numbers. It is described in detail in Fixing Snapshot Files.

Note that incremental archives use tar extensions and may not be readable by non-GNU versions of the tar program.

To extract from incremental dumps, use ‘--listed-incremental’ together with the ‘--extract’ option (see section Extracting Specific Files). In this case, tar does not need to access the snapshot file, since all the data necessary for extraction are stored in the archive itself. So, when extracting, you can give any argument to ‘--listed-incremental’; the usual practice is to use ‘--listed-incremental=/dev/null’. Alternatively, you can use ‘--incremental’, which takes no argument. In general, ‘--incremental’ (‘-G’) can be used as a shortcut for ‘--listed-incremental’ when listing or extracting incremental backups (for more information regarding this option, see incremental-op).

When extracting from an incremental backup, GNU tar attempts to restore the exact state the file system had when the archive was created. In particular, it will delete those files in the file system that did not exist in their directories when the archive was created. If you have created several levels of incremental files, then in order to restore the exact contents the file system had when the last level was created, you will need to restore from all backups in turn. Continuing our example, to restore the state of the ‘/usr’ file system, one would do(12):

$ tar --extract \
           --listed-incremental=/dev/null \
           --file archive.1.tar
$ tar --extract \
           --listed-incremental=/dev/null \
           --file archive.2.tar
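
The effect of restoring the levels in turn, including the deletion of files that no longer existed at the later level, can be observed on a scratch tree. This is a minimal sketch, assuming GNU tar; the tree under the temporary directory and the target directory ‘restored’ are made up for illustration:

```shell
#!/bin/sh
# Sketch: restore a level 0 and a level 1 dump, oldest first, and watch
# the second extraction remove a file that was deleted between dumps.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a tiny tree and take a level 0 dump of it.
mkdir -p usr/doc
echo "old"  > usr/doc/old
echo "keep" > usr/doc/keep
tar --create --file=archive.1.tar --listed-incremental=usr.snar usr

# Change the tree, then take a level 1 dump.
sleep 1
rm usr/doc/old
mkdir -p usr/local/db
echo "data" > usr/local/db/data
tar --create --file=archive.2.tar --listed-incremental=usr.snar usr

# Restore both levels into an empty directory.
mkdir restored
for archive in archive.1.tar archive.2.tar; do
    tar --extract \
        --listed-incremental=/dev/null \
        --directory=restored \
        --file="$archive"
done

# 'old' was restored from level 0 and removed again by level 1.
ls -R restored/usr
```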

To list the contents of an incremental archive, use ‘--list’ (see section How to List Archives), as usual. To obtain more information about the archive, use ‘--listed-incremental’ or ‘--incremental’ combined with two ‘--verbose’ options(13):

tar --list --incremental --verbose --verbose --file archive.tar

This command will print, for each directory in the archive, the list of files in that directory at the time the archive was created. This information is put out in a format which is both human-readable and unambiguous for a program: each file name is printed as

x file

where x is a letter describing the status of the file: ‘Y’ if the file is present in the archive, ‘N’ if the file is not included in the archive, or a ‘D’ if the file is a directory (and is included in the archive). See section Dumpdir, for the detailed description of dumpdirs and status codes. Each such line is terminated by a newline character. The last line is followed by an additional newline to indicate the end of the data.
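
Since the status lines have a fixed shape, they are easy to filter. The following is a minimal sketch, assuming GNU tar; the tree ‘tree’ and the file names are made up for illustration:

```shell
#!/bin/sh
# Sketch: print the directory listings stored in an incremental archive
# and pick out the entries that are present in the archive.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p tree/sub
echo "a" > tree/a
tar --create --file=archive.tar --listed-incremental=tree.snar tree

# Two --verbose options make tar print the status lines for each directory.
listing=$(tar --list --incremental --verbose --verbose --file=archive.tar)
printf '%s\n' "$listing"

# Keep only the lines for files that are stored in the archive.
printf '%s\n' "$listing" | grep '^Y ' || true
```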

The option ‘--incremental’ (‘-G’) gives the same behavior as ‘--listed-incremental’ when used with the ‘--list’ and ‘--extract’ options. When used with the ‘--create’ option, it creates an incremental archive without creating a snapshot file. Thus, it is impossible to create several levels of incremental backups with the ‘--incremental’ option.


5.3 Levels of Backups

An archive containing all the files in the file system is called a full backup or full dump. You could protect your data by creating a full dump every day. This strategy, however, would waste a substantial amount of archive media and user time, as unchanged files would be re-archived every day.

It is more efficient to do a full dump only occasionally. To back up files between full dumps, you can use incremental dumps. A level one dump archives all the files that have changed since the last full dump.

A typical dump strategy would be to perform a full dump once a week, and a level one dump once a day. This means some versions of files will in fact be archived more than once, but this dump strategy makes it possible to restore a file system to within one day of accuracy by only extracting two archives—the last weekly (full) dump and the last daily (level one) dump. The only information lost would be in files changed or created since the last daily backup. (Doing dumps more than once a day is usually not worth the trouble.)
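
Done by hand with plain tar, such a schedule could look like the sketch below. It is an illustration only, not the logic of the scripts shipped with GNU tar; the function names, the choice of Sunday for the full dump, and the file names under the store directory are all assumptions:

```shell
#!/bin/sh
# Sketch: weekly full dump, daily level one dump.
set -eu

# Map an ISO day-of-week number (1 = Monday ... 7 = Sunday) to a dump
# level: a full dump on Sunday, a level one dump on every other day.
level_for_day() {
    if [ "$1" -eq 7 ]; then
        echo 0
    else
        echo 1
    fi
}

# Dump directory $1 at level $2, keeping archives and snapshots in $3.
run_dump() {
    dir=$1 level=$2 store=$3
    if [ "$level" -eq 0 ]; then
        # A full dump starts from a fresh snapshot file.
        rm -f "$store/level0.snar"
        snar=$store/level0.snar
    else
        # Work on a copy, so every daily dump is relative to the full dump.
        cp "$store/level0.snar" "$store/level1.snar"
        snar=$store/level1.snar
    fi
    tar --create \
        --file="$store/dump-level$level.tar" \
        --listed-incremental="$snar" \
        "$dir"
}

# Typical use, e.g. from a daily cron job (paths are placeholders):
#   run_dump /home "$(level_for_day "$(date +%u)")" /var/backups/home
```

With this arrangement a restore needs at most two archives, the last full dump and the last level one dump, as described above.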

GNU tar comes with scripts you can use to do full and level-one (actually, even level-two and so on) dumps. Using scripts (shell programs) to perform backups and restoration is a convenient and reliable alternative to typing out file name lists and tar commands by hand.

Before you use these scripts, you need to edit the file ‘backup-specs’, which specifies parameters used by the backup scripts and by the restore script. This file is usually located in the ‘/etc/backup’ directory. See section Setting Parameters for Backups and Restoration, for its detailed description. Once the backup parameters are set, you can perform backups or restoration by running the appropriate script.

The name of the backup script is backup. The name of the restore script is restore. The following sections describe their use in detail.

Please Note: The backup and restoration scripts are designed to be used together. While it is possible to restore files by hand from an archive which was created using a backup script, and to create an archive by hand which could then be extracted using the restore script, it is easier to use the scripts. See section Using tar to Perform Incremental Dumps, before making such an attempt.


5.4 Setting Parameters for Backups and Restoration

The file ‘backup-specs’ specifies backup parameters for the backup and restoration scripts provided with tar. You must edit ‘backup-specs’ to fit your system configuration and schedule before using these scripts.

Syntactically, ‘backup-specs’ is a shell script, containing mainly variable assignments. However, any valid shell construct is allowed in this file. In particular, you may wish to define functions within that script (e.g., see RESTORE_BEGIN below). For more information about shell script syntax, please refer to the definition of the Shell Command Language. See also Bash Features in Bash Reference Manual.

The shell variables controlling behavior of backup and restore are described in the following subsections.


5.4.1 General-Purpose Variables

Backup variable: ADMINISTRATOR

The user name of the backup administrator. The backup scripts send a backup report to this address.

Backup variable: BACKUP_HOUR

The hour at which the backups are done. This can be a number from 0 to 23, or the time specification in form hours:minutes, or the string ‘now’.

This variable is used by backup. Its value may be overridden using the ‘--time’ option (see section Using the Backup Scripts).

Backup variable: TAPE_FILE

The device tar writes the archive to. If TAPE_FILE is a remote archive (see remote-dev), the backup script will assume that your mt is able to access remote devices. If RSH (see RSH) is set, the ‘--rsh-command’ option will be added to invocations of mt.

Backup variable: BLOCKING

The blocking factor tar will use when writing the dump archive. See section The Blocking Factor of an Archive.

Backup variable: BACKUP_DIRS

A list of file systems to be dumped (for backup), or restored (for restore). You can include any directory name in the list — subdirectories on that file system will be included, regardless of how they may look to other networked machines. Subdirectories on other file systems will be ignored.

When an entry has the form ‘host:directory’, the host name specifies which host to run tar on, and should normally be the host that actually contains the file system. However, the host machine must have GNU tar installed, and must be able to access the directory containing the backup scripts and their support files using the same file name that is used on the machine where the scripts are run (i.e., what pwd will print when in that directory on that machine). If the host that contains the file system does not have this capability, you can specify another host as long as it can access the file system through NFS.

If the list of file systems is very long, you may wish to put it in a separate file. This file is usually named ‘/etc/backup/dirs’, but this name may be overridden in ‘backup-specs’ using the DIRLIST variable.

Backup variable: DIRLIST

The name of the file that contains a list of file systems to back up or restore. By default it is ‘/etc/backup/dirs’.

Backup variable: BACKUP_FILES

A list of individual files to be dumped (for backup), or restored (for restore). These should be accessible from the machine on which the backup script is run.

If the list of individual files is very long, you may wish to store it in a separate file. This file is usually named ‘/etc/backup/files’, but this name may be overridden in ‘backup-specs’ using the FILELIST variable.

Backup variable: FILELIST

The name of the file that contains a list of individual files to back up or restore. By default it is ‘/etc/backup/files’.

Backup variable: MT

Full file name of mt binary.

Backup variable: RSH

Full file name of rsh binary or its equivalent. You may wish to set it to ssh, to improve security. In this case you will have to use public key authentication.

Backup variable: RSH_COMMAND

Full file name of rsh binary on remote machines. This will be passed via ‘--rsh-command’ option to the remote invocation of GNU tar.

Backup variable: VOLNO_FILE

Name of temporary file to hold volume numbers. This needs to be accessible by all the machines which have file systems to be dumped.

Backup variable: XLIST

Name of exclude file list. An exclude file list is a file located on the remote machine and containing the list of files to be excluded from the backup. Exclude file lists are searched for in the /etc/tar-backup directory. A common use for exclude file lists is to exclude files containing security-sensitive information (e.g., ‘/etc/shadow’) from backups.

This variable affects only backup.

Backup variable: SLEEP_TIME

Time to sleep between dumps of any two successive file systems.

This variable affects only backup.

Backup variable: DUMP_REMIND_SCRIPT

Script to be run when it’s time to insert a new tape for the next volume. Administrators may want to tailor this script for their site. If this variable isn’t set, GNU tar will display its built-in prompt, and will expect confirmation from the console. For the description of the default prompt, see change volume prompt.

Backup variable: SLEEP_MESSAGE

Message to display on the terminal while waiting for dump time. Usually this will just be some literal text.

Backup variable: TAR

Full file name of the GNU tar executable. If this is not set, the backup scripts will search for tar in the current shell path.


5.4.2 Magnetic Tape Control

Backup scripts access the tape device using special hook functions. These functions take a single argument: the name of the tape device. Their names are kept in the following variables:

Backup variable: MT_BEGIN

The name of the begin function. This function is called before accessing the drive. By default it retensions the tape:

MT_BEGIN=mt_begin

mt_begin() {
    mt -f "$1" retension
}

Backup variable: MT_REWIND

The name of the rewind function. The default definition is as follows:

MT_REWIND=mt_rewind

mt_rewind() {
    mt -f "$1" rewind
}

Backup variable: MT_OFFLINE

The name of the function switching the tape off line. By default it is defined as follows:

MT_OFFLINE=mt_offline

mt_offline() {
    mt -f "$1" offl
}

Backup variable: MT_STATUS

The name of the function used to obtain the status of the archive device, including error count. Default definition:

MT_STATUS=mt_status

mt_status() {
    mt -f "$1" status
}
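
Because these variables hold function names, a hook can be replaced in ‘backup-specs’ by defining a new function and assigning its name to the variable. The following fragment is a minimal sketch for a drive that should be rewound rather than retensioned before use; the function name my_begin is made up for illustration, and falling back to a plain mt when MT is unset is an assumption of this example:

```shell
# Fragment for backup-specs: rewind instead of retensioning.
# my_begin is a hypothetical name; $1 is the tape device name that the
# scripts pass to every tape hook.
my_begin() {
    "${MT:-mt}" -f "$1" rewind
}
MT_BEGIN=my_begin
```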

5.4.3 User Hooks

User hooks are shell functions executed before and after each tar invocation. Thus, there are backup hooks, which are executed before and after dumping each file system, and restore hooks, executed before and after restoring a file system. Each user hook is a shell function taking four arguments:

User Hook Function: hook level host fs fsname

Its arguments are:

level

Current backup or restore level.

host

Name or IP address of the host machine being dumped or restored.

fs

Full file name of the file system being dumped or restored.

fsname

File system name with directory separators replaced with colons. This is useful, e.g., for creating unique files.

The following variables hold the names of user hook functions:

Backup variable: DUMP_BEGIN

Dump begin function. It is executed before dumping the file system.

Backup variable: DUMP_END

Executed after dumping the file system.

Backup variable: RESTORE_BEGIN

Executed before restoring the file system.

Backup variable: RESTORE_END

Executed after restoring the file system.
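
As an illustration, the fragment below defines a pair of dump hooks that append one line per file system to a log file. It is a minimal sketch: the function names, the HOOK_LOG variable and its default location are hypothetical, and only the four arguments described above are taken from the scripts:

```shell
# Fragment for backup-specs: log the start and end of every dump.
# HOOK_LOG and the function names are hypothetical.
HOOK_LOG=${HOOK_LOG:-/tmp/backup-hooks.log}

log_dump_begin() {
    # $1 = level, $2 = host, $3 = fs, $4 = fsname
    echo "$(date '+%Y-%m-%d %H:%M:%S') begin level $1 dump of $2:$3 ($4)" >> "$HOOK_LOG"
}

log_dump_end() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') end level $1 dump of $2:$3 ($4)" >> "$HOOK_LOG"
}

DUMP_BEGIN=log_dump_begin
DUMP_END=log_dump_end
```

Restore hooks can be defined in the same way and assigned to RESTORE_BEGIN and RESTORE_END.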


5.4.4 An Example Text of ‘Backup-specs’

The following is an example of ‘backup-specs’:

# site-specific parameters for file system backup.

ADMINISTRATOR=friedman
BACKUP_HOUR=1
TAPE_FILE=/dev/nrsmt0

# Use ssh instead of the less secure rsh
RSH=/usr/bin/ssh
RSH_COMMAND=/usr/bin/ssh

# Override MT_STATUS function:
my_status() {
      mts -t $TAPE_FILE
}
MT_STATUS=my_status

# Disable MT_OFFLINE function
MT_OFFLINE=:

BLOCKING=124
BACKUP_DIRS="
        albert:/fs/fsf
        apple-gunkies:/gd
        albert:/fs/gd2
        albert:/fs/gp
        geech:/usr/jla
        churchy:/usr/roland
        albert:/
        albert:/usr
        apple-gunkies:/
        apple-gunkies:/usr
        gnu:/hack
        gnu:/u
        apple-gunkies:/com/mailer/gnu
        apple-gunkies:/com/archive/gnu"

BACKUP_FILES="/com/mailer/aliases /com/mailer/league*[a-z]"


5.5 Using the Backup Scripts

The syntax for running a backup script is:

backup --level=level --time=time

The ‘--level’ option requests the dump level. Thus, to produce a full dump, specify --level=0 (this is the default, so ‘--level’ may be omitted if its value is 0)(14).

The ‘--time’ option determines when the backup should be run. Time may take three forms:

hh:mm

The dump must be run at hh hours mm minutes.

hh

The dump must be run at hh hours.

now

The dump must be run immediately.
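
The three forms can be told apart with simple shell patterns. The helper below is a hypothetical illustration, not code taken from the backup script; it only classifies its argument and rejects anything else:

```shell
#!/bin/sh
# Hypothetical helper: describe a --time argument, or fail if it has
# none of the three accepted forms.
describe_time() {
    case $1 in
        now)
            echo "immediately" ;;
        [0-9]:[0-9][0-9] | [0-9][0-9]:[0-9][0-9])
            echo "at $1" ;;
        [0-9] | [0-9][0-9])
            echo "at $1:00" ;;
        *)
            return 1 ;;
    esac
}

# Corresponding invocations of the backup script would be, for example:
#   backup --level=0 --time=now
#   backup --level=1 --time=1
#   backup --level=1 --time=13:45
describe_time now
describe_time 1
describe_time 13:45
```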

You should start a script with a tape or disk mounted. Once you start a script, it prompts you for new tapes or disks as it needs them. Media volumes don’t have to correspond to archive files — a multi-volume archive can be started in the middle of a tape that already contains the end of another multi-volume archive. The restore script prompts for media by its archive volume, so to avoid an error message you should keep track of which tape (or disk) contains which volume of the archive (see section Using the Restore Script).

The backup scripts write two files on the file system. The first is a record file in ‘/etc/tar-backup/’, which is used by the scripts to store and retrieve information about which files were dumped. This file is not meant to be read by humans, and should not be deleted by them. See section Format of the Incremental Snapshot Files, for a more detailed explanation of this file.

The second file is a log file containing the names of the file systems and files dumped, what time the backup was made, and any error messages that were generated, as well as how much space was left in the media volume after the last volume of the archive was written. You should check this log file after every backup. The file name is ‘log-mm-dd-yyyy-level-n’, where mm-dd-yyyy represents current date, and n represents current dump level number.
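
When checking logs from another script, the expected file name for the current day can be built with date. This is a minimal sketch; the function name log_name is made up for illustration, and the directory in which the log file is written is not assumed here:

```shell
#!/bin/sh
# Sketch: build today's log file name for the dump level given in $1,
# following the ‘log-mm-dd-yyyy-level-n’ pattern.
log_name() {
    echo "log-$(date +%m-%d-%Y)-level-$1"
}

log_name 0
```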

The script also prints the name of each system being dumped to the standard output.

The following is the full list of options accepted by the backup script:

-l level
--level=level

Do backup level level (default 0).

-f
--force

Force backup even if today’s log file already exists.

-v[level]
--verbose[=level]

Set verbosity level. The higher the level is, the more debugging information will be output during execution. Default level is 100, which means the highest debugging level.

-t start-time
--time=start-time

Wait until start-time, then do the backup.

-h
--help

Display short help message and exit.

-V
--version

Display information about the program’s name, version, origin and legal status, all on standard output, and then exit successfully.


5.6 Using the Restore Script

To restore files that were archived using a scripted backup, use the restore script. Its usage is quite straightforward. In the simplest form, invoke restore --all; it will then restore all the file systems and files specified in ‘backup-specs’ (see section BACKUP_DIRS).

You may select the file systems (and/or files) to restore by giving restore a list of patterns on its command line. For example, running

restore 'albert:*'

will restore all file systems on the machine ‘albert’. A more complicated example:

restore 'albert:*' '*:/var'

This command will restore all file systems on the machine ‘albert’ as well as the ‘/var’ file system on all machines.

By default, restore will start restoring files from the lowest available dump level (usually zero) and will continue through all available dump levels. There may be situations where such a thorough restore is not necessary. For example, you may wish to restore only files from the recent level one backup. To do so, use the ‘--level’ option, as shown in the example below:

restore --level=1

The full list of options accepted by restore follows:

-a
--all

Restore all file systems and files specified in ‘backup-specs’.

-l level
--level=level

Start restoring from the given backup level, instead of the default 0.

-v[level]
--verbose[=level]

Set verbosity level. The higher the level is, the more debugging information will be output during execution. Default level is 100, which means the highest debugging level.

-h
--help

Display short help message and exit.

-V
--version

Display information about the program’s name, version, origin and legal status, all on standard output, and then exit successfully.

You should start the restore script with the media containing the first volume of the archive mounted. The script will prompt for other volumes as they are needed. If the archive is on tape, you don’t need to rewind the tape to its beginning: if the tape head is positioned past the beginning of the archive, the script will rewind the tape as needed. See section Tape Positions and Tape Marks, for a discussion of tape positioning.

Warning: The script will delete files from the active file system if they were not in the file system when the archive was made.

See section Using tar to Perform Incremental Dumps, for an explanation of how the script makes that determination.


This document was generated on August 23, 2023 using texi2html 5.0.