Extracting Sparse Members

Any tar implementation will be able to extract sparse members from a PAX archive. However, the extracted files will be condensed, i.e., any zero blocks will be removed from them. Restoring such a condensed file to its original form, by adding zero blocks (or holes) back at their original locations, is called expanding a condensed sparse file.

To expand a file, you will need a simple auxiliary program called xsparse. It is available in source form from the GNU tar home page.

Let’s begin with archive members in sparse format version 1.0, which are the easiest to expand. The condensed file will contain both the file map and the file data, so no additional data will be needed to restore it. If the original file name was ‘dir/name’, then the condensed file will be named ‘dir/GNUSparseFile.n/name’, where n is a decimal number.
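To locate condensed files after extraction, a search along the following lines may help. This is only a sketch: list_condensed is a hypothetical helper name, not part of tar or xsparse, and it assumes a find that supports the ‘-path’ primary.

```shell
# Hypothetical helper (not part of tar or xsparse): list regular files
# located under a directory named 'GNUSparseFile.n', where n starts
# with a decimal digit.
# $1 = directory to search
list_condensed() {
    find "$1" -type f -path '*/GNUSparseFile.[0-9]*/*'
}

# Example use (the directory name is an assumption):
#   list_condensed /home/gray
```

Each name printed this way is a candidate ‘cond-file’ argument for xsparse.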

To expand a version 1.0 file, run xsparse as follows:

$ xsparse ‘cond-file’

where ‘cond-file’ is the name of the condensed file. The utility will deduce the name for the resulting expanded file using the following algorithm:

  1. If ‘cond-file’ does not contain any directories, ‘../cond-file’ will be used;
  2. If ‘cond-file’ has the form ‘dir/t/name’, where both t and name are simple names, with no ‘/’ characters in them, the output file name will be ‘dir/name’.
  3. Otherwise, if ‘cond-file’ has the form ‘dir/name’, the output file name will be ‘name’.
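The three rules above can be mirrored in a few lines of shell, which may be handy for predicting the output name before running xsparse. This is a sketch of the rules as stated here, not the actual xsparse code; guess_expanded_name is a hypothetical name.

```shell
# Hypothetical helper mirroring the three documented rules;
# it is not taken from the xsparse sources.
# $1 = name of the condensed file
guess_expanded_name() {
    cond=$1
    case $cond in
        */*) ;;                                   # has a directory part
        *)   printf '%s\n' "../$cond"; return ;;  # rule 1: no directories
    esac
    name=${cond##*/}     # last path component
    parent=${cond%/*}    # everything before the last '/'
    case $parent in
        */*) printf '%s\n' "${parent%/*}/$name" ;;  # rule 2: dir/t/name -> dir/name
        *)   printf '%s\n' "$name" ;;               # rule 3: dir/name -> name
    esac
}

guess_expanded_name /home/gray/GNUSparseFile.6058/sparsefile
```

For the path shown, rule 2 applies and the function prints ‘/home/gray/sparsefile’.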

In the unlikely case that this algorithm does not suit your needs, you can explicitly specify the output file name as a second argument to the command:

$ xsparse ‘cond-file’ ‘out-file’

It is often a good idea to run xsparse in dry run mode first. In this mode, the command does not actually expand the file, but verbosely lists all actions it would take to do so. The dry run mode is enabled by the ‘-n’ command line option:

$ xsparse -n /home/gray/GNUSparseFile.6058/sparsefile
Reading v.1.0 sparse map
Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to
Finished dry run

To actually expand the file, you would run:

$ xsparse /home/gray/GNUSparseFile.6058/sparsefile

Like most UNIX utilities, the program keeps quiet unless it has something important to tell you, such as an error condition. If you wish it to produce verbose output, similar to that from the dry run mode, use the ‘-v’ option:

$ xsparse -v /home/gray/GNUSparseFile.6058/sparsefile
Reading v.1.0 sparse map
Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to

Additionally, if your tar implementation has extracted the extended headers for this file, you can instruct xsparse to use them in order to verify the integrity of the expanded file. The option ‘-x’ sets the name of the extended header file to use. Continuing our example:

$ xsparse -v -x /home/gray/PaxHeaders/sparsefile \
  /home/gray/GNUSparseFile.6058/sparsefile
Reading extended header file
Found variable GNU.sparse.major = 1
Found variable GNU.sparse.minor = 0
Found variable GNU.sparse.name = sparsefile
Found variable GNU.sparse.realsize = 217481216
Reading v.1.0 sparse map
Expanding file '/home/gray/GNUSparseFile.6058/sparsefile' to

An extended header is a special tar archive header that precedes an archive member and contains a set of variables describing member properties that cannot be stored in the standard ustar header. While optional for expanding sparse version 1.0 members, the use of extended headers is mandatory when expanding sparse members in the older sparse formats, v.0.0 and v.0.1. (The sparse formats are described in detail in Storing Sparse Files.) So, for these formats, the question is: how to obtain extended headers from the archive?

If you use a tar implementation that does not support PAX format, the extended header for each member will be extracted as a separate file. If we represent the member name as ‘dir/name’, then the extended header file will be named ‘dir/PaxHeaders/name’.
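Such a header file can be inspected by hand before passing it to xsparse. The sketch below assumes the file holds only pax-style records of the form ‘length keyword=value’, one per line, with single-line values; xhdr_var is a hypothetical helper name, not an xsparse feature.

```shell
# Hypothetical helper: print the value of one variable stored in an
# extended header file. Assumes pax-style "LENGTH KEYWORD=VALUE"
# records, one per line, and nothing else in the file.
# $1 = header file, $2 = keyword, e.g. GNU.sparse.realsize
xhdr_var() {
    # Escape the dots so they match literally in the sed pattern.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Strip the "LENGTH KEYWORD=" prefix and print what remains.
    sed -n "s/^[0-9][0-9]* $key=//p" "$1"
}

# Example use (the path is taken from the example above):
#   xhdr_var /home/gray/PaxHeaders/sparsefile GNU.sparse.realsize
```

A header file that still contains binary header blocks would not satisfy the assumption above and is likely to need a different approach.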

Things become more difficult if your tar implementation does support PAX headers, because in this case you will have to manually extract the headers. We recommend the following algorithm:

  1. Consult the documentation of your tar implementation for an option that prints block numbers along with the archive listing (analogous to GNU tar’s ‘-R’ option). For example, star has ‘-block-number’.
  2. Obtain a verbose listing using the ‘block number’ option, and find the block numbers of the sparse member in question and of the member immediately following it. For example, running star on our archive we obtain:
    $ star -t -v -block-number -f arc.tar
    star: Unknown extended header keyword 'GNU.sparse.size' ignored.
    star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored.
    star: Unknown extended header keyword 'GNU.sparse.name' ignored.
    star: Unknown extended header keyword 'GNU.sparse.map' ignored.
    block        56:  425984 -rw-r--r--  gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile
    block       897:   65391 -rw-r--r--  gray/users Jun 24 20:06 2006 README

    (as usual, ignore the warnings about unknown keywords.)

  3. Let size be the size of the sparse member, Bs be its block number and Bn be the block number of the next member. Compute:
    N = Bn - Bs - size/512 - 2

    This number gives the size of the extended header part in tar blocks. In our example, this formula gives: 897 - 56 - 425984 / 512 - 2 = 7.

  4. Use dd to extract the headers:
    dd if=archive of=hname bs=512 skip=Bs count=N

    where archive is the archive name, hname is the name of the file to store the extended header in, and Bs and N are the values computed in the previous steps.

    In our example, this command will be

    $ dd if=arc.tar of=xhdr bs=512 skip=56 count=7
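The arithmetic in steps 3 and 4 can be scripted. The following is a minimal sketch using the numbers from the star listing above; xhdr_blocks is a hypothetical helper name. It rounds size/512 up to whole blocks, on the assumption that member data is padded to a multiple of 512 bytes, which gives the same result as the formula whenever size is an exact multiple of 512.

```shell
# Hypothetical helper: number of 512-byte blocks taken by the
# extended header part of a member.
# $1 = Bs (block number of the sparse member)
# $2 = Bn (block number of the next member)
# $3 = size of the sparse member in bytes
xhdr_blocks() {
    # Member data rounded up to whole 512-byte blocks (assumption).
    data_blocks=$(( ($3 + 511) / 512 ))
    echo $(( $2 - $1 - data_blocks - 2 ))
}

Bs=56 Bn=897 size=425984
N=$(xhdr_blocks "$Bs" "$Bn" "$size")

# Print the dd command instead of running it, so it can be reviewed first.
echo "dd if=arc.tar of=xhdr bs=512 skip=$Bs count=$N"
```

With the values from the example, N evaluates to 7 and the printed command matches the one shown in step 4.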

Finally, you can expand the condensed file, using the obtained header:

$ xsparse -v -x xhdr GNUSparseFile.28124/sparsefile
Reading extended header file
Found variable GNU.sparse.size = 217481216
Found variable GNU.sparse.numblocks = 208
Found variable GNU.sparse.name = sparsefile
Found variable GNU.sparse.map = 0,2048,1050624,2048,…
Expanding file 'GNUSparseFile.28124/sparsefile' to 'sparsefile'


This document was generated on August 23, 2023 using texi2html 5.0.