
E.0.2 PAX Format, Versions 0.0 and 0.1

There are two formats available in this branch. Version 0.0 is the initial version of the sparse format, used by tar versions 1.14–1.15.1. The sparse file map is kept in extended (x) PAX header variables:

GNU.sparse.size

Real size of the stored file;

GNU.sparse.numblocks

Number of blocks in the sparse map;

GNU.sparse.offset

Offset of the data block;

GNU.sparse.numbytes

Size of the data block.

The latter two variables repeat for each data block, so the overall structure is like this:

GNU.sparse.size=size
GNU.sparse.numblocks=numblocks
repeat numblocks times
  GNU.sparse.offset=offset
  GNU.sparse.numbytes=numbytes
end repeat
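
As an illustration of the structure above, the following Python sketch rebuilds the sparse map from an ordered list of (keyword, value) pairs. The function name sparse_map_v00 and the representation of the header as a list of pairs are assumptions made for this example; they are not part of tar. The records are kept in a list because the offset and numbytes keywords repeat, and a plain dictionary would retain only the last of each.

```python
def sparse_map_v00(records):
    """Rebuild a version 0.0 sparse map.

    `records` is a hypothetical, already-decoded list of (keyword, value)
    string pairs in the order they appear in the extended header.
    Returns (real_size, [(offset, numbytes), ...]).
    """
    real_size = None
    numblocks = None
    offsets = []
    sizes = []
    for keyword, value in records:
        if keyword == "GNU.sparse.size":
            real_size = int(value)
        elif keyword == "GNU.sparse.numblocks":
            numblocks = int(value)
        elif keyword == "GNU.sparse.offset":
            offsets.append(int(value))
        elif keyword == "GNU.sparse.numbytes":
            sizes.append(int(value))
    # The map is only usable if every block has both an offset and a size.
    if numblocks is None or len(offsets) != numblocks or len(sizes) != numblocks:
        raise ValueError("inconsistent sparse map")
    return real_size, list(zip(offsets, sizes))


records = [
    ("GNU.sparse.size", "4096"),
    ("GNU.sparse.numblocks", "2"),
    ("GNU.sparse.offset", "0"),
    ("GNU.sparse.numbytes", "512"),
    ("GNU.sparse.offset", "2048"),
    ("GNU.sparse.numbytes", "512"),
]
real_size, blocks = sparse_map_v00(records)
```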

This format presented the following two problems:

  1. Whereas the POSIX specification allows a variable to appear multiple times in a header, it requires that only the last occurrence be meaningful. Thus, multiple occurrences of GNU.sparse.offset and GNU.sparse.numbytes conflict with the POSIX specification.
  2. Attempting to extract such archives using a third-party tar results in extraction of sparse files in condensed form. If the tar implementation in question does not support the POSIX format, it will also extract a file containing the extension header attributes. This file can be used to expand the sparse file to its original state. However, POSIX-aware tars will usually ignore the unknown variables, which makes restoring the file more difficult. See Extraction of sparse members in v.0.0 format, for a detailed description of how to restore such members using non-GNU tars.
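
To make the notion of expanding a condensed file concrete, here is a minimal Python sketch. It assumes that the condensed form is simply the data blocks concatenated in map order, and that the real size and the list of (offset, numbytes) pairs have already been recovered from the header. The function name expand_sparse is hypothetical, and the whole file is held in memory purely for brevity; a practical tool would seek and write block by block.

```python
def expand_sparse(condensed, real_size, sparse_map):
    """Return the expanded contents of a condensed sparse file.

    condensed  -- bytes of the condensed file (assumed: blocks back to back)
    real_size  -- value of GNU.sparse.size
    sparse_map -- list of (offset, numbytes) pairs
    """
    expanded = bytearray(real_size)  # holes start out as zero bytes
    pos = 0  # read position within the condensed data
    for offset, numbytes in sparse_map:
        chunk = condensed[pos:pos + numbytes]
        if len(chunk) != numbytes:
            raise ValueError("condensed data is shorter than the map claims")
        if offset + numbytes > real_size:
            raise ValueError("data block extends past the real file size")
        expanded[offset:offset + numbytes] = chunk
        pos += numbytes
    return bytes(expanded)


restored = expand_sparse(b"abcd", 10, [(2, 2), (7, 2)])
```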

GNU tar 1.15.2 introduced sparse format version 0.1, which attempted to solve these problems. Like its predecessor, this format stores the sparse map in the extended POSIX header. It retains the GNU.sparse.size and GNU.sparse.numblocks variables, but instead of GNU.sparse.offset/GNU.sparse.numbytes pairs it uses a single variable:

GNU.sparse.map

Map of non-null data chunks. It is a string consisting of comma-separated values "offset,size[,offset-1,size-1...]"
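
A short Python sketch of how such a string might be decoded into (offset, size) pairs follows. The function name parse_sparse_map is an assumption made for this example, not a tar interface.

```python
def parse_sparse_map(value):
    """Split a GNU.sparse.map string into a list of (offset, size) pairs."""
    fields = value.split(",")
    # Values come in offset/size pairs, so an odd count is malformed.
    if len(fields) % 2 != 0:
        raise ValueError("GNU.sparse.map must hold an even number of values")
    numbers = [int(field) for field in fields]
    return list(zip(numbers[0::2], numbers[1::2]))


pairs = parse_sparse_map("0,512,2048,512")
```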

To address the second problem, the name field in the ustar header is replaced with a special name, constructed using the following pattern:

%d/GNUSparseFile.%p/%f

The real name of the sparse file is stored in the variable GNU.sparse.name. Thus, tar implementations that are not aware of GNU extensions will at least extract such files into separate directories, giving the user a chance to expand them afterwards. See Extraction of sparse members in v.0.1 format, for a detailed description of how to restore such members using non-GNU tars.
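
The following Python sketch shows how a name of this shape could be produced. It assumes that %d stands for the directory part of the real file name, %p for the process ID of the tar process, and %f for the file name with the directory part stripped; this section does not define these sequences, so treat that reading as an assumption. The function name sparse_member_name is hypothetical.

```python
import posixpath


def sparse_member_name(real_name, pid):
    """Build a placeholder member name following %d/GNUSparseFile.%p/%f.

    Assumed meanings: %d = directory part, %p = process ID, %f = base name.
    """
    # Like the dirname utility, use "." when the name has no directory part.
    directory = posixpath.dirname(real_name) or "."
    base = posixpath.basename(real_name)
    return f"{directory}/GNUSparseFile.{pid}/{base}"


name = sparse_member_name("dir/file.img", 1234)
```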

The resulting GNU.sparse.map string can be very long. Although POSIX does not impose any limit on the length of an x header variable, such a long value may confuse some tar implementations.



This document was generated on August 23, 2023 using texi2html 5.0.