1.21 Data Record

The data record must follow all other records in the system file. Every system file must have a data record that specifies data for at least one case. The format of the data record varies depending on the value of compression in the file header record:

0: no compression

Data is arranged as a series of 8-byte elements. Each element corresponds to the variable declared in the respective variable record (see Variable Record). Numeric values are given in flt64 format; string values are literal character strings, padded on the right when necessary to fill out 8-byte units.
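As an illustration of the uncompressed layout, the following is a minimal sketch that splits one case into 8-byte elements. The function name, the `widths` list (0 for a numeric element, nonzero for a string element), and the little-endian default are assumptions made for this example and are not defined by the format itself. Each 8-byte element is treated independently, so reassembling a string that spans several elements is left to the caller.

```python
import struct

def decode_uncompressed_case(data, widths, endian="<"):
    """Decode one case stored with compression 0.

    widths is a hypothetical list with one entry per 8-byte element:
    0 marks a numeric element, any other value marks a string element.
    endian is "<" or ">" (assumed here; the byte order comes from
    elsewhere in the system file).
    """
    values = []
    for i, width in enumerate(widths):
        element = data[8 * i : 8 * i + 8]
        if len(element) != 8:
            raise ValueError("truncated case")
        if width == 0:
            # Numeric values are flt64, i.e. an 8-byte double.
            values.append(struct.unpack(endian + "d", element)[0])
        else:
            # String values are kept as the raw, right-padded bytes.
            values.append(element)
    return values
```

For example, a case holding the number 5 followed by the string "abc" occupies 16 bytes and decodes to one float and one 8-byte string element.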

1: bytecode compression

The first 8 bytes of the data record are divided into a series of 1-byte command codes. These codes have meanings as described below:

0

Ignored. If the program writing the system file accumulates compressed data in blocks of fixed length, 0 bytes can be used to pad out extra bytes remaining at the end of a fixed-size block.

1 through 251

A number with value code - bias, where code is the value of the compression code and bias is the variable bias from the file header. For example, code 105 with bias 100.0 (the normal value) indicates a numeric variable of value 5.

A code of 0 (after subtracting the bias) in a string field encodes null bytes. This is unusual, since a string field normally encodes text data, but it exists in real system files.

252

End of file. This code may or may not appear at the end of the data stream. PSPP always outputs this code but its use is not required.

253

A numeric or string value that is not compressible. The value is stored in the 8 bytes following the current block of command bytes. If this code appears twice in a block of command bytes, then the second occurrence indicates the second group of 8 bytes following the command bytes, and so on.

254

An 8-byte string value that is all spaces.

255

The system-missing value.

The end of the 8-byte group of bytecodes is followed by any 8-byte blocks of non-compressible values indicated by code 253. After that follows another 8-byte group of bytecodes, then those bytecodes’ non-compressible values. The pattern repeats to the end of the file or a code with value 252.
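The alternation of command bytes and literal values can be sketched as a small decoder. This is an illustration under stated assumptions, not PSPP's implementation: the function name and the choice of output types are hypothetical. Codes 1 through 251 are returned as numbers (code minus bias), code 253 as the raw 8 bytes, code 254 as eight spaces, and code 255 as `None`. Interpreting an element according to whether its variable is numeric or string is left to the caller.

```python
def decode_bytecodes(stream, bias=100.0):
    """Expand bytecode-compressed data into a flat list of elements."""
    out = []
    pos = 0
    while pos + 8 <= len(stream):
        # Each group starts with 8 one-byte command codes.
        commands = stream[pos:pos + 8]
        pos += 8
        for code in commands:
            if code == 0:
                continue                      # padding, ignored
            elif code <= 251:
                out.append(code - bias)       # compressed number
            elif code == 252:
                return out                    # end of file
            elif code == 253:
                # Literal values follow the command group, in order.
                raw = stream[pos:pos + 8]
                if len(raw) != 8:
                    raise ValueError("truncated literal value")
                out.append(raw)
                pos += 8
            elif code == 254:
                out.append(b" " * 8)          # all-spaces string
            else:
                out.append(None)              # 255: system-missing
    return out
```

Because the literal values for a group are stored consecutively after its 8 command bytes, advancing one position counter past each literal leaves it at the start of the next command group.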

2: ZLIB compression

The data record consists of the following, in order:

  • ZLIB data header, 24 bytes long.
  • One or more variable-length blocks of ZLIB compressed data.
  • ZLIB data trailer, with a 24-byte fixed header plus an additional 24 bytes for each preceding ZLIB compressed data block.

The ZLIB data header has the following format:

int64               zheader_ofs;
int64               ztrailer_ofs;
int64               ztrailer_len;

int64 zheader_ofs;

The offset, in bytes, of the beginning of this structure within the system file.

int64 ztrailer_ofs;

The offset, in bytes, of the first byte of the ZLIB data trailer.

int64 ztrailer_len;

The number of bytes in the ZLIB data trailer. This and the previous field sum to the size of the system file in bytes.
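A reader might unpack the three header fields and apply the size check described above roughly as follows. The function name, the returned dictionary, and the little-endian default are assumptions for illustration only.

```python
import struct

def parse_zheader(buf, header_pos, file_size, endian="<"):
    """Parse the 24-byte ZLIB data header found at offset header_pos."""
    if len(buf) < 24:
        raise ValueError("ZLIB data header is truncated")
    zheader_ofs, ztrailer_ofs, ztrailer_len = struct.unpack(
        endian + "3q", buf[:24])
    # zheader_ofs records where this structure itself begins.
    if zheader_ofs != header_pos:
        raise ValueError("zheader_ofs does not match header position")
    # The trailer is the last thing in the file.
    if ztrailer_ofs + ztrailer_len != file_size:
        raise ValueError("trailer offset and length do not sum to file size")
    return {
        "zheader_ofs": zheader_ofs,
        "ztrailer_ofs": ztrailer_ofs,
        "ztrailer_len": ztrailer_len,
    }
```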

The data header is followed by (ztrailer_len - 24) / 24 ZLIB compressed data blocks. Each ZLIB compressed data block begins with a ZLIB header as specified in RFC 1950, e.g. hex bytes 78 01 (the only header yet observed in practice). Each block decompresses to a fixed number of bytes (in practice only 0x3ff000-byte blocks have been observed), except that the last block of data may be shorter. The last ZLIB compressed data block ends just before offset ztrailer_ofs.

The result of ZLIB decompression is bytecode compressed data as described above for compression format 1.
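The block count and the RFC 1950 header can be checked with a few lines of code. The sketch below uses hypothetical helper names. The header test relies only on RFC 1950 itself: the low four bits of the first byte give the compression method (8 for deflate), and the first two bytes, read as a big-endian 16-bit number, must be a multiple of 31.

```python
import zlib

def n_blocks_from_trailer_len(ztrailer_len):
    """Number of compressed blocks implied by the trailer length."""
    if ztrailer_len < 24 or (ztrailer_len - 24) % 24 != 0:
        raise ValueError("ztrailer_len is not 24 + 24 * n_blocks")
    return (ztrailer_len - 24) // 24

def looks_like_zlib_header(block):
    """Check the two-byte RFC 1950 header at the start of a block."""
    if len(block) < 2:
        return False
    cmf, flg = block[0], block[1]
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0

# Round trip on sample bytecode data: after inflating, the result is
# handled exactly like compression format 1.
sample = bytes([105, 255, 0, 0, 0, 0, 0, 252])
compressed = zlib.compress(sample)
restored = zlib.decompress(compressed)
```

The bytes 78 01 mentioned above pass this test, since 0x7801 is 30721, which is 31 times 991.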

The ZLIB data trailer begins with the following 24-byte fixed header:

int64               int_bias;
int64               zero;
int32               block_size;
int32               n_blocks;

int64 int_bias;

The compression bias as a negative integer, e.g. if bias in the file header record is 100.0, then int_bias is −100 (this is the only value yet observed in practice).

int64 zero;

Always observed to be zero.

int32 block_size;

The number of bytes in each ZLIB compressed data block, except possibly the last, following decompression. Only 0x3ff000 has been observed so far.

int32 n_blocks;

The number of ZLIB compressed data blocks, always exactly (ztrailer_len - 24) / 24.
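The fixed part of the trailer could be unpacked and cross-checked as in the sketch below. The function name, the `header_bias` parameter (the bias from the file header record), the returned dictionary, and the little-endian default are assumptions for this example. Only the block count is treated as a hard error here; the bias comparison is reported as a flag because just one value has been observed in practice.

```python
import struct

def parse_ztrailer_header(buf, ztrailer_len, header_bias=100.0, endian="<"):
    """Parse the 24-byte fixed header at the start of the ZLIB trailer."""
    if len(buf) < 24:
        raise ValueError("ZLIB data trailer is truncated")
    # Two int64 fields followed by two int32 fields: 8 + 8 + 4 + 4 = 24.
    int_bias, zero, block_size, n_blocks = struct.unpack(
        endian + "qqii", buf[:24])
    if n_blocks != (ztrailer_len - 24) // 24:
        raise ValueError("n_blocks disagrees with ztrailer_len")
    return {
        "int_bias": int_bias,
        "zero": zero,
        "block_size": block_size,
        "n_blocks": n_blocks,
        # True when int_bias is the negated file header bias.
        "bias_matches": int_bias == -int(header_bias),
    }
```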

The fixed header is followed by n_blocks 24-byte ZLIB data block descriptors, each of which describes the compressed data block corresponding to its offset. Each block descriptor has the following format:

int64               uncompressed_ofs;
int64               compressed_ofs;
int32               uncompressed_size;
int32               compressed_size;

int64 uncompressed_ofs;

The offset, in bytes, that this block of data would have in a similar system file that uses compression format 1. This is zheader_ofs in the first block descriptor, and in each succeeding block descriptor it is the sum of the previous descriptor’s uncompressed_ofs and uncompressed_size.

int64 compressed_ofs;

The offset, in bytes, of the actual beginning of this compressed data block. This is zheader_ofs + 24 in the first block descriptor, and in each succeeding block descriptor it is the sum of the previous descriptor’s compressed_ofs and compressed_size. The final block descriptor’s compressed_ofs and compressed_size sum to ztrailer_ofs.

int32 uncompressed_size;

The number of bytes in this data block, after decompression. This is block_size in every data block except the last, which may be smaller.

int32 compressed_size;

The number of bytes in this data block, as stored compressed in this system file.
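The relationships among the descriptor fields lend themselves to a consistency check before decompression. The following is a sketch only: the function names are hypothetical, `buf` is assumed to hold the descriptor bytes that follow the 24-byte fixed trailer header, and little-endian byte order is assumed by default.

```python
import struct
import zlib

def parse_descriptors(buf, n_blocks, endian="<"):
    """Return (uncompressed_ofs, compressed_ofs, uncompressed_size,
    compressed_size) tuples, one per 24-byte block descriptor."""
    return [struct.unpack_from(endian + "qqii", buf, 24 * i)
            for i in range(n_blocks)]

def check_descriptors(descs, zheader_ofs, ztrailer_ofs, block_size):
    """Verify the offset chains described for the descriptors."""
    expected_u = zheader_ofs
    expected_c = zheader_ofs + 24
    for i, (u_ofs, c_ofs, u_size, c_size) in enumerate(descs):
        if u_ofs != expected_u or c_ofs != expected_c:
            raise ValueError("descriptor %d has unexpected offsets" % i)
        is_last = i == len(descs) - 1
        if u_size > block_size or (not is_last and u_size != block_size):
            raise ValueError("descriptor %d has unexpected size" % i)
        expected_u += u_size
        expected_c += c_size
    # The last block must end where the trailer begins.
    if expected_c != ztrailer_ofs:
        raise ValueError("compressed blocks do not end at ztrailer_ofs")

def inflate_blocks(file_bytes, descs):
    """Decompress every block and concatenate the bytecode data."""
    out = bytearray()
    for u_ofs, c_ofs, u_size, c_size in descs:
        block = zlib.decompress(file_bytes[c_ofs:c_ofs + c_size])
        if len(block) != u_size:
            raise ValueError("block inflated to an unexpected size")
        out += block
    return bytes(out)
```

The bytes returned by `inflate_blocks` are bytecode-compressed data, so they would then be decoded as for compression format 1.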