6.4 Record 3: Data Record

The format of the data record varies depending on the value of compressed in the file header record:

0: no compression

Data is arranged as a series of 8-byte elements, one per variable instance variable in the variable record (see Record 1: Variables Record). Numeric values are given in flt64 format; string values are literal characters string, padded on the right with spaces when necessary to fill out 8-byte units.

1: bytecode compression

The first 8 bytes of the data record is divided into a series of 1-byte command codes. These codes have meanings as described below:

0

The system-missing value.

1

A numeric or string value that is not compressible. The value is stored in the 8 bytes following the current block of command bytes. If this value appears twice in a block of command bytes, then it indicates the second group of 8 bytes following the command bytes, and so on.

2 through 255

A number with value code - 100, where code is the value of the compression code. For example, code 105 indicates a numeric variable of value 5.

The end of the 8-byte group of bytecodes is followed by any 8-byte blocks of non-compressible values indicated by code 1. After that follows another 8-byte group of bytecodes, then those bytecodes’ non-compressible values. The pattern repeats up to the number of cases specified by the main header record have been seen.

The corpus does not contain any files with command codes 2 through 95, so it is possible that some of these codes are used for special purposes.

Cases of data often, but not always, fill the entire data record. Readers should stop reading after the number of cases specified in the main header record. Otherwise, readers may try to interpret garbage following the data as additional cases.