
B.2 File Header Record

A system file begins with the file header, with the following format:

char                rec_type[4];
char                prod_name[60];
int32               layout_code;
int32               nominal_case_size;
int32               compression;
int32               weight_index;
int32               ncases;
flt64               bias;
char                creation_date[9];
char                creation_time[8];
char                file_label[64];
char                padding[3];

char rec_type[4];

Record type code, either ‘$FL2’ for system files with uncompressed data or data compressed with simple bytecode compression, or ‘$FL3’ for system files with ZLIB compressed data.

This is truly a character field that uses the same character encoding as other strings. Thus, in a file with an ASCII-based character encoding this field contains 24 46 4c 32 or 24 46 4c 33, and in a file with an EBCDIC-based encoding this field contains 5b c6 d3 f2. (No EBCDIC-based ZLIB-compressed files have been observed.)
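As a rough illustration of the byte patterns above, a reader might classify the first four bytes of a file as follows. This is only a sketch: the function name and the returned tuple are invented for this example and are not part of PSPP.

```python
# Byte patterns quoted in the text above.
ASCII_FL2 = bytes.fromhex("24464c32")   # '$FL2' in an ASCII-based encoding
ASCII_FL3 = bytes.fromhex("24464c33")   # '$FL3' in an ASCII-based encoding
EBCDIC_FL2 = bytes.fromhex("5bc6d3f2")  # '$FL2' in an EBCDIC-based encoding


def classify_rec_type(rec_type: bytes):
    """Return (encoding_family, is_zlib) for a 4-byte rec_type.

    Hypothetical helper: raises ValueError for anything other than the
    three patterns listed in the text.
    """
    if rec_type == ASCII_FL2:
        return ("ascii", False)
    if rec_type == ASCII_FL3:
        return ("ascii", True)
    if rec_type == EBCDIC_FL2:
        return ("ebcdic", False)
    raise ValueError("unrecognized rec_type: %r" % (rec_type,))
```

An EBCDIC form of ‘$FL3’ is deliberately not handled, since the text notes that no such files have been observed.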

char prod_name[60];

Product identification string. This always begins with the characters ‘@(#) SPSS DATA FILE’. PSPP uses the remaining characters to give its version and the operating system name; for example, ‘GNU pspp 0.1.4 - sparc-sun-solaris2.5.2’. The string is truncated if it would be longer than 60 characters; otherwise it is padded on the right with spaces.
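The truncate-or-pad rule can be sketched as below. The helper name is hypothetical, and the single space placed between the fixed prefix and the product text is an assumption made for this example, not something the description above states.

```python
PREFIX = "@(#) SPSS DATA FILE"
PROD_NAME_LEN = 60


def make_prod_name(product: str) -> bytes:
    """Build a 60-byte prod_name field (hypothetical helper).

    The prefix is followed by the writer's own identification, then the
    result is truncated to 60 bytes or padded on the right with spaces.
    """
    text = PREFIX + " " + product  # separating space is an assumption
    raw = text.encode("ascii", errors="replace")
    return raw[:PROD_NAME_LEN].ljust(PROD_NAME_LEN, b" ")
```

For example, `make_prod_name("GNU pspp 0.1.4 - sparc-sun-solaris2.5.2")` yields a field of exactly 60 bytes that begins with the required prefix.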

The product name field allows readers to behave differently based on quirks in the way that particular software writes system files. See Value Labels Records, for the detail of the quirk that the PSPP system file reader tolerates in files written by ReadStat, which identifies itself in prod_name.

int32 layout_code;

Normally set to 2, although a few system files have been spotted in the wild with a value of 3 here. PSPP uses this value to determine the file’s integer endianness (see System File Format).
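One way to use layout_code for endianness detection is to decode its four bytes in both byte orders and keep the order that yields an expected value. The sketch below assumes that only 2 and 3 count as expected values; the function name is invented here and this is not necessarily how PSPP implements the check.

```python
import struct


def detect_int_endianness(layout_code_bytes: bytes) -> str:
    """Return '<' (little-endian) or '>' (big-endian) for the file's integers.

    Hypothetical helper: decodes the 4-byte layout_code both ways and
    accepts whichever interpretation gives 2 or 3.
    """
    for prefix in ("<", ">"):
        (value,) = struct.unpack(prefix + "i", layout_code_bytes)
        if value in (2, 3):
            return prefix
    raise ValueError("layout_code is neither 2 nor 3 in either byte order")
```

The returned prefix can be passed directly to `struct.unpack` when decoding the remaining int32 fields.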

int32 nominal_case_size;

Number of data elements per case. This is the number of variables, except that long string variables add extra data elements (one for every 8 characters after the first 8). However, string variables do not contribute to this value beyond the first 255 bytes. Further, some software always writes -1 or 0 in this field. In general, it is unsafe for systems reading system files to rely upon this value.

int32 compression;

Set to 0 if the data in the file is not compressed, 1 if the data is compressed with simple bytecode compression, 2 if the data is ZLIB compressed. This field has value 2 if and only if rec_type is ‘$FL3’.
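The relationship between compression and rec_type lends itself to a simple consistency check. The following sketch uses invented names and takes rec_type as an already decoded string.

```python
COMPRESSION_NAMES = {0: "none", 1: "bytecode", 2: "zlib"}


def check_compression(rec_type: str, compression: int) -> str:
    """Validate the compression field against rec_type (hypothetical helper).

    Returns a short name for the compression scheme, or raises ValueError
    if the value is unknown or disagrees with rec_type.
    """
    if compression not in COMPRESSION_NAMES:
        raise ValueError("unknown compression value %d" % compression)
    # compression is 2 if and only if rec_type is '$FL3'.
    if (compression == 2) != (rec_type == "$FL3"):
        raise ValueError("compression %d is inconsistent with rec_type %s"
                         % (compression, rec_type))
    return COMPRESSION_NAMES[compression]
```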

int32 weight_index;

If one of the variables in the data set is used as a weighting variable, set to the dictionary index of that variable, plus 1 (see Dictionary Index). Otherwise, set to 0.

int32 ncases;

Set to the number of cases in the file if it is known, or -1 otherwise.

In general, the number of cases that will be written to a system file cannot be determined at the time the header is written. This is handled by writing the entire system file, including the header, then seeking back to the beginning of the file and rewriting just the ncases field. For output on which seeking is not valid, the seek operation fails, and ncases remains -1.
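The seek-back step might look like the sketch below. It assumes that the header starts at offset 0 of the output stream, that the header was initially written with ncases set to -1, and that a failed seek surfaces as `OSError`. The function name is invented for this example. The offset 80 follows from the field sizes in the layout above (4 + 60 + 4 × 4 bytes precede ncases).

```python
import struct

NCASES_OFFSET = 4 + 60 + 4 * 4  # bytes preceding the ncases field: 80


def patch_ncases(out, ncases: int, endian: str = "<") -> bool:
    """Try to rewrite ncases after all cases have been written.

    Hypothetical helper. Returns True on success. If the stream cannot
    seek, the header is left alone, so ncases keeps the -1 written earlier.
    """
    try:
        end = out.tell()
        out.seek(NCASES_OFFSET)
        out.write(struct.pack(endian + "i", ncases))
        out.seek(end)  # restore the position at the end of the data
        return True
    except OSError:  # io.UnsupportedOperation is a subclass of OSError
        return False
```

Calling this once after the last case has been written leaves a seekable file with the true case count and a non-seekable stream with -1.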

flt64 bias;

Compression bias, ordinarily set to 100. Only integers between 1 - bias and 251 - bias can be compressed.
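With the usual bias of 100, the compressible range works out to -99 through 151. The sketch below treats both bounds as inclusive, which is an assumption about how to read the sentence above; the function name is invented for this example.

```python
def is_compressible(value: float, bias: float = 100.0) -> bool:
    """Report whether a numeric value falls in the compressible range.

    Hypothetical helper: the value must be an integer between
    1 - bias and 251 - bias, with both ends assumed inclusive.
    """
    value = float(value)
    return value.is_integer() and (1 - bias) <= value <= (251 - bias)
```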

By assuming that its value is 100, PSPP uses bias to determine the file’s floating-point format and endianness (see System File Format). If the compression bias is not 100, PSPP cannot auto-detect the floating-point format and assumes that it is IEEE 754 format with the same endianness as the system file’s integers, which is correct for all known system files.
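A simplified version of this detection is sketched below. It considers only IEEE 754 doubles in the two byte orders, whereas a real reader may also probe other floating-point formats. The function name is hypothetical.

```python
import struct


def detect_float_endianness(bias_bytes: bytes, int_endianness: str) -> str:
    """Guess the byte order of the file's IEEE 754 doubles.

    Hypothetical helper: decodes the 8-byte bias field in both byte
    orders and accepts the one that yields 100.0. If neither does, it
    falls back to the integer endianness, as the text describes.
    """
    for prefix in ("<", ">"):
        (value,) = struct.unpack(prefix + "d", bias_bytes)
        if value == 100.0:
            return prefix
    return int_endianness
```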

char creation_date[9];

Date of creation of the system file, in ‘dd mmm yy’ format, with the month as standard English abbreviations, using an initial capital letter and following with lowercase. If the date is not available then this field is arbitrarily set to ‘01 Jan 70’.

char creation_time[8];

Time of creation of the system file, in ‘hh:mm:ss’ format and using 24-hour time. If the time is not available then this field is arbitrarily set to ‘00:00:00’.
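The date and time formats can be produced as in the following sketch. It spells out the month abbreviations explicitly rather than relying on locale-dependent formatting; the function name is invented for this example.

```python
import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_creation(dt=None):
    """Return (creation_date, creation_time) as 9-byte and 8-byte strings.

    Hypothetical helper: with no datetime available, the placeholder
    values named in the text are used.
    """
    if dt is None:
        return b"01 Jan 70", b"00:00:00"
    date = "%02d %s %02d" % (dt.day, MONTHS[dt.month - 1], dt.year % 100)
    time = "%02d:%02d:%02d" % (dt.hour, dt.minute, dt.second)
    return date.encode("ascii"), time.encode("ascii")
```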

char file_label[64];

File label declared by the user, if any (see FILE LABEL in PSPP Users Guide). Padded on the right with spaces.

A product that identifies itself as VOXCO INTERVIEWER 4.3 uses CR-only line ends in this field, rather than the more usual LF-only or CR LF line ends.

char padding[3];

Ignored padding bytes to make the structure a multiple of 32 bits in length. Set to zeros.
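Putting the fields together, the whole record occupies 176 bytes and can be unpacked in one step. The sketch below assumes that the integers and the bias share one byte order and that the caller already knows it. The `Header` tuple and `parse_header` function are names invented for this example.

```python
import struct
from collections import namedtuple

Header = namedtuple(
    "Header",
    "rec_type prod_name layout_code nominal_case_size compression "
    "weight_index ncases bias creation_date creation_time file_label padding",
)

# 4s + 60s + five int32 + one flt64 + 9s + 8s + 64s + 3s, no alignment gaps.
LAYOUT = "4s60s5id9s8s64s3s"
HEADER_SIZE = struct.calcsize("<" + LAYOUT)  # 176 bytes


def parse_header(data: bytes, endian: str = "<") -> Header:
    """Unpack a file header record from the first 176 bytes of data.

    String fields are returned as raw bytes, since their encoding
    (ASCII-based or EBCDIC-based) is not known at this stage.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("file header record is truncated")
    return Header._make(struct.unpack(endian + LAYOUT, data[:HEADER_SIZE]))
```

An explicit byte-order prefix (`<` or `>`) matters here: without one, `struct` would use native alignment and could insert padding that the on-disk layout does not have.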
