1.3 Variable Record

There must be one variable record for each numeric variable and each string variable with width 8 bytes or less. String variables wider than 8 bytes have one variable record for each 8 bytes, rounding up. The first variable record for a long string specifies the variable’s correct dictionary information. Subsequent variable records for a long string are filled with dummy information: a type of -1, no variable label or missing values, print and write formats that are ignored, and an empty string as name. A few system files have been encountered that include a variable label on dummy variable records, so readers should take care to parse dummy variable records in the same way as other variable records.

The dictionary index of a variable is a 1-based offset in the set of variable records, including dummy variable records for long string variables. The first variable record has a dictionary index of 1, the second has a dictionary index of 2, and so on.
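The record count and index rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the input is assumed to be the list of type codes of the real variables only (0 for numeric, the width for a string), with dummy records excluded.

```python
def n_variable_records(type_code: int) -> int:
    """Number of variable records occupied by one variable.

    type_code is 0 for a numeric variable or the width in bytes of a
    string variable (hypothetical helper, not part of the format).
    """
    if type_code < 0:
        raise ValueError("-1 marks a dummy record, not the start of a variable")
    if type_code == 0:
        return 1                     # numeric: always one record
    return (type_code + 7) // 8      # string: one record per 8 bytes, rounded up


def dictionary_indexes(type_codes):
    """Return the 1-based dictionary index of each variable, in file order."""
    indexes = []
    next_index = 1
    for code in type_codes:
        indexes.append(next_index)
        # Dummy records for long strings also consume dictionary indexes.
        next_index += n_variable_records(code)
    return indexes
```

For example, a numeric variable followed by a 20-byte string and another numeric variable would occupy five variable records in total, so the last variable would have dictionary index 5 under this scheme.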

The system file format does not directly support string variables wider than 255 bytes. Such very long string variables are represented by a number of narrower string variables. See Very Long String Record for details.

A system file should contain at least one variable and thus at least one variable record, but system files have been observed in the wild without any variables (thus, no data either).

int32               rec_type;
int32               type;
int32               has_var_label;
int32               n_missing_values;
int32               print;
int32               write;
char                name[8];

/* Present only if has_var_label is 1. */
int32               label_len;
char                label[];

/* Present only if n_missing_values is nonzero. */
flt64               missing_values[];

int32 rec_type;

Record type code. Always set to 2.

int32 type;

Variable type code. Set to 0 for a numeric variable. For a short string variable or the first part of a long string variable, this is set to the width of the string. For the second and subsequent parts of a long string variable, set to -1, and the remaining fields in the structure are ignored.

int32 has_var_label;

If this variable has a variable label, set to 1; otherwise, set to 0.

int32 n_missing_values;

If the variable has no missing values, set to 0. If the variable has one, two, or three discrete missing values, set to 1, 2, or 3, respectively. If the variable has a range of missing values, set to -2; if the variable has a range of missing values plus a single discrete value, set to -3.

A long string variable always has the value 0 here. A separate record indicates missing values for long string variables (see Long String Missing Values Record).
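As an illustration, the encoding of n_missing_values can be unpacked into a count of discrete values and a flag for the range. The function name and the choice to reject any other value are assumptions of this sketch, not something the format mandates.

```python
def describe_missing(n_missing_values: int):
    """Return (n_discrete, has_range) for an n_missing_values field.

    Hypothetical helper: values outside the documented set are rejected.
    """
    table = {
        0: (0, False),    # no missing values
        1: (1, False),    # one discrete value
        2: (2, False),    # two discrete values
        3: (3, False),    # three discrete values
        -2: (0, True),    # a range only
        -3: (1, True),    # a range plus one discrete value
    }
    if n_missing_values not in table:
        raise ValueError(f"invalid n_missing_values: {n_missing_values}")
    return table[n_missing_values]
```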

int32 print;

Print format for this variable. See below.

int32 write;

Write format for this variable. See below.

char name[8];

Variable name. The variable name must begin with a capital letter or the at-sign (‘@’). Subsequent characters may also be digits, octothorpes (‘#’), dollar signs (‘$’), underscores (‘_’), or full stops (‘.’). The variable name is padded on the right with spaces.
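A reader that wants to check names against this rule might use something like the sketch below. It assumes ASCII-only names, which is a simplification (real files may use letters from the file's own character encoding), and it is not meant for dummy records, whose names are empty. The function name is hypothetical.

```python
import re

# First character: capital letter or '@'.  Later characters may also be
# digits, '#', '$', '_', or '.'.  ASCII-only by assumption.
_NAME_RE = re.compile(rb"[A-Z@][A-Z0-9@#$_.]*")


def decode_name(raw: bytes) -> str:
    """Strip the space padding from an 8-byte name field and validate it."""
    if len(raw) != 8:
        raise ValueError("name field must be exactly 8 bytes")
    name = raw.rstrip(b" ")
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid variable name: {raw!r}")
    return name.decode("ascii")
```

A tolerant reader might prefer to warn rather than raise here, since files that break the rule may still be otherwise usable.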

The ‘name’ fields should be unique within a system file. System files written by SPSS that contain very long string variables with similar names sometimes contain duplicate names that are later eliminated by resolving the very long string names (see Very Long String Record). PSPP handles duplicates by assigning them new, unique names.
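The seven fixed fields described so far occupy 32 bytes: six int32 values followed by the 8-byte name. A minimal sketch of reading them is shown below. The tuple and function names are hypothetical, and the byte order is assumed to have been determined elsewhere (for instance from the file header) and passed in as a struct-module prefix, "<" for little-endian or ">" for big-endian.

```python
import struct
from typing import NamedTuple


class VariableHeader(NamedTuple):
    """Fixed-size leading part of a variable record (hypothetical type)."""
    rec_type: int
    type_code: int
    has_var_label: int
    n_missing_values: int
    print_format: int
    write_format: int
    name: bytes


def parse_variable_header(data: bytes, byteorder: str = "<") -> VariableHeader:
    """Unpack the 32-byte fixed portion from the start of data."""
    layout = struct.Struct(byteorder + "6i8s")   # 6 * 4 + 8 = 32 bytes
    if len(data) < layout.size:
        raise ValueError("truncated variable record")
    header = VariableHeader(*layout.unpack_from(data))
    if header.rec_type != 2:
        raise ValueError(f"not a variable record: rec_type={header.rec_type}")
    return header
```

Dummy records (type code -1) have the same fixed layout, so the same function can read them. The optional label and missing-value fields follow these 32 bytes.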

int32 label_len;

This field is present only if has_var_label is set to 1. It is set to the length, in characters, of the variable label. The documented maximum length varies from 120 to 255 based on SPSS version, but some files have been seen with longer labels. PSPP accepts labels of any length.

char label[];

This field is present only if has_var_label is set to 1. It has length label_len, rounded up to the nearest multiple of 32 bits. The first label_len characters are the variable’s variable label.
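Because the label field is padded to a multiple of 4 bytes, a reader must consume the padding even though it is not part of the label. The sketch below shows one way to do that. It assumes label_len counts bytes, that the byte order is known ("<" or ">"), and that the caller has already checked has_var_label; the function name is hypothetical. The label is returned as raw bytes because the character encoding is not known at this point.

```python
import struct


def read_label(stream, byteorder: str = "<") -> bytes:
    """Read label_len and the padded label field from a binary stream."""
    raw_len = stream.read(4)
    if len(raw_len) != 4:
        raise EOFError("truncated label_len")
    (label_len,) = struct.unpack(byteorder + "i", raw_len)
    if label_len < 0:
        raise ValueError(f"negative label_len: {label_len}")
    padded = (label_len + 3) // 4 * 4    # round up to a multiple of 32 bits
    field = stream.read(padded)
    if len(field) != padded:
        raise EOFError("truncated label")
    return field[:label_len]             # drop the padding bytes
```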

flt64 missing_values[];

This field is present only if n_missing_values is nonzero. It has the same number of 8-byte elements as the absolute value of n_missing_values. Each element is interpreted as a number for numeric variables (with HIGHEST and LOWEST indicated as described in the chapter introduction). For string variables of width less than 8 bytes, elements are right-padded with spaces; for string variables wider than 8 bytes, only the first 8 bytes of each missing value are specified, with the remainder implicitly all spaces.

For discrete missing values, each element represents one missing value. When a range is present, the first element denotes the minimum value in the range, and the second element denotes the maximum value in the range. When a range plus a value are present, the third element denotes the additional discrete missing value.
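For a numeric variable, the element layout described above could be decoded roughly as follows. This is a sketch under stated assumptions: the function name and the returned dictionary shape are hypothetical, the byte order is passed in as a struct-module prefix, and the special HIGHEST and LOWEST encodings are left as ordinary floating-point values rather than being recognized.

```python
import struct


def decode_numeric_missing(n_missing_values: int, raw: bytes,
                           byteorder: str = "<"):
    """Split the missing_values field of a numeric variable.

    Returns {"range": (low, high) or None, "discrete": [values...]}.
    """
    if n_missing_values not in (0, 1, 2, 3, -2, -3):
        raise ValueError(f"invalid n_missing_values: {n_missing_values}")
    count = abs(n_missing_values)        # number of 8-byte elements
    if len(raw) != 8 * count:
        raise ValueError("missing_values field has the wrong size")
    values = list(struct.unpack(f"{byteorder}{count}d", raw))
    if n_missing_values >= 0:
        return {"range": None, "discrete": values}
    # Negative count: the first two elements are the range bounds and
    # any third element is the extra discrete value.
    return {"range": (values[0], values[1]), "discrete": values[2:]}
```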

The print and write fields of the variable record are output formats encoded as int32 values. The least-significant byte of the int32 represents the number of decimal places, and the next two bytes in order of increasing significance represent field width and format type, respectively. The most-significant byte is not used and should be set to zero.
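The byte packing can be illustrated with a pair of small helpers. The names are hypothetical, and the input is assumed to be the int32 after byte-order conversion.

```python
def decode_format(spec: int):
    """Split a print or write value into (format_type, width, decimals)."""
    decimals = spec & 0xFF             # least-significant byte
    width = (spec >> 8) & 0xFF         # next byte up
    format_type = (spec >> 16) & 0xFF  # third byte; the top byte is unused
    return format_type, width, decimals


def encode_format(format_type: int, width: int, decimals: int) -> int:
    """Inverse of decode_format, leaving the most-significant byte zero."""
    for part in (format_type, width, decimals):
        if not 0 <= part <= 0xFF:
            raise ValueError("each component must fit in one byte")
    return (format_type << 16) | (width << 8) | decimals
```

For instance, the value 0x00050802 splits into format type 5, width 8, and 2 decimal places.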

Format types are defined as follows:

Value  Meaning
0      Not used.
1      A
2      AHEX
3      COMMA
4      DOLLAR
5      F
6      IB
7      PIBHEX
8      P
9      PIB
10     PK
11     RB
12     RBHEX
13     Not used.
14     Not used.
15     Z
16     N
17     E
18     Not used.
19     Not used.
20     DATE
21     TIME
22     DATETIME
23     ADATE
24     JDATE
25     DTIME
26     WKDAY
27     MONTH
28     MOYR
29     QYR
30     WKYR
31     PCT
32     DOT
33     CCA
34     CCB
35     CCC
36     CCD
37     CCE
38     EDATE
39     SDATE
40     MTIME
41     YMDHMS

A few system files have been observed in the wild with invalid write fields, in particular with value 0. Readers should probably treat invalid print or write fields as some default format.
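One way a reader might apply that advice is sketched below. The fallback formats are an assumption of this example, not something the format specifies: F8.2 for numeric variables and A with the variable's width for strings. Only the format type code is checked; width and decimals are passed through unvalidated. The names are hypothetical.

```python
# Format type codes from the table above; unused codes are omitted.
FORMAT_TYPES = {
    1: "A", 2: "AHEX", 3: "COMMA", 4: "DOLLAR", 5: "F", 6: "IB",
    7: "PIBHEX", 8: "P", 9: "PIB", 10: "PK", 11: "RB", 12: "RBHEX",
    15: "Z", 16: "N", 17: "E", 20: "DATE", 21: "TIME", 22: "DATETIME",
    23: "ADATE", 24: "JDATE", 25: "DTIME", 26: "WKDAY", 27: "MONTH",
    28: "MOYR", 29: "QYR", 30: "WKYR", 31: "PCT", 32: "DOT", 33: "CCA",
    34: "CCB", 35: "CCC", 36: "CCD", 37: "CCE", 38: "EDATE", 39: "SDATE",
    40: "MTIME", 41: "YMDHMS",
}


def format_or_default(spec: int, var_width: int):
    """Return (name, width, decimals), with a default for unknown codes.

    var_width is 0 for a numeric variable, else the string width.
    """
    name = FORMAT_TYPES.get((spec >> 16) & 0xFF)
    if name is None:
        # Assumed defaults; a real reader may choose differently.
        return ("F", 8, 2) if var_width == 0 else ("A", var_width, 0)
    return (name, (spec >> 8) & 0xFF, spec & 0xFF)
```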