
Appendix C SPSS/PC+ System File Format

SPSS/PC+, first released in 1984, was a simplified version of SPSS for IBM PC and compatible computers. It used a data file format related to the one described in the previous chapter, but simplified and incompatible. The SPSS/PC+ software became obsolete in the 1990s, so files in this format are rarely encountered today. Nevertheless, for completeness, and because it is not very difficult, it seems worthwhile to support at least reading these files. This chapter documents this format, based on examination of a corpus of about 60 files from a variety of sources.

System files use four data types: 8-bit characters, 16-bit unsigned integers, 32-bit unsigned integers, and 64-bit floating-point numbers, called here char, uint16, uint32, and flt64, respectively. Data is not necessarily aligned on a word or double-word boundary.

SPSS/PC+ ran only on IBM PC and compatible computers. Therefore, values in these files are always in little-endian byte order. Floating-point numbers are always in IEEE 754 format.
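Because every field is little-endian and may be unaligned, a portable reader can assemble each value byte by byte instead of casting a pointer into the file buffer. The C sketch below shows one way to do this. The function names are hypothetical, and pcp_get_flt64 assumes that the host's double is also an IEEE 754 64-bit value, which holds on common platforms.

```c
#include <stdint.h>
#include <string.h>

/* Reads a little-endian uint16 from P, which need not be aligned. */
uint16_t
pcp_get_u16 (const uint8_t *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

/* Reads a little-endian uint32 from P, which need not be aligned. */
uint32_t
pcp_get_u32 (const uint8_t *p)
{
  return ((uint32_t) p[0]
          | ((uint32_t) p[1] << 8)
          | ((uint32_t) p[2] << 16)
          | ((uint32_t) p[3] << 24));
}

/* Reads a little-endian flt64 from P.  The bytes are first gathered into
   a uint64_t (most significant byte is p[7]), then the bit pattern is
   copied into a double.  Assumes the host uses IEEE 754 doubles. */
double
pcp_get_flt64 (const uint8_t *p)
{
  uint64_t bits = 0;
  for (int i = 7; i >= 0; i--)
    bits = (bits << 8) | p[i];

  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

Building values from individual bytes gives the same result on big- and little-endian hosts and never performs an unaligned load, at the cost of a few extra shifts per field.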

SPSS/PC+ system files represent the system-missing value as -1.66e308, or f5 1e 26 02 8a 8c ed ff expressed as hexadecimal bytes in file order. (This is an unusual choice: it is close to, but not equal to, the most negative finite 64-bit IEEE 754 value, which is about -1.8e308.)
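A reader has to recognize this value while decoding numeric data. The C sketch below, whose names are hypothetical, tests for system-missing by comparing the eight raw bytes against the constant given above instead of comparing doubles, so the test itself does not depend on host byte order or floating-point representation. The separate decoding function, useful for display, assumes an IEEE 754 host.

```c
#include <stdint.h>
#include <string.h>

/* The SPSS/PC+ system-missing value, as its eight bytes appear in the file. */
static const uint8_t pcp_sysmis_bytes[8] =
  { 0xf5, 0x1e, 0x26, 0x02, 0x8a, 0x8c, 0xed, 0xff };

/* Returns nonzero if the 8 bytes at P encode system-missing. */
int
pcp_is_sysmis (const uint8_t *p)
{
  return memcmp (p, pcp_sysmis_bytes, sizeof pcp_sysmis_bytes) == 0;
}

/* Decodes the little-endian constant into a double (about -1.66e308).
   Assumes the host uses IEEE 754 doubles. */
double
pcp_sysmis_value (void)
{
  uint64_t bits = 0;
  for (int i = 7; i >= 0; i--)
    bits = (bits << 8) | pcp_sysmis_bytes[i];

  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```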

Text in SPSS/PC+ system files is encoded in ASCII-based 8-bit MS-DOS codepages. The files in the corpus used to investigate the format were all ASCII-only.

An SPSS/PC+ system file begins with the following 256-byte directory:

uint32              two;
uint32              zero;
struct {
    uint32          ofs;
    uint32          len;
} records[15];
char                filename[128];

uint32 two;
uint32 zero;

Always set to 2 and 0, respectively.

These fields could be used as a signature for the file format, but the product field in record 0 seems more likely to be unique (see Record 0 Main Header Record).

struct { … } records[15];

Each of the elements in this array identifies a record in the system file. The ofs is a byte offset, from the beginning of the file, that identifies the start of the record. len specifies the length of the record, in bytes. Many records are optional or not used. If a record is not present, ofs and len for that record are both zero.

char filename[128];

In most files in the corpus, this field is entirely filled with spaces. In one file, it contains a file name, followed by a null byte, followed by spaces to fill the remainder of the field. The meaning is unknown.
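Putting the directory layout together, a reader might parse the first 256 bytes as in the C sketch below. The struct, the function name, and the choice to reject files whose first two fields are not 2 and 0 are assumptions made for illustration; as noted above, the product field in record 0 may be a more distinctive signature. The filename handling stops at the first null byte, if any, and then trims trailing spaces, which covers both patterns observed in the corpus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCP_N_RECORDS 15   /* entries in the records array */
#define PCP_DIR_SIZE 256   /* total size of the directory, in bytes */

/* Hypothetical in-memory form of the directory. */
struct pcp_directory
{
  struct
  {
    uint32_t ofs;               /* byte offset from start of file */
    uint32_t len;               /* record length, in bytes */
  } records[PCP_N_RECORDS];
  char filename[129];           /* filename field, NUL-terminated */
};

/* Reads a little-endian uint32 from possibly unaligned P. */
static uint32_t
dir_get_u32 (const uint8_t *p)
{
  return ((uint32_t) p[0] | ((uint32_t) p[1] << 8)
          | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24));
}

/* Parses the directory from the first SIZE bytes of DATA into *DIR.
   Returns 0 on success, -1 if DATA is too short or the "two" and
   "zero" fields do not hold 2 and 0. */
int
pcp_parse_directory (const uint8_t *data, size_t size,
                     struct pcp_directory *dir)
{
  if (size < PCP_DIR_SIZE)
    return -1;

  /* The "two" and "zero" fields occupy bytes 0 through 7. */
  if (dir_get_u32 (data) != 2 || dir_get_u32 (data + 4) != 0)
    return -1;

  /* records[i] starts at byte 8 + 8*i: ofs first, then len. */
  for (int i = 0; i < PCP_N_RECORDS; i++)
    {
      dir->records[i].ofs = dir_get_u32 (data + 8 + 8 * i);
      dir->records[i].len = dir_get_u32 (data + 12 + 8 * i);
    }

  /* filename occupies bytes 128 through 255.  Stop at the first null
     byte, then drop trailing spaces. */
  const char *fn = (const char *) data + 128;
  size_t n = 0;
  while (n < 128 && fn[n] != '\0')
    n++;
  while (n > 0 && fn[n - 1] == ' ')
    n--;
  memcpy (dir->filename, fn, n);
  dir->filename[n] = '\0';
  return 0;
}
```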

The following sections describe the contents of each record, identified by the index into the records array.
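Before decoding any record, a reader should confirm that its directory entry points inside the file. The hypothetical helper below looks up a record by index directly in the raw file image, returns NULL for an absent record (ofs and len both zero), and also rejects entries whose extent runs past the end of the file. That last check is a defensive assumption added here, not behavior documented for SPSS/PC+.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian uint32 from possibly unaligned P. */
static uint32_t
rec_get_u32 (const uint8_t *p)
{
  return ((uint32_t) p[0] | ((uint32_t) p[1] << 8)
          | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24));
}

/* Looks up record INDEX (0 through 14) in the SIZE-byte file image DATA.
   On success, stores the record length in *LEN and returns a pointer to
   the record's first byte.  Returns NULL if INDEX is out of range, the
   file is too short to hold the directory, the record is absent, or the
   record would extend past the end of the file. */
const uint8_t *
pcp_get_record (const uint8_t *data, size_t size, int index, uint32_t *len)
{
  if (index < 0 || index >= 15 || size < 256)
    return NULL;

  uint32_t ofs = rec_get_u32 (data + 8 + 8 * index);
  uint32_t n = rec_get_u32 (data + 12 + 8 * index);

  if (ofs == 0 && n == 0)
    return NULL;                /* record not present */

  /* Written so that ofs + n cannot overflow. */
  if (ofs > size || n > size - ofs)
    return NULL;                /* entry points outside the file */

  *len = n;
  return data + ofs;
}
```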
