Table of Contents
*****************

GNU PSPP Developers Guide
1 Introduction
2 Basic Concepts
  2.1 Values
    2.1.1 Numeric Values
    2.1.2 String Values
    2.1.3 Runtime Typed Values
  2.2 Input and Output Formats
    2.2.1 Constructing and Verifying Formats
    2.2.2 Format Utility Functions
    2.2.3 Obtaining Properties of Format Types
    2.2.4 Numeric Formatting Styles
    2.2.5 Formatted Data Input and Output
  2.3 User-Missing Values
    2.3.1 Testing for Missing Values
    2.3.2 Initializing User-Missing Value Sets
    2.3.3 Changing User-Missing Value Set Width
    2.3.4 Inspecting User-Missing Value Sets
    2.3.5 Modifying User-Missing Value Sets
  2.4 Value Labels
    2.4.1 Creation and Destruction
    2.4.2 Value Labels Properties
    2.4.3 Adding and Removing Labels
    2.4.4 Iterating through Value Labels
  2.5 Variables
    2.5.1 Variable Name
    2.5.2 Variable Type and Width
    2.5.3 Variable Missing Values
    2.5.4 Variable Value Labels
    2.5.5 Variable Print and Write Formats
    2.5.6 Variable Labels
    2.5.7 GUI Attributes
    2.5.8 Variable Leave Status
    2.5.9 Dictionary Class
    2.5.10 Variable Creation and Destruction
    2.5.11 Variable Short Names
    2.5.12 Variable Relationships
    2.5.13 Variable Auxiliary Data
    2.5.14 Variable Categorical Values
  2.6 Dictionaries
    2.6.1 Accessing Variables
    2.6.2 Creating Variables
    2.6.3 Deleting Variables
    2.6.4 Changing Variable Order
    2.6.5 Renaming Variables
    2.6.6 Weight Variable
    2.6.7 Filter Variable
    2.6.8 Case Limit
    2.6.9 Split Variables
    2.6.10 File Label
    2.6.11 Documents
  2.7 Coding Conventions
  2.8 Cases
  2.9 Data Sets
  2.10 Pools
3 Parsing Command Syntax
4 Processing Data
5 Presenting Output
6 Function Index
7 Concept Index
Appendix A Portable File Format
  A.1 Portable File Characters
  A.2 Portable File Structure
  A.3 Portable File Header
  A.4 Version and Date Info Record
  A.5 Identification Records
  A.6 Variable Count Record
  A.7 Case Weight Variable Record
  A.8 Variable Records
  A.9 Value Label Records
  A.10 Document Record
  A.11 Portable File Data
Appendix B System File Format
  B.1 File Header Record
  B.2 Variable Record
  B.3 Value Labels Records
  B.4 Document Record
  B.5 Machine Integer Info Record
  B.6 Machine Floating-Point Info Record
  B.7 Variable Display Parameter Record
  B.8 Long Variable Names Record
  B.9 Very Long String Record
  B.10 Miscellaneous Informational Records
  B.11 Dictionary Termination Record
  B.12 Data Record
Appendix C `q2c' Input Format
  C.1 Invoking q2c
  C.2 `q2c' Input Structure
  C.3 Grammar Rules
Appendix D GNU Free Documentation License
  D.1 ADDENDUM: How to use this License for your documents

GNU PSPP Developers Guide
*************************

This manual is for GNU PSPP version 0.6.0, software for statistical analysis.

Copyright (C) 1997, 1998, 2004, 2005, 2007 Free Software Foundation, Inc.

     Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being "A GNU Manual," and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled "GNU Free Documentation License."

     (a) The FSF's Back-Cover Text is: "You have the freedom to copy and modify this GNU manual."

1 Introduction
**************

This manual is a guide to PSPP internals. Its intended audience is developers who wish to modify or extend PSPP's capabilities. The use of PSPP is documented in a separate manual. *Note Introduction: (pspp)Top.

This manual is both a tutorial and a reference manual for PSPP developers. It is ultimately intended to cover everything that developers who wish to implement new PSPP statistical procedures and other commands should know. It is currently incomplete, partly because existing developers have not yet spent enough time on writing, and partly because the interfaces not yet documented are not yet mature enough to make documenting them worthwhile.

PSPP developers should have some familiarity with the basics of PSPP from a user's perspective.
This manual attempts to refer to the PSPP user manual's descriptions of concepts that PSPP users should find familiar at the time of their first reference. However, it is probably a good idea to at least skim the PSPP manual before reading this one, if you are not already familiar with PSPP.

2 Basic Concepts
****************

This chapter introduces basic data structures and other concepts needed for developing in PSPP.

2.1 Values
==========

The unit of data in PSPP is a "value".

Values are classified by "type" and "width". The type of a value is either "numeric" or "string" (sometimes called alphanumeric). The width of a string value ranges from 1 to `MAX_STRING' bytes. The width of a numeric value is artificially defined to be 0; thus, the type of a value can be inferred from its width.

Some support is provided for working with value types and widths, in `data/val-type.h':

 -- Macro: int MAX_STRING
     Maximum width of a string value, in bytes, currently 32,767.

 -- Function: bool val_type_is_valid (enum val_type VAL_TYPE)
     Returns true if VAL_TYPE is a valid value type, that is, either `VAL_NUMERIC' or `VAL_STRING'. Useful for assertions.

 -- Function: enum val_type val_type_from_width (int WIDTH)
     Returns `VAL_NUMERIC' if WIDTH is 0 and thus represents the width of a numeric value, otherwise `VAL_STRING' to indicate that WIDTH is the width of a string value.

The following subsections describe how values of each type are represented.

2.1.1 Numeric Values
--------------------

A value known to be numeric at compile time is represented as a `double'. PSPP provides three values of `double' for special purposes, defined in `data/val-type.h':

 -- Macro: double SYSMIS
     The "system-missing value", used to represent a datum whose true value is unknown, such as a survey question that was not answered by the respondent, or undefined, such as the result of division by zero.
     PSPP propagates the system-missing value through calculations and compensates for missing values in statistical analyses. *Note Missing Observations: (pspp)Missing Observations, for a PSPP user's view of missing values.

     PSPP currently defines `SYSMIS' as `-DBL_MAX', that is, the greatest finite negative value of `double'. It is best not to depend on this definition, because PSPP may transition to using an IEEE NaN (not a number) instead at some point in the future.

 -- Macro: double LOWEST
 -- Macro: double HIGHEST
     The greatest finite negative (except for `SYSMIS') and positive values of `double', respectively. These values do not ordinarily appear in user data files. Instead, they are used to implement endpoints of open-ended ranges that are occasionally permitted in PSPP syntax, e.g. `5 THRU HI' as a range of missing values (*note MISSING VALUES: (pspp)MISSING VALUES.).

2.1.2 String Values
-------------------

A value known at compile time to have string type is represented as an array of `char'. String values do not necessarily represent readable text strings and may contain arbitrary 8-bit data, including null bytes, control codes, and bytes with the high bit set. Thus, string values are not null-terminated strings, but rather opaque arrays of bytes.

`SYSMIS', `LOWEST', and `HIGHEST' have no equivalents as string values. Usually, PSPP fills an unknown or undefined string value with spaces, but PSPP does not treat such a string as a special case when it processes it later.

`MAX_STRING', the maximum length of a string value, is defined in `data/val-type.h'.

2.1.3 Runtime Typed Values
--------------------------

When a value's type is only known at runtime, it is often represented as a `union value', defined in `data/value.h'. `union value' has two members: a `double' named `f' to store a numeric value and an array of `char' named `s' to store a string value.

A `union value' does not identify the type or width of the data it contains.
Code that works with `union value's must therefore have external knowledge of its content, often through the type and width of a `struct variable' (*note Variables::).

The array of `char' in `union value' has only a small, fixed capacity of `MAX_SHORT_STRING' bytes. A value that fits within this capacity is called a "short string". Any wider string value, which must be represented by more than one `union value', is called a "long string".

 -- Macro: int MAX_SHORT_STRING
     Maximum width of a short string value, never less than 8 bytes. It is wider than 8 bytes on systems where `double' is either larger than 8 bytes or has stricter alignment than 8 bytes.

 -- Macro: int MIN_LONG_STRING
     Minimum width of a long string value, that is, `MAX_SHORT_STRING + 1'.

Long string values are slightly harder to work with than short string values, because they cannot be conveniently and efficiently allocated as block scope variables or structure members. The PSPP language exposes this inconvenience to the user: there are many circumstances in PSPP syntax where short strings are allowed but not long strings. Short string variables, for example, may have user-missing values, but long string variables may not (*note Missing Observations: (pspp)Missing Observations.).

PSPP provides a few functions for working with `union value's. The most useful are described below. To use these functions, recall that a numeric value has a width of 0.

 -- Function: size_t value_cnt_from_width (int WIDTH)
     Returns the number of consecutive `union value's that must be allocated to store a value of the given WIDTH. For a numeric or short string value, the return value is 1; for a long string value, it is greater than 1.

 -- Function: void value_copy (union value *DST, const union value *SRC, int WIDTH)
     Copies a value of the given WIDTH from the `union value' array starting at SRC to the one starting at DST. The two arrays must not overlap.
 -- Function: void value_set_missing (union value *VALUE, int WIDTH)
     Sets VALUE to `SYSMIS' if it is numeric or to all spaces if it is alphanumeric, according to WIDTH. VALUE must point to the start of a `union value' array of the given WIDTH.

 -- Function: bool value_is_resizable (const union value *VALUE, int OLD_WIDTH, int NEW_WIDTH)
     Determines whether VALUE may be resized from OLD_WIDTH to NEW_WIDTH. Resizing is possible if the following criteria are met. First, OLD_WIDTH and NEW_WIDTH must be both numeric or both string widths. Second, if NEW_WIDTH is a short string width and less than OLD_WIDTH, resizing is allowed only if bytes NEW_WIDTH through OLD_WIDTH in VALUE contain only spaces.

     These rules are part of those used by `mv_is_resizable' and `val_labs_can_set_width'.

 -- Function: void value_resize (union value *VALUE, int OLD_WIDTH, int NEW_WIDTH)
     Resizes VALUE from OLD_WIDTH to NEW_WIDTH, which must be allowed by the rules stated above. This has an effect only if NEW_WIDTH is greater than OLD_WIDTH, in which case the bytes newly added to VALUE are cleared to spaces.

2.2 Input and Output Formats
============================

Input and output formats specify how to convert data fields to and from data values (*note Input and Output Formats: (pspp)Input and Output Formats.). PSPP uses `struct fmt_spec' to represent input and output formats.

Function prototypes and other declarations related to formats are in the `' header.

 -- Structure: struct fmt_spec
     An input or output format, with the following members:

    `enum fmt_type type'
          The format type (see below).

    `int w'
          Field width, in bytes. The width of numeric fields is always between 1 and 40 bytes, and the width of string fields is always between 1 and 65534 bytes. However, many individual types of formats place stricter limits on field width (see *Note fmt_max_input_width::, *Note fmt_max_output_width::).

    `int d'
          Number of decimal places, in character positions.
          For format types that do not allow decimal places to be specified, this value must be 0. Format types that do allow decimal places have type-specific and often width-specific restrictions on `d' (see *Note fmt_max_input_decimals::, *Note fmt_max_output_decimals::).

 -- Enumeration: enum fmt_type
     An enumerated type representing an input or output format type. Each PSPP input and output format has a corresponding enumeration constant prefixed by `FMT': `FMT_F', `FMT_COMMA', `FMT_DOT', and so on.

The following sections describe functions for manipulating formats and the data in fields represented by formats.

2.2.1 Constructing and Verifying Formats
----------------------------------------

These functions construct `struct fmt_spec's and verify that they are valid.

 -- Function: struct fmt_spec fmt_for_input (enum fmt_type TYPE, int W, int D)
 -- Function: struct fmt_spec fmt_for_output (enum fmt_type TYPE, int W, int D)
     Constructs a `struct fmt_spec' with the given TYPE, W, and D, asserts that the result is a valid input (or output) format, and returns it.

 -- Function: struct fmt_spec fmt_for_output_from_input (const struct fmt_spec *INPUT)
     Given INPUT, which must be a valid input format, returns the equivalent output format. *Note Input and Output Formats: (pspp)Input and Output Formats, for the rules for converting input formats into output formats.

 -- Function: struct fmt_spec fmt_default_for_width (int WIDTH)
     Returns the default output format for a variable of the given WIDTH. For a numeric variable, this is F8.2 format; for a string variable, it is the A format of the given WIDTH.

The following functions check whether a `struct fmt_spec' is valid for various uses and return true if so, false otherwise. When any of them returns false, it also outputs an explanatory error message using `msg'. To suppress error output, enclose a call to one of these functions by a `msg_disable'/`msg_enable' pair.
 -- Function: bool fmt_check (const struct fmt_spec *FORMAT, bool FOR_INPUT)
 -- Function: bool fmt_check_input (const struct fmt_spec *FORMAT)
 -- Function: bool fmt_check_output (const struct fmt_spec *FORMAT)
     Checks whether FORMAT is a valid input format (for `fmt_check_input', or `fmt_check' if FOR_INPUT) or output format (for `fmt_check_output', or `fmt_check' if not FOR_INPUT).

 -- Function: bool fmt_check_type_compat (const struct fmt_spec *FORMAT, enum val_type TYPE)
     Checks whether FORMAT matches the value type TYPE, that is, if TYPE is `VAL_NUMERIC' and FORMAT is a numeric format or TYPE is `VAL_STRING' and FORMAT is a string format.

 -- Function: bool fmt_check_width_compat (const struct fmt_spec *FORMAT, int WIDTH)
     Checks whether FORMAT may be used as an output format for a value of the given WIDTH.

     `fmt_var_width', described in the following section, can also be used to determine the value width needed by a format.

2.2.2 Format Utility Functions
------------------------------

These functions work with `struct fmt_spec's.

 -- Function: int fmt_var_width (const struct fmt_spec *FORMAT)
     Returns the width for values associated with FORMAT. If FORMAT is a numeric format, the width is 0; if FORMAT is an A format, then the width is `FORMAT->w'; otherwise, FORMAT is an AHEX format and its width is `FORMAT->w / 2'.

 -- Function: char *fmt_to_string (const struct fmt_spec *FORMAT, char S[FMT_STRING_LEN_MAX + 1])
     Converts FORMAT to a human-readable format specifier in S and returns S. FORMAT need not be a valid input or output format specifier, e.g. it is allowed to have an excess width or decimal places. In particular, if FORMAT has decimals, they are included in the output string, even if FORMAT's type does not allow decimals, to allow accurately presenting incorrect formats to the user.

 -- Function: bool fmt_equal (const struct fmt_spec *A, const struct fmt_spec *B)
     Compares A and B memberwise and returns true if they are identical, false otherwise.
     A and B need not be valid input or output format specifiers.

 -- Function: void fmt_resize (struct fmt_spec *FMT, int WIDTH)
     Sets the width of FMT to a valid format for a `union value' of size WIDTH.

2.2.3 Obtaining Properties of Format Types
------------------------------------------

These functions work with `enum fmt_type's instead of the higher-level `struct fmt_spec's. Their primary purpose is to report properties of each possible format type, which in turn allows clients to abstract away many of the details of the very heterogeneous requirements of each format type.

The first group of functions works with format type names.

 -- Function: const char *fmt_name (enum fmt_type TYPE)
     Returns the name for the given TYPE, e.g. `"COMMA"' for `FMT_COMMA'.

 -- Function: bool fmt_from_name (const char *NAME, enum fmt_type *TYPE)
     Tries to find the `enum fmt_type' associated with NAME. If successful, sets `*TYPE' to the type and returns true; otherwise, returns false without modifying `*TYPE'.

The functions below query basic limits on width and decimal places for each kind of format.

 -- Function: bool fmt_takes_decimals (enum fmt_type TYPE)
     Returns true if a format of the given TYPE is allowed to have a nonzero number of decimal places (the `d' member of `struct fmt_spec'), false if not.

 -- Function: int fmt_min_input_width (enum fmt_type TYPE)
 -- Function: int fmt_max_input_width (enum fmt_type TYPE)
 -- Function: int fmt_min_output_width (enum fmt_type TYPE)
 -- Function: int fmt_max_output_width (enum fmt_type TYPE)
     Returns the minimum or maximum width (the `w' member of `struct fmt_spec') allowed for an input or output format of the specified TYPE.

 -- Function: int fmt_max_input_decimals (enum fmt_type TYPE, int WIDTH)
 -- Function: int fmt_max_output_decimals (enum fmt_type TYPE, int WIDTH)
     Returns the maximum number of decimal places allowed for an input or output format, respectively, of the given TYPE and WIDTH.
     Returns 0 if the specified TYPE does not allow any decimal places or if WIDTH is too narrow to allow decimal places.

 -- Function: int fmt_step_width (enum fmt_type TYPE)
     Returns the "width step" for a `struct fmt_spec' of the given TYPE. A `struct fmt_spec''s width must be a multiple of its type's width step. Most format types have a width step of 1, so that their formats' widths may be any integer within the valid range, but hexadecimal numeric formats and AHEX string formats have a width step of 2.

These functions allow clients to broadly determine how each kind of input or output format behaves.

 -- Function: bool fmt_is_string (enum fmt_type TYPE)
 -- Function: bool fmt_is_numeric (enum fmt_type TYPE)
     Returns true if TYPE is a format for string or numeric values, respectively, false otherwise.

 -- Function: enum fmt_category fmt_get_category (enum fmt_type TYPE)
     Returns the category within which TYPE falls.

 -- Enumeration: enum fmt_category
     A group of format types. Format type categories correspond to the input and output categories described in the PSPP user documentation (*note Input and Output Formats: (pspp)Input and Output Formats.).

     Each format is in exactly one category. The categories have bitwise disjoint values to make it easy to test whether a format type is in one of multiple categories, e.g.

          if (fmt_get_category (type) & (FMT_CAT_DATE | FMT_CAT_TIME))
            {
              /* ...`type' is a date or time format... */
            }

     The format categories are:

    `FMT_CAT_BASIC'
          Basic numeric formats.

    `FMT_CAT_CUSTOM'
          Custom currency formats.

    `FMT_CAT_LEGACY'
          Legacy numeric formats.

    `FMT_CAT_BINARY'
          Binary formats.

    `FMT_CAT_HEXADECIMAL'
          Hexadecimal formats.

    `FMT_CAT_DATE'
          Date formats.

    `FMT_CAT_TIME'
          Time formats.

    `FMT_CAT_DATE_COMPONENT'
          Date component formats.

    `FMT_CAT_STRING'
          String formats.
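The value of giving each category its own bit can be seen in a small stand-alone model. This is only a sketch: the `DEMO_CAT_*' constants, their numeric values, and `demo_category_in' are hypothetical and are not taken from PSPP's headers. The only property relied upon is the one stated above, that category values are bitwise disjoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the format categories.  The numeric
   values are assumptions; all that matters for this sketch is that
   each category occupies a bit of its own. */
enum demo_category
  {
    DEMO_CAT_BASIC = 0x001,
    DEMO_CAT_CUSTOM = 0x002,
    DEMO_CAT_LEGACY = 0x004,
    DEMO_CAT_BINARY = 0x008,
    DEMO_CAT_HEXADECIMAL = 0x010,
    DEMO_CAT_DATE = 0x020,
    DEMO_CAT_TIME = 0x040,
    DEMO_CAT_DATE_COMPONENT = 0x080,
    DEMO_CAT_STRING = 0x100
  };

/* Returns true if CATEGORY is one of the categories OR'd together
   in MASK.  Because no two categories share a bit, one AND
   operation answers the question for any number of categories. */
bool
demo_category_in (enum demo_category category, unsigned int mask)
{
  return ((unsigned int) category & mask) != 0;
}
```

With this model, `demo_category_in (DEMO_CAT_TIME, DEMO_CAT_DATE | DEMO_CAT_TIME)' is true, whereas the same mask rejects `DEMO_CAT_STRING'. Had the categories been numbered consecutively (1, 2, 3, ...), such a mask test would give false positives.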
The PSPP input and output routines use the following pair of functions to convert `enum fmt_type's to and from the separate set of codes used in system and portable files:

 -- Function: int fmt_to_io (enum fmt_type TYPE)
     Returns the format code used in system and portable files that corresponds to TYPE.

 -- Function: bool fmt_from_io (int IO, enum fmt_type *TYPE)
     Converts IO, a format code used in system and portable files, into an `enum fmt_type' in `*TYPE'. Returns true if successful, false if IO is not valid.

These functions reflect the relationship between input and output formats.

 -- Function: enum fmt_type fmt_input_to_output (enum fmt_type TYPE)
     Returns the output format type that is used by default by DATA LIST and other input procedures when TYPE is specified as an input format. The conversion from input format to output format is more complicated than simply changing the format. *Note fmt_for_output_from_input::, for a function that performs the entire conversion.

 -- Function: bool fmt_usable_for_input (enum fmt_type TYPE)
     Returns true if TYPE may be used as an input format type, false otherwise. The custom currency formats, in particular, may be used for output but not for input.

     All format types are valid for output.

The final group of format type property functions obtain human-readable templates that illustrate the formats graphically.

 -- Function: const char *fmt_date_template (enum fmt_type TYPE)
     Returns a formatting template for TYPE, which must be a date or time format type. These formats are used by `data_in' and `data_out' to guide parsing and formatting date and time data.

 -- Function: char *fmt_dollar_template (const struct fmt_spec *FORMAT)
     Returns a string of the form `$#,###.##' according to FORMAT, which must be of type `FMT_DOLLAR'. The caller must free the string with `free'.
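The width rules given for `fmt_var_width' and `fmt_step_width' can be restated as a short stand-alone model. The following is a sketch under stated assumptions, not PSPP code: `enum demo_type', `struct demo_spec', and the `demo_*' functions are hypothetical names, and only four representative format types are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-reduced stand-in for `enum fmt_type'. */
enum demo_type
  {
    DEMO_F,             /* An ordinary numeric format. */
    DEMO_HEX_NUMERIC,   /* A hexadecimal numeric format. */
    DEMO_A,             /* The A string format. */
    DEMO_AHEX           /* The AHEX string format. */
  };

/* Hypothetical stand-in for `struct fmt_spec'. */
struct demo_spec
  {
    enum demo_type type;
    int w;              /* Field width, in bytes. */
    int d;              /* Number of decimal places. */
  };

/* Width step: 2 for hexadecimal numeric and AHEX formats, 1 for
   the other types modeled here. */
int
demo_step_width (enum demo_type type)
{
  return type == DEMO_HEX_NUMERIC || type == DEMO_AHEX ? 2 : 1;
}

/* Returns true if SPEC's width is a multiple of its width step. */
bool
demo_width_is_stepped (const struct demo_spec *spec)
{
  return spec->w % demo_step_width (spec->type) == 0;
}

/* Value width implied by SPEC: 0 for numeric formats, `w' for A,
   and `w / 2' for AHEX, since two hex digits encode one byte. */
int
demo_var_width (const struct demo_spec *spec)
{
  switch (spec->type)
    {
    case DEMO_A:
      return spec->w;
    case DEMO_AHEX:
      return spec->w / 2;
    default:
      return 0;
    }
}
```

In this model an AHEX format of width 16 corresponds to a string value of width 8, and an AHEX format of width 15 is rejected because 15 is not a multiple of the width step.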
2.2.4 Numeric Formatting Styles
-------------------------------

Each of the basic numeric formats (F, E, COMMA, DOT, DOLLAR, PCT) and custom currency formats (CCA, CCB, CCC, CCD, CCE) has an associated numeric formatting style, represented by `struct fmt_number_style'. Input and output conversion of formats that have numeric styles is determined mainly by the style, although the formatting rules have special cases that are not represented within the style.

 -- Structure: struct fmt_number_style
     A structure type with the following members:

    `struct substring neg_prefix'
    `struct substring prefix'
    `struct substring suffix'
    `struct substring neg_suffix'
          A set of strings used as a prefix to negative numbers, a prefix to every number, a suffix to every number, and a suffix to negative numbers, respectively. Each of these strings is no more than `FMT_STYLE_AFFIX_MAX' bytes (currently 16) in length. These strings must be freed with `ss_dealloc' when no longer needed.

    `decimal'
          The character used as a decimal point. It must be either `.' or `,'.

    `grouping'
          The character used for grouping digits to the left of the decimal point. It may be `.' or `,', in which case it must not be equal to `decimal', or it may be set to 0 to disable grouping.

The following functions are provided for working with numeric formatting styles.

 -- Function: void fmt_number_style_init (struct fmt_number_style *STYLE)
     Initializes a `struct fmt_number_style' with all of the prefixes and suffixes set to the empty string, `.' as the decimal point character, and grouping disabled.

 -- Function: void fmt_number_style_destroy (struct fmt_number_style *STYLE)
     Destroys STYLE, freeing its storage.

 -- Function: struct fmt_number_style *fmt_create (void)
     Creates an array of all the styles used by PSPP and calls `fmt_number_style_init' on each of them.
 -- Function: void fmt_done (struct fmt_number_style *STYLES)
     A wrapper function that takes an array of `struct fmt_number_style', calls `fmt_number_style_destroy' on each of them, and then frees the array.

 -- Function: int fmt_affix_width (const struct fmt_number_style *STYLE)
     Returns the total length of STYLE's `prefix' and `suffix'.

 -- Function: int fmt_neg_affix_width (const struct fmt_number_style *STYLE)
     Returns the total length of STYLE's `neg_prefix' and `neg_suffix'.

PSPP maintains a global set of number styles for each of the basic numeric formats and custom currency formats. The following functions work with these global styles:

 -- Function: const struct fmt_number_style * fmt_get_style (enum fmt_type TYPE)
     Returns the numeric style for the given format TYPE.

 -- Function: void fmt_check_style (const struct fmt_number_style *STYLE)
     Asserts that STYLE is self-consistent.

 -- Function: const char * fmt_name (enum fmt_type TYPE)
     Returns the name of the given format TYPE.

2.2.5 Formatted Data Input and Output
-------------------------------------

These functions provide the ability to convert data fields into `union value's and vice versa.

 -- Function: bool data_in (struct substring INPUT, enum legacy_encoding LEGACY_ENCODING, enum fmt_type TYPE, int IMPLIED_DECIMALS, int FIRST_COLUMN, union value *OUTPUT, int WIDTH)
     Parses INPUT as a field containing data in the given format TYPE. The resulting value is stored in OUTPUT, which has the given WIDTH. For consistency, WIDTH must be 0 if TYPE is a numeric format type and greater than 0 if TYPE is a string format type.

     Ordinarily LEGACY_ENCODING should be `LEGACY_NATIVE', indicating that INPUT is encoded in the character set conventionally used on the host machine. It may be set to `LEGACY_EBCDIC' to cause INPUT to be re-encoded from EBCDIC during data parsing.

     If INPUT is the empty string (with length 0), OUTPUT is set to the value set on SET BLANKS (*note SET BLANKS: (pspp)SET BLANKS.)
     for a numeric value, or to all spaces for a string value. This applies regardless of the usual parsing requirements for TYPE.

     If IMPLIED_DECIMALS is greater than zero, then the numeric result is shifted right by IMPLIED_DECIMALS decimal places if INPUT does not contain a decimal point character or an exponent. Only certain numeric format types support implied decimal places; for string formats and other numeric formats, IMPLIED_DECIMALS has no effect. DATA LIST FIXED is the primary user of this feature (*note DATA LIST FIXED: (pspp)DATA LIST FIXED.). Other callers should generally specify 0 for IMPLIED_DECIMALS, to disable this feature.

     When INPUT contains invalid input data, `data_in' outputs a message using `msg'. If FIRST_COLUMN is nonzero, it is included in any such error message as the 1-based column number of the start of the field. The last column in the field is calculated as FIRST_COLUMN + (length of INPUT) - 1. To suppress error output, enclose the call to `data_in' by calls to `msg_disable' and `msg_enable'.

     This function returns true on success, false if a message was output (even if suppressed). Overflow and underflow provoke warnings but are not propagated to the caller as errors.

     This function is declared in `data/data-in.h'.

 -- Function: void data_out (const union value *INPUT, const struct fmt_spec *FORMAT, char *OUTPUT)
 -- Function: void data_out_legacy (const union value *INPUT, enum legacy_encoding LEGACY_ENCODING, const struct fmt_spec *FORMAT, char *OUTPUT)
     Converts the data pointed to by INPUT into a data field in OUTPUT according to output format specifier FORMAT, which must be a valid output format. Exactly `FORMAT->w' bytes are written to OUTPUT. The width of INPUT is also inferred from FORMAT using an algorithm equivalent to `fmt_var_width'.

     If `data_out' is called, or `data_out_legacy' is called with LEGACY_ENCODING set to `LEGACY_NATIVE', OUTPUT will be encoded in the character set conventionally used on the host machine.
     If LEGACY_ENCODING is set to `LEGACY_EBCDIC', OUTPUT will be re-encoded to EBCDIC during data output.

     When INPUT contains data that cannot be represented in the given FORMAT, `data_out' may output a message using `msg', although the current implementation does not consistently do so. To suppress error output, enclose the call to `data_out' by calls to `msg_disable' and `msg_enable'.

     This function is declared in `data/data-out.h'.

2.3 User-Missing Values
=======================

In addition to the system-missing value for numeric values, each variable has a set of user-missing values (*note MISSING VALUES: (pspp)MISSING VALUES.). A set of user-missing values is represented by `struct missing_values'.

It is rarely necessary to interact directly with a `struct missing_values' object. Instead, the most common operation, querying whether a particular value is a missing value for a given variable, is most conveniently executed through functions on `struct variable'. *Note Variable Missing Values::, for details.

A `struct missing_values' is essentially a set of `union value's that have a common value width (*note Values::). For a set of missing values associated with a variable (the common case), the set's width is the same as the variable's width.

The contents of a set of missing values are subject to some restrictions. Regardless of width, a set of missing values is allowed to be empty. Otherwise, its possible contents depend on its width:

0 (numeric values)
     Up to three discrete numeric values, or a range of numeric values (which includes both ends of the range), or a range plus one discrete numeric value.

1...MAX_SHORT_STRING (short string values)
     Up to three discrete string values (with the same width as the set).

MIN_LONG_STRING...MAX_STRING (long string values)
     Always empty.

These somewhat arbitrary restrictions are the same as those imposed by SPSS.
In PSPP we could easily eliminate these restrictions, but doing so would also require us to extend the system file format in an incompatible way, which we consider a bad tradeoff.

Function prototypes and other declarations related to missing values are declared in `data/missing-values.h'.

 -- Structure: struct missing_values
     Opaque type that represents a set of missing values.

The most often useful functions for missing values are those for testing whether a given value is missing, described in the following section. Several other functions for creating, inspecting, and modifying `struct missing_values' objects are described afterward, but these functions are much more rarely useful. No function for destroying a `struct missing_values' is provided, because `struct missing_values' does not contain any pointers or other references to resources that need deallocation.

2.3.1 Testing for Missing Values
--------------------------------

The most often useful functions for missing values are those for testing whether a given value is missing, described here. However, using one of the corresponding missing value testing functions for variables can be even easier (*note Variable Missing Values::).

 -- Function: bool mv_is_value_missing (const struct missing_values *MV, const union value *VALUE, enum mv_class CLASS)
 -- Function: bool mv_is_num_missing (const struct missing_values *MV, double VALUE, enum mv_class CLASS)
 -- Function: bool mv_is_str_missing (const struct missing_values *MV, const char VALUE[], enum mv_class CLASS)
     Tests whether VALUE is in one of the categories of missing values given by CLASS. Returns true if so, false otherwise.

     MV determines the width of VALUE and provides the set of user-missing values to test.

     The only difference among these functions is the form in which VALUE is provided, so you may use whichever function is most convenient.
     The CLASS argument determines the exact kinds of missing values that the functions test for:

      -- Enumeration: enum mv_class
         MV_USER
               Returns true if VALUE is in the set of user-missing values given by MV.

         MV_SYSTEM
               Returns true if VALUE is system-missing. (If MV represents a set of string values, then VALUE is never system-missing.)

         MV_ANY
         MV_USER | MV_SYSTEM
               Returns true if VALUE is user-missing or system-missing.

         MV_NONE
               Always returns false, that is, VALUE is never considered missing.

2.3.2 Initializing User-Missing Value Sets
------------------------------------------

 -- Function: void mv_init (struct missing_values *MV, int WIDTH)
     Initializes MV as a set of user-missing values. The set is initially empty. Any values added to it must have the specified WIDTH.

 -- Function: void mv_copy (struct missing_values *MV, const struct missing_values *OLD)
     Initializes MV as a copy of the existing set of user-missing values OLD.

 -- Function: void mv_clear (struct missing_values *MV)
     Empties the user-missing value set MV, retaining its existing width.

2.3.3 Changing User-Missing Value Set Width
-------------------------------------------

A few PSPP language constructs copy sets of user-missing values from one variable to another. When the source and target variables have the same width, this is simple. But when the target variable's width might be different from the source variable's, it takes a little more work. The functions described here can help.

In fact, it is usually unnecessary to call these functions directly. Most of the time `var_set_missing_values', which uses `mv_resize' internally to resize the new set of missing values to the required width, may be used instead. *Note var_set_missing_values::, for more information.

 -- Function: bool mv_is_resizable (const struct missing_values *MV, int NEW_WIDTH)
     Tests whether MV's width may be changed to NEW_WIDTH using `mv_resize'. Returns true if it is allowed, false otherwise.
If NEW_WIDTH is a long string width, MV may be resized only if it is empty. Otherwise, if MV contains any missing values, then it may be resized only if each missing value may be resized, as determined by `value_is_resizable' (*note value_is_resizable::). -- Function: void mv_resize (struct missing_values *MV, int WIDTH) Changes MV's width to WIDTH. MV and WIDTH must satisfy the constraints explained above. When a string missing value set's width is increased, each user-missing value is padded on the right with spaces to the new width. 2.3.4 Inspecting User-Missing Value Sets ---------------------------------------- These functions inspect the properties and contents of `struct missing_values' objects. The first set of functions inspects the discrete values that numeric and short string sets of user-missing values may contain: -- Function: bool mv_is_empty (const struct missing_values *MV) Returns true if MV contains no user-missing values, false if it contains at least one user-missing value (either a discrete value or a numeric range). -- Function: int mv_get_width (const struct missing_values *MV) Returns the width of the user-missing values that MV represents. -- Function: int mv_n_values (const struct missing_values *MV) Returns the number of discrete user-missing values included in MV. The return value will be between 0 and 3. For sets of numeric user-missing values that include a range, the return value will be 0 or 1. -- Function: bool mv_has_value (const struct missing_values *MV) Returns true if MV has at least one discrete user-missing value, that is, if `mv_n_values' would return nonzero for MV. -- Function: void mv_get_value (const struct missing_values *MV, union value *VALUE, int INDEX) Copies the discrete user-missing value in MV with the given INDEX into VALUE. The index must be less than the number of discrete user-missing values in MV, as reported by `mv_n_values'. 
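As a rough illustration of the padding rule that `mv_resize' applies when a string set is widened, the following sketch widens one fixed-width string value. The helper `pad_value_right' is hypothetical and is not part of PSPP. It also assumes, for the purpose of the sketch, that a string value is held as a fixed-width byte array without a null terminator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not a PSPP function.  Copies the OLD_WIDTH
   bytes of the string value at SRC into DST and pads the result on
   the right with spaces out to NEW_WIDTH bytes.  DST must have room
   for NEW_WIDTH bytes.  No null terminator is written. */
void
pad_value_right (char *dst, const char *src, int old_width, int new_width)
{
  assert (old_width > 0);
  assert (new_width >= old_width);

  memcpy (dst, src, old_width);
  memset (dst + old_width, ' ', new_width - old_width);
}
```

Widening the 2-byte value `NA' to width 5 with this helper yields `NA' followed by three spaces. Widening to the same width leaves the value unchanged.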
The second set of functions inspects the single range of values that numeric sets of user-missing values may contain: -- Function: bool mv_has_range (const struct missing_values *MV) Returns true if MV includes a range, false otherwise. -- Function: void mv_get_range (const struct missing_values *MV, double *LOW, double *HIGH) Stores the low endpoint of MV's range in `*LOW' and the high endpoint of the range in `*HIGH'. MV must include a range. 2.3.5 Modifying User-Missing Value Sets --------------------------------------- These functions modify the contents of `struct missing_values' objects. The first set of functions applies to all sets of user-missing values: -- Function: bool mv_add_value (struct missing_values *MV, const union value *VALUE) -- Function: bool mv_add_str (struct missing_values *MV, const char VALUE[]) -- Function: bool mv_add_num (struct missing_values *MV, double VALUE) Attempts to add the given discrete VALUE to the set of user-missing values MV. VALUE must have the same width as MV. Returns true if VALUE was successfully added, false if the set could not accept any more discrete values. (Always returns false if MV is a set of long string user-missing values.) These functions are equivalent, except for the form in which VALUE is provided, so you may use whichever function is most convenient. -- Function: void mv_pop_value (struct missing_values *MV, union value *VALUE) Removes a discrete value from MV (which must contain at least one discrete value) and stores it in VALUE. -- Function: void mv_replace_value (struct missing_values *MV, const union value *VALUE, int INDEX) Replaces the discrete value with the given INDEX in MV (which must contain at least INDEX + 1 discrete values) with VALUE. 
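The capacity limit behind the true/false result of `mv_add_num' can be pictured with a toy model. This is only a sketch: `struct toy_mv' and the `toy_mv_*' functions are hypothetical stand-ins, not PSPP's opaque `struct missing_values' or its real implementation. The limits used (3 discrete values, or 1 when the set includes a range) are the ones stated for `mv_n_values'.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a numeric user-missing value set.  This is NOT
   PSPP's `struct missing_values', which is opaque. */
struct toy_mv
  {
    double values[3];   /* Discrete values. */
    int n_values;       /* Number of discrete values in use. */
    bool has_range;     /* True if the set also holds a range. */
  };

/* Returns the number of discrete values that MV can hold:
   1 if it includes a range, otherwise 3. */
int
toy_mv_capacity (const struct toy_mv *mv)
{
  return mv->has_range ? 1 : 3;
}

/* Models the documented result of `mv_add_num': returns true if
   VALUE was added, false if MV cannot accept another discrete
   value. */
bool
toy_mv_add_num (struct toy_mv *mv, double value)
{
  assert (mv->n_values >= 0 && mv->n_values <= 3);

  if (mv->n_values >= toy_mv_capacity (mv))
    return false;
  mv->values[mv->n_values++] = value;
  return true;
}
```

In this model a fourth call to `toy_mv_add_num' on a set without a range returns false, as does a second call on a set that includes a range.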
The second set of functions applies only to numeric sets of user-missing values: -- Function: bool mv_add_range (struct missing_values *MV, double LOW, double HIGH) Attempts to add a numeric range covering LOW...HIGH (inclusive on both ends) to MV, which must be a numeric set of user-missing values. Returns true if the range is successfully added, false on failure. Fails if MV already contains a range, or if MV contains more than one discrete value, or if LOW > HIGH. -- Function: void mv_pop_range (struct missing_values *MV, double *LOW, double *HIGH) Given MV, which must be a numeric set of user-missing values that contains a range, removes that range from MV and stores its low endpoint in `*LOW' and its high endpoint in `*HIGH'. 2.4 Value Labels ================ Each variable has a set of value labels (*note VALUE LABELS: (pspp)VALUE LABELS.), represented as `struct val_labs'. A `struct val_labs' is essentially a map from `union value's to strings. All of the values in a set of value labels have the same width, which for a set of value labels owned by a variable (the common case) is the same as the width of that variable. Numeric and short string sets of value labels may contain any number of entries. Long string sets of value labels may not contain any value labels at all, due to a corresponding restriction in SPSS. In PSPP we could easily eliminate this restriction, but doing so would also require us to extend the system file format in an incompatible way, which we consider a bad tradeoff. It is rarely necessary to interact directly with a `struct val_labs' object. Instead, the most common operation, looking up the label for a value of a given variable, can be conveniently executed through functions on `struct variable'. *Note Variable Value Labels::, for details. Function prototypes and other declarations related to value labels are declared in `data/value-labels.h'. -- Structure: struct val_labs Opaque type that represents a set of value labels. 
The most often useful function for value labels is `val_labs_find', for looking up the label associated with a value. -- Function: char * val_labs_find (const struct val_labs *VAL_LABS, union value VALUE) Looks in VAL_LABS for a label for the given VALUE. Returns the label, if one is found, or a null pointer otherwise. Several other functions for working with value labels are described in the following section, but these are more rarely useful. 2.4.1 Creation and Destruction ------------------------------ These functions create and destroy `struct val_labs' objects. -- Function: struct val_labs * val_labs_create (int WIDTH) Creates and returns an initially empty set of value labels with the given WIDTH. -- Function: struct val_labs * val_labs_clone (const struct val_labs *VAL_LABS) Creates and returns a set of value labels whose width and contents are the same as those of VAL_LABS. -- Function: void val_labs_clear (struct val_labs *VAR_LABS) Deletes all value labels from VAR_LABS. -- Function: void val_labs_destroy (struct val_labs *VAR_LABS) Destroys VAR_LABS, which must not be referenced again. 2.4.2 Value Labels Properties ----------------------------- These functions inspect and manipulate basic properties of `struct val_labs' objects. -- Function: size_t val_labs_count (const struct val_labs *VAL_LABS) Returns the number of value labels in VAL_LABS. -- Function: bool val_labs_can_set_width (const struct val_labs *VAL_LABS, int NEW_WIDTH) Tests whether VAL_LABS's width may be changed to NEW_WIDTH using `val_labs_set_width'. Returns true if it is allowed, false otherwise. A set of value labels may be resized to a given width only if each value in it may be resized to that width, as determined by `value_is_resizable' (*note value_is_resizable::). -- Function: void val_labs_set_width (struct val_labs *VAL_LABS, int NEW_WIDTH) Changes the width of VAL_LABS's values to NEW_WIDTH, which must be a valid new width as determined by `val_labs_can_set_width'. 
If NEW_WIDTH is a long string width, this function deletes all value labels from VAL_LABS. 2.4.3 Adding and Removing Labels -------------------------------- These functions add and remove value labels from a `struct val_labs' object. These functions apply only to numeric and short string sets of value labels. They have no effect on long string sets of value labels, since these sets are always empty. -- Function: bool val_labs_add (struct val_labs *VAL_LABS, union value VALUE, const char *LABEL) Adds LABEL to VAL_LABS as a label for VALUE, which must have the same width as the set of value labels. Returns true if successful, false if VALUE already has a label or if VAL_LABS has long string width. -- Function: void val_labs_replace (struct val_labs *VAL_LABS, union value VALUE, const char *LABEL) Adds LABEL to VAL_LABS as a label for VALUE, which must have the same width as the set of value labels. If VALUE already has a label in VAL_LABS, it is replaced. Has no effect if VAL_LABS has long string width. -- Function: bool val_labs_remove (struct val_labs *VAL_LABS, union value VALUE) Removes from VAL_LABS any label for VALUE, which must have the same width as the set of value labels. Returns true if a label was removed, false otherwise. 2.4.4 Iterating through Value Labels ------------------------------------ These functions allow iteration through the set of value labels represented by a `struct val_labs' object. They are usually used in the context of a `for' loop:

     struct val_labs *val_labs;
     struct val_labs_iterator *i;
     struct val_lab *vl;

     ...

     for (vl = val_labs_first (val_labs, &i); vl != NULL;
          vl = val_labs_next (val_labs, &i))
       {
         ...do something with `vl'...
       }

The value labels in a `struct val_labs' must not be modified while it is undergoing iteration. -- Structure: struct val_lab Represents a value label for iteration purposes, with two client-visible members: `union value value' Value being labeled, of the same width as the `struct val_labs' being iterated. 
`const char *label' The label, as a null-terminated string. -- Structure: struct val_labs_iterator Opaque object that represents the current state of iteration through a set of value labels. Automatically destroyed by successful completion of iteration. Must be destroyed manually in other circumstances, by calling `val_labs_done'. -- Function: struct val_lab * val_labs_first (const struct val_labs *VAL_LABS, struct val_labs_iterator **ITERATOR) If VAL_LABS contains at least one value label, starts an iteration through VAL_LABS, initializes `*ITERATOR' to point to a newly allocated iterator, and returns the first value label in VAL_LABS. If VAL_LABS is empty, sets `*ITERATOR' to null and returns a null pointer. This function creates iterators that traverse sets of value labels in no particular order. -- Function: struct val_lab * val_labs_first_sorted (const struct val_labs *VAL_LABS, struct val_labs_iterator **ITERATOR) Same as `val_labs_first', except that the created iterator traverses the set of value labels in ascending order of value. -- Function: struct val_lab * val_labs_next (const struct val_labs *VAL_LABS, struct val_labs_iterator **ITERATOR) Advances an iterator created with `val_labs_first' or `val_labs_first_sorted' to the next value label, which is returned. If the set of value labels is exhausted, returns a null pointer after freeing `*ITERATOR' and setting it to a null pointer. -- Function: void val_labs_done (struct val_labs_iterator **ITERATOR) Frees `*ITERATOR' and sets it to a null pointer. Does not need to be called explicitly if `val_labs_next' returns a null pointer, indicating that all value labels have been visited. 2.5 Variables ============= A PSPP variable is represented by `struct variable', an opaque type declared in `data/variable.h' along with related declarations. *Note Variables: (pspp)Variables, for a description of PSPP variables from a user perspective. 
PSPP is unusual among computer languages in that, by itself, a PSPP variable does not have a value. Instead, a variable in PSPP takes on a value only in the context of a case, which supplies one value for each variable in a set of variables (*note Cases::). The set of variables in a case, in turn, is ordinarily part of a dictionary (*note Dictionaries::). Every variable has several attributes, most of which correspond directly to one of the variable attributes visible to PSPP users (*note Attributes: (pspp)Attributes.). The following sections describe variable-related functions and macros. 2.5.1 Variable Name ------------------- A variable name is a string between 1 and `VAR_NAME_LEN' bytes long that satisfies the rules for PSPP identifiers (*note Tokens: (pspp)Tokens.). Variable names are mixed-case and treated case-insensitively. -- Macro: int VAR_NAME_LEN Maximum length of a variable name, in bytes, currently 64. Only one commonly useful function relates to variable names: -- Function: const char * var_get_name (const struct variable *VAR) Returns VAR's variable name as a C string. A few other functions are much more rarely used. Some of these functions are used internally by the dictionary implementation: -- Function: void var_set_name (struct variable *VAR, const char *NEW_NAME) Changes the name of VAR to NEW_NAME, which must be a "plausible" name as defined below. This function cannot be applied to a variable that is part of a dictionary. Use `dict_rename_var' instead (*note Dictionary Renaming Variables::). -- Function: bool var_is_valid_name (const char *NAME, bool ISSUE_ERROR) -- Function: bool var_is_plausible_name (const char *NAME, bool ISSUE_ERROR) Tests NAME for validity or "plausibility." Returns true if the name is acceptable, false otherwise. If the name is not acceptable and ISSUE_ERROR is true, also issues an error message explaining the violation. 
A valid name is one that fully satisfies all of the requirements for variable names (*note Tokens: (pspp)Tokens.). A "plausible" name is simply a string whose length is in the valid range and that is not a reserved word. PSPP accepts plausible but invalid names as variable names in some contexts where the character encoding scheme is ambiguous, as when reading variable names from system files. -- Function: enum dict_class var_get_dict_class (const struct variable *VAR) Returns the dictionary class of VAR's name (*note Dictionary Class::). 2.5.2 Variable Type and Width ----------------------------- A variable's type and width are the type and width of its values (*note Values::). -- Function: enum val_type var_get_type (const struct variable *VAR) Returns the type of variable VAR. -- Function: int var_get_width (const struct variable *VAR) Returns the width of variable VAR. -- Function: void var_set_width (struct variable *VAR, int WIDTH) Sets the width of variable VAR to WIDTH. The width of a variable should not normally be changed after the variable is created, so this function is rarely used. This function cannot be applied to a variable that is part of a dictionary. -- Function: bool var_is_numeric (const struct variable *VAR) Returns true if VAR is a numeric variable, false otherwise. -- Function: bool var_is_alpha (const struct variable *VAR) Returns true if VAR is an alphanumeric (string) variable, false otherwise. -- Function: bool var_is_short_string (const struct variable *VAR) Returns true if VAR is a string variable of width `MAX_SHORT_STRING' or less, false otherwise. -- Function: bool var_is_long_string (const struct variable *VAR) Returns true if VAR is a string variable of width greater than `MAX_SHORT_STRING', false otherwise. -- Function: size_t var_get_value_cnt (const struct variable *VAR) Returns the number of `union value's needed to hold an instance of variable VAR. 
`var_get_value_cnt (var)' is equivalent to `value_cnt_from_width (var_get_width (var))'. 2.5.3 Variable Missing Values ----------------------------- A numeric or short string variable may have a set of user-missing values (*note MISSING VALUES: (pspp)MISSING VALUES.), represented as a `struct missing_values' (*note User-Missing Values::). The most frequent operation on a variable's missing values is to query whether a value is user- or system-missing: -- Function: bool var_is_value_missing (const struct variable *VAR, const union value *VALUE, enum mv_class CLASS) -- Function: bool var_is_num_missing (const struct variable *VAR, double VALUE, enum mv_class CLASS) -- Function: bool var_is_str_missing (const struct variable *VAR, const char VALUE[], enum mv_class CLASS) Tests whether VALUE is a missing value of the given CLASS for variable VAR and returns true if so, false otherwise. `var_is_num_missing' may only be applied to numeric variables; `var_is_str_missing' may only be applied to string variables. For string variables, VALUE must contain exactly as many characters as VAR's width. `var_is_TYPE_missing (VAR, VALUE, CLASS)' is equivalent to `mv_is_TYPE_missing (var_get_missing_values (VAR), VALUE, CLASS)'. In addition, a few functions are provided to work more directly with a variable's `struct missing_values': -- Function: const struct missing_values * var_get_missing_values (const struct variable *VAR) Returns the `struct missing_values' associated with VAR. The caller must not modify the returned structure. The return value is always non-null. -- Function: void var_set_missing_values (struct variable *VAR, const struct missing_values *MISS) Changes VAR's missing values to a copy of MISS, or if MISS is a null pointer, clears VAR's missing values. If MISS is non-null, it must have the same width as VAR or be resizable to VAR's width (*note mv_resize::). The caller retains ownership of MISS. 
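The CLASS argument taken by the `var_is_TYPE_missing' functions can be read as a bit-mask that selects which kinds of missingness count. The sketch below models that selection for a numeric value. Everything prefixed `toy_' or `TOY_' is hypothetical: the numeric values of the enumeration constants and the use of `-DBL_MAX' as the system-missing stand-in are assumptions made for illustration, and the one-value set is far simpler than a real `struct missing_values'. Only the relationship MV_ANY = MV_USER | MV_SYSTEM comes from the text.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Toy counterparts of `enum mv_class'.  The numeric values are an
   assumption for this sketch. */
enum toy_class
  {
    TOY_NONE = 0,
    TOY_USER = 1,
    TOY_SYSTEM = 2,
    TOY_ANY = TOY_USER | TOY_SYSTEM
  };

/* Stand-in for the system-missing value (an assumption). */
#define TOY_SYSMIS (-DBL_MAX)

/* Toy user-missing set holding at most one discrete value. */
struct toy_missing
  {
    bool has_value;
    double value;
  };

/* Returns true if VALUE is missing in one of the categories
   selected by MCLASS, false otherwise. */
bool
toy_is_num_missing (const struct toy_missing *mv, double value,
                    enum toy_class mclass)
{
  assert ((mclass & ~TOY_ANY) == 0);

  if ((mclass & TOY_SYSTEM) && value == TOY_SYSMIS)
    return true;
  if ((mclass & TOY_USER) && mv->has_value && value == mv->value)
    return true;
  return false;
}
```

With 99 as the only user-missing value, the model reports 99 as missing for `TOY_USER' and `TOY_ANY' but not for `TOY_SYSTEM', and never reports anything as missing for `TOY_NONE'.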
-- Function: void var_clear_missing_values (struct variable *VAR) Clears VAR's missing values. Equivalent to `var_set_missing_values (VAR, NULL)'. -- Function: bool var_has_missing_values (const struct variable *VAR) Returns true if VAR has any missing values, false if it has none. Equivalent to `!mv_is_empty (var_get_missing_values (VAR))'. 2.5.4 Variable Value Labels --------------------------- A numeric or short string variable may have a set of value labels (*note VALUE LABELS: (pspp)VALUE LABELS.), represented as a `struct val_labs' (*note Value Labels::). The most commonly useful functions for value labels return the value label associated with a value: -- Function: const char * var_lookup_value_label (const struct variable *VAR, const union value *VALUE) Looks for a label for VALUE in VAR's set of value labels. Returns the label if one exists, otherwise a null pointer. -- Function: void var_append_value_name (const struct variable *VAR, const union value *VALUE, struct string *STR) Looks for a label for VALUE in VAR's set of value labels. If a label exists, it will be appended to the string pointed to by STR. Otherwise, it formats VALUE using VAR's print format (*note Input and Output Formats::) and appends the formatted string. The underlying `struct val_labs' structure may also be accessed directly using the functions described below. -- Function: bool var_has_value_labels (const struct variable *VAR) Returns true if VAR has at least one value label, false otherwise. -- Function: const struct val_labs * var_get_value_labels (const struct variable *VAR) Returns the `struct val_labs' associated with VAR. If VAR has no value labels, then the return value may or may not be a null pointer. The variable retains ownership of the returned `struct val_labs', which the caller must not attempt to modify. -- Function: void var_set_value_labels (struct variable *VAR, const struct val_labs *VAL_LABS) Replaces VAR's value labels by a copy of VAL_LABS. 
The caller retains ownership of VAL_LABS. If VAL_LABS is a null pointer, then VAR's value labels, if any, are deleted. -- Function: void var_clear_value_labels (struct variable *VAR) Deletes VAR's value labels. Equivalent to `var_set_value_labels (VAR, NULL)'. A final group of functions offers shorthands for operations that would otherwise require getting the value labels from a variable, copying them, modifying them, and then setting the modified value labels into the variable (making a second copy): -- Function: bool var_add_value_label (struct variable *VAR, const union value *VALUE, const char *LABEL) Attempts to add a copy of LABEL as a label for VALUE for the given VAR. If VALUE already has a label, then the old label is retained. Returns true if a label is added, false if there was an existing label for VALUE or if VAR is a long string variable. Either way, the caller retains ownership of VALUE and LABEL. -- Function: void var_replace_value_label (struct variable *VAR, const union value *VALUE, const char *LABEL) Attempts to add a copy of LABEL as a label for VALUE for the given VAR. If VALUE already has a label, then LABEL replaces the old label. Either way, the caller retains ownership of VALUE and LABEL. If VAR is a long string variable, this function has no effect. 2.5.5 Variable Print and Write Formats -------------------------------------- Each variable has an associated pair of output formats, called its "print format" and "write format". *Note Input and Output Formats: (pspp)Input and Output Formats, for an introduction to formats. *Note Input and Output Formats::, for a developer's description of format representation. The print format is used to convert a variable's data values to strings for human-readable output. The write format is used similarly for machine-readable output, primarily by the WRITE transformation (*note WRITE: (pspp)WRITE.). Most often a variable's print and write formats are the same. 
A newly created variable by default has format F8.2 if it is numeric or an A format with the same width as the variable if it is string. Many creators of variables override these defaults. Both the print format and write format are output formats. Input formats are not part of `struct variable'. Instead, input programs and transformations keep track of variable input formats themselves. The following functions work with variable print and write formats. -- Function: const struct fmt_spec * var_get_print_format (const struct variable *VAR) -- Function: const struct fmt_spec * var_get_write_format (const struct variable *VAR) Returns VAR's print or write format, respectively. -- Function: void var_set_print_format (struct variable *VAR, const struct fmt_spec *FORMAT) -- Function: void var_set_write_format (struct variable *VAR, const struct fmt_spec *FORMAT) -- Function: void var_set_both_formats (struct variable *VAR, const struct fmt_spec *FORMAT) Sets VAR's print format, write format, or both formats, respectively, to a copy of FORMAT. 2.5.6 Variable Labels --------------------- A variable label is a string that describes a variable. Variable labels may contain spaces and punctuation not allowed in variable names. *Note VARIABLE LABELS: (pspp)VARIABLE LABELS, for a user-level description of variable labels. The most commonly useful functions for variable labels are those to retrieve a variable's label: -- Function: const char * var_to_string (const struct variable *VAR) Returns VAR's variable label, if it has one, otherwise VAR's name. In either case the caller must not attempt to modify or free the returned string. This function is useful for user output. -- Function: const char * var_get_label (const struct variable *VAR) Returns VAR's variable label, if it has one, or a null pointer otherwise. 
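The difference between `var_to_string' and `var_get_label' is only the fallback to the variable's name. A minimal sketch of that fallback follows. `struct toy_var' and `toy_var_to_string' are hypothetical stand-ins, since the real `struct variable' is opaque and its layout is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a variable; NOT PSPP's `struct variable'. */
struct toy_var
  {
    const char *name;    /* Variable name, never null. */
    const char *label;   /* Variable label, or null if none. */
  };

/* Models the documented behavior of `var_to_string': returns V's
   label if it has one, otherwise V's name.  The caller must not
   modify or free the returned string. */
const char *
toy_var_to_string (const struct toy_var *v)
{
  assert (v->name != NULL);
  return v->label != NULL ? v->label : v->name;
}
```

Code that prints headings for users would call the fallback form, while code that needs to know whether a label exists at all would test the label directly.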
A few other variable label functions are also provided: -- Function: void var_set_label (struct variable *VAR, const char *LABEL) Sets VAR's variable label to a copy of LABEL, or removes any label from VAR if LABEL is a null pointer or contains only spaces. Leading and trailing spaces are removed from the variable label and its remaining content is truncated at 255 bytes. -- Function: void var_clear_label (struct variable *VAR) Removes any variable label from VAR. -- Function: bool var_has_label (const struct variable *VAR) Returns true if VAR has a variable label, false otherwise. 2.5.7 GUI Attributes -------------------- These functions and types access and set attributes that are mainly used by graphical user interfaces. Their values are also stored in and retrieved from system files (but not portable files). The first group of functions relate to the measurement level of numeric data. New variables are assigned a nominal level of measurement by default. -- Enumeration: enum measure Measurement level. Available values are: `MEASURE_NOMINAL' Numeric data values are arbitrary. Arithmetic operations and numerical comparisons of such data are not meaningful. `MEASURE_ORDINAL' Numeric data values indicate progression along a rank order. Arbitrary arithmetic operations such as addition are not meaningful on such data, but inequality comparisons (less, greater, etc.) have straightforward interpretations. `MEASURE_SCALE' Ratios, sums, etc. of numeric data values have meaningful interpretations. PSPP does not have a separate category for interval data, which would naturally fall between the ordinal and scale measurement levels. -- Function: bool measure_is_valid (enum measure MEASURE) Returns true if MEASURE is a valid level of measurement, that is, if it is one of the `enum measure' constants listed above, and false otherwise. 
-- Function: enum measure var_get_measure (const struct variable *VAR) -- Function: void var_set_measure (struct variable *VAR, enum measure MEASURE) Gets or sets VAR's measurement level. The following set of functions relates to the width of on-screen columns used for displaying variable data in a graphical user interface environment. The unit of measurement is the width of a character. For proportionally spaced fonts, this is based on the average width of a character. -- Function: int var_get_display_width (const struct variable *VAR) -- Function: void var_set_display_width (struct variable *VAR, int DISPLAY_WIDTH) Gets or sets VAR's display width. -- Function: int var_default_display_width (int WIDTH) Returns the default display width for a variable with the given WIDTH. The default width of a numeric variable is 8. The default width of a string variable is WIDTH or 32, whichever is less. The final group of functions work with the justification of data when it is displayed in on-screen columns. New variables are by default right-justified. -- Enumeration: enum alignment Text justification. Possible values are `ALIGN_LEFT', `ALIGN_RIGHT', and `ALIGN_CENTRE'. -- Function: bool alignment_is_valid (enum alignment ALIGNMENT) Returns true if ALIGNMENT is a valid alignment, that is, if it is one of the `enum alignment' constants listed above, and false otherwise. -- Function: enum alignment var_get_alignment (const struct variable *VAR) -- Function: void var_set_alignment (struct variable *VAR, enum alignment ALIGNMENT) Gets or sets VAR's alignment. 2.5.8 Variable Leave Status --------------------------- Commonly, most or all data in a case come from an input file, read with a command such as DATA LIST or GET, but data can also be generated with transformations such as COMPUTE. In the latter case the question of a datum's "initial value" can arise. For example, the value of a piece of generated data can recursively depend on its own value: COMPUTE X = X + 1. 
Another situation where the initial value of a variable arises is when its value is not set at all for some cases, e.g. below, `Y' is set only for the first 10 cases: DO IF #CASENUM <= 10. + COMPUTE Y = 1. END IF. By default, the initial value of a datum in either of these situations is the system-missing value for numeric values and spaces for string values. This means that, above, X would be system-missing and that Y would be 1 for the first 10 cases and system-missing for the remainder. PSPP also supports retaining the value of a variable from one case to another, using the LEAVE command (*note LEAVE: (pspp)LEAVE.). The initial value of such a variable is 0 if it is numeric and spaces if it is a string. If the command `LEAVE X Y' is appended to the above example, then X would have value 1 in the first case and increase by 1 in every succeeding case, and Y would have value 1 for the first 10 cases and 0 for later cases. The LEAVE command has no effect on data that comes from an input file or whose values do not depend on a variable's initial value. The values of scratch variables (*note Scratch Variables: (pspp)Scratch Variables.) are always left from one case to another. The following functions work with a variable's leave status. -- Function: bool var_get_leave (const struct variable *VAR) Returns true if VAR's value is to be retained from case to case, false if it is reinitialized to system-missing or spaces. -- Function: void var_set_leave (struct variable *VAR, bool LEAVE) If LEAVE is true, marks VAR to be left from case to case; if LEAVE is false, marks VAR to be reinitialized for each case. If VAR is a scratch variable, LEAVE must be true. -- Function: bool var_must_leave (const struct variable *VAR) Returns true if VAR must be left from case to case, that is, if VAR is a scratch variable. 2.5.9 Dictionary Class ---------------------- Occasionally it is useful to classify variables into "dictionary classes" based on their names. 
Dictionary classes are represented by `enum dict_class'. This type and other declarations for dictionary classes are in the `' header. -- Enumeration: enum dict_class The dictionary classes are: `DC_ORDINARY' An ordinary variable, one whose name does not begin with `$' or `#'. `DC_SYSTEM' A system variable, one whose name begins with `$'. *Note System Variables: (pspp)System Variables. `DC_SCRATCH' A scratch variable, one whose name begins with `#'. *Note Scratch Variables: (pspp)Scratch Variables. The values for dictionary classes are bitwise disjoint, which allows them to be used in bit-masks. An extra enumeration constant `DC_ALL', whose value is the bitwise-or of all of the above constants, is provided to aid in this purpose. One example use of dictionary classes arises in connection with PSPP syntax that uses `A TO B' to name the variables in a dictionary from A to B (*note Sets of Variables: (pspp)Sets of Variables.). This syntax requires A and B to be in the same dictionary class. It limits the variables that it includes to those in that dictionary class. The following functions relate to dictionary classes. -- Function: enum dict_class dict_class_from_id (const char *NAME) Returns the "dictionary class" for the given variable NAME, by looking at its first character. -- Function: const char * dict_class_to_name (enum dict_class DICT_CLASS) Returns a name for the given DICT_CLASS as an adjective, e.g. `"scratch"'. This function should probably not be used in new code as it can lead to difficulties for internationalization. 2.5.10 Variable Creation and Destruction ---------------------------------------- Only rarely should PSPP code create or destroy variables directly. Ordinarily, variables are created within a dictionary and destroyed by individual deletion from the dictionary or by destroying the entire dictionary at once. The functions here enable the exceptional case of creation and destruction of variables that are not associated with any dictionary. 
These functions are used internally in the dictionary implementation. -- Function: struct variable * var_create (const char *NAME, int WIDTH) Creates and returns a new variable with the given NAME and WIDTH. The new variable is not part of any dictionary. Use `dict_create_var', instead, to create a variable in a dictionary (*note Dictionary Creating Variables::). NAME should be a valid variable name and must be a "plausible" variable name (*note Variable Name::). WIDTH must be between 0 and `MAX_STRING', inclusive (*note Values::). The new variable has no user-missing values, value labels, or variable label. Numeric variables initially have F8.2 print and write formats, right-justified display alignment, and scale level of measurement. String variables are created with A print and write formats, left-justified display alignment, and nominal level of measurement. The initial display width is determined by `var_default_display_width' (*note var_default_display_width::). The new variable initially has no short name (*note Variable Short Names::) and no auxiliary data (*note Variable Auxiliary Data::). -- Function: struct variable * var_clone (const struct variable *OLD_VAR) Creates and returns a new variable with the same attributes as OLD_VAR, with a few exceptions. First, the new variable is not part of any dictionary, regardless of whether OLD_VAR was in a dictionary. Use `dict_clone_var', instead, to add a clone of a variable to a dictionary. Second, the new variable is not given any short name, even if OLD_VAR had a short name. This is because the new variable is likely to be immediately renamed, in which case the short name would be incorrect (*note Variable Short Names::). Finally, OLD_VAR's auxiliary data, if any, is not copied to the new variable (*note Variable Auxiliary Data::). -- Function: void var_destroy (struct variable *VAR) Destroys VAR and frees all associated storage, including its auxiliary data, if any. VAR must not be part of a dictionary. 
     To delete a variable from a dictionary and destroy it, use `dict_delete_var' (*note Dictionary Deleting Variables::).

2.5.11 Variable Short Names
---------------------------

PSPP variable names may be up to 64 (`VAR_NAME_LEN') bytes long. The system and portable file formats, however, were designed when variable names were limited to 8 bytes in length. Since then, the system file format has been augmented with an extension record that explains how the 8-byte short names map to full-length names (*note Long Variable Names Record::), but the short names are still present. Thus, the continued presence of the short names is more or less invisible to PSPP users, but every variable in a system file still has a short name that must be unique.

PSPP can generate unique short names for variables based on their full names at the time it creates the data file. If all variables' full names are unique in their first 8 bytes, then the short names are simply prefixes of the full names; otherwise, PSPP changes them so that they are unique.

By itself this algorithm interoperates well with other software that can read system files, as long as that software understands the extension record that maps short names to long names. When the other software does not understand the extension record, it can produce surprising results. Consider a situation where PSPP reads a system file that contains a variable named RANKINGSCORE, then the user adds a new variable named RANKINGSTATUS, then saves the modified data as a new system file. A program that does not understand long names would then see one of these variables under the name RANKINGS (either one, depending on the algorithm's details) and the other under a different name. The effect could be very confusing: by adding a new and apparently unrelated variable in PSPP, the user effectively renamed the existing variable.

To counteract this potential problem, every `struct variable' may have a short name.
A variable created by the system or portable file reader receives the short name from that data file. When a variable with a short name is written to a system or portable file, that variable receives priority over other variables whose names begin with the same 8 bytes but which were not read from a data file under that short name.

Variables not created by the system or portable file reader have no short name by default.

A variable with a full name of 8 bytes or less in length has absolute priority for that name when the variable is written to a system file, even over a second variable with that assigned short name.

PSPP does not enforce uniqueness of short names, although the short names read from any given data file will always be unique. If two variables with the same short name are written to a single data file, neither one receives priority.

The following macros and functions relate to short names.

 -- Macro: SHORT_NAME_LEN
     Maximum length of a short name, in bytes. Its value is 8.

 -- Function: const char * var_get_short_name (const struct variable *VAR)
     Returns VAR's short name, or a null pointer if VAR has not been assigned a short name.

 -- Function: void var_set_short_name (struct variable *VAR, const char *SHORT_NAME)
     Sets VAR's short name to SHORT_NAME, or removes VAR's short name if SHORT_NAME is a null pointer. If it is non-null, then SHORT_NAME must be a plausible name for a variable (*note var_is_plausible_name::). The name will be truncated to 8 bytes in length and converted to all-uppercase.

 -- Function: void var_clear_short_name (struct variable *VAR)
     Removes VAR's short name.

2.5.12 Variable Relationships
-----------------------------

Variables have close relationships with dictionaries (*note Dictionaries::) and cases (*note Cases::). A variable is usually a member of some dictionary, and a case is often used to store data for the set of variables in a dictionary.

These functions report on these relationships.
They may be applied only to variables that are in a dictionary.

 -- Function: size_t var_get_dict_index (const struct variable *VAR)
     Returns VAR's index within its dictionary. The first variable in a dictionary has index 0, the next variable index 1, and so on.

     The dictionary index can be influenced using dictionary functions such as `dict_reorder_var' (*note dict_reorder_var::).

 -- Function: size_t var_get_case_index (const struct variable *VAR)
     Returns VAR's index within a case. The case index is an index into an array of `union value' large enough to contain all the data in the dictionary.

     The returned case index can be used to access the value of VAR within a case for its dictionary, as in e.g. `case_data_idx (case, var_get_case_index (VAR))', but ordinarily it is more convenient to use the data access functions that do variable-to-index translation internally, as in e.g. `case_data (case, VAR)'.

2.5.13 Variable Auxiliary Data
------------------------------

Each `struct variable' can have a single pointer to auxiliary data of type `void *'. These functions manipulate a variable's auxiliary data.

Use of auxiliary data is discouraged because of its lack of flexibility. Only one client can make use of auxiliary data on a given variable at any time, even though many clients could usefully associate data with a variable.

To prevent multiple clients from attempting to use a variable's single auxiliary data field at the same time, we adopt the convention that use of auxiliary data in the active file dictionary is restricted to the currently executing command. In particular, transformations must not attach auxiliary data to a variable in the active file in the expectation that it can be used later when the active file is read and the transformation is executed. To help enforce this restriction, auxiliary data is deleted from all variables in the active file dictionary after the execution of each PSPP command.
This convention for safe use of auxiliary data applies only to the active file dictionary. Rules for other dictionaries may be established separately.

Auxiliary data should be replaced by a more flexible mechanism at some point, but no replacement mechanism has been designed or implemented so far.

The following functions work with variable auxiliary data.

 -- Function: void * var_get_aux (const struct variable *VAR)
     Returns VAR's auxiliary data, or a null pointer if none has been assigned.

 -- Function: void * var_attach_aux (const struct variable *VAR, void *AUX, void (*AUX_DTOR) (struct variable *))
     Sets VAR's auxiliary data to AUX, which must not be null. VAR must not already have auxiliary data.

     Before VAR's auxiliary data is cleared by `var_clear_aux', AUX_DTOR, if non-null, will be called with VAR as its argument. It should free any storage associated with AUX, if necessary. `var_dtor_free' may be appropriate for use as AUX_DTOR:

      -- Function: void var_dtor_free (struct variable *VAR)
          Frees VAR's auxiliary data by calling `free'.

 -- Function: void var_clear_aux (struct variable *VAR)
     Removes auxiliary data, if any, from VAR, first calling the destructor passed to `var_attach_aux', if one was provided.

     Use `dict_clear_aux' to remove auxiliary data from every variable in a dictionary.

 -- Function: void * var_detach_aux (struct variable *VAR)
     Removes auxiliary data, if any, from VAR, and returns it. Returns a null pointer if VAR had no auxiliary data.

     Any destructor passed to `var_attach_aux' is not called, so the caller is responsible for freeing storage associated with the returned auxiliary data.

2.5.14 Variable Categorical Values
----------------------------------

Some statistical procedures require a list of all the values that a categorical variable takes on. Arranging such a list requires making a pass through the data, so PSPP caches categorical values in `struct variable'.
When variable auxiliary data is revamped to support multiple clients as described in the previous section, categorical values are an obvious candidate. The form in which they are currently supported is inelegant.

Categorical values are not robust against changes in the data. That is, there is currently no way to detect that a transformation has changed data values, meaning that categorical values lists for the changed variables must be recomputed. PSPP is in fact in need of a general-purpose caching and cache-invalidation mechanism, but none has yet been designed and built.

The following functions work with cached categorical values.

 -- Function: struct cat_vals * var_get_obs_vals (const struct variable *VAR)
     Returns VAR's set of categorical values. Yields undefined behavior if VAR does not have any categorical values.

 -- Function: void var_set_obs_vals (const struct variable *VAR, struct cat_vals *CAT_VALS)
     Destroys VAR's categorical values, if any, and replaces them by CAT_VALS, ownership of which is transferred to VAR. If CAT_VALS is a null pointer, then VAR's categorical values are cleared.

 -- Function: bool var_has_obs_vals (const struct variable *VAR)
     Returns true if VAR has a set of categorical values, false otherwise.

2.6 Dictionaries
================

Each data file in memory or on disk has an associated dictionary, whose primary purpose is to describe the data in the file. *Note Variables: (pspp)Variables, for a PSPP user's view of a dictionary.

A data file stored in a PSPP format, either as a system or portable file, has a representation of its dictionary embedded in it. Other kinds of data files are usually not self-describing enough to construct a dictionary unassisted, so the dictionaries for these files must be specified explicitly with PSPP commands such as DATA LIST.

The most important content of a dictionary is an array of variables, which must have unique names.
A dictionary also conceptually contains a mapping from each of its variables to a location within a case (*note Cases::), although in fact these mappings are stored within individual variables.

System variables are not members of any dictionary (*note System Variables: (pspp)System Variables.).

Dictionaries are represented by `struct dictionary'. Declarations related to dictionaries are in the `' header.

The following sections describe functions for use with dictionaries.

2.6.1 Accessing Variables
-------------------------

The most common operations on a dictionary simply retrieve a `struct variable *' of an individual variable based on its name or position.

 -- Function: struct variable * dict_lookup_var (const struct dictionary *DICT, const char *NAME)
 -- Function: struct variable * dict_lookup_var_assert (const struct dictionary *DICT, const char *NAME)
     Looks up and returns the variable with the given NAME within DICT. Name lookup is not case-sensitive.

     `dict_lookup_var' returns a null pointer if DICT does not contain a variable named NAME. `dict_lookup_var_assert' asserts that such a variable exists.

 -- Function: struct variable * dict_get_var (const struct dictionary *DICT, size_t POSITION)
     Returns the variable at the given POSITION in DICT. POSITION must be less than the number of variables in DICT (see below).

 -- Function: size_t dict_get_var_cnt (const struct dictionary *DICT)
     Returns the number of variables in DICT.

Another pair of functions allows retrieving a number of variables at once. These functions are more rarely useful.
 -- Function: void dict_get_vars (const struct dictionary *DICT, const struct variable ***VARS, size_t *CNT, enum dict_class EXCLUDE)
 -- Function: void dict_get_vars_mutable (const struct dictionary *DICT, struct variable ***VARS, size_t *CNT, enum dict_class EXCLUDE)
     Retrieves all of the variables in DICT, in their original order, except that any variables in the dictionary classes specified by EXCLUDE, if any, are excluded (*note Dictionary Class::). Pointers to the variables are stored in an array allocated with `malloc', and a pointer to the first element of this array is stored in `*VARS'. The caller is responsible for freeing this memory when it is no longer needed. The number of variables retrieved is stored in `*CNT'.

     The presence or absence of `DC_SYSTEM' in EXCLUDE has no effect, because dictionaries never include system variables.

One additional function is available. This function is most often used in assertions, but it is not restricted to such use.

 -- Function: bool dict_contains_var (const struct dictionary *DICT, const struct variable *VAR)
     Tests whether VAR is one of the variables in DICT. Returns true if so, false otherwise.

2.6.2 Creating Variables
------------------------

These functions create a new variable and insert it into a dictionary in a single step.

There is no provision for inserting an already created variable into a dictionary. There is no reason that such a function could not be written, but so far there has been no need for one.

The names provided to one of these functions should be valid variable names and must be plausible variable names.

If a variable with the same name already exists in the dictionary, the non-`assert' variants of these functions return a null pointer, without modifying the dictionary. The `assert' variants, on the other hand, assert that no duplicate name exists.

A variable may be in only one dictionary at any given time.
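
The duplicate-name rule described above can be illustrated with a small, self-contained sketch. This is not PSPP code: `struct toy_var', `struct toy_dict' and every `toy_' function are hypothetical stand-ins invented for this example, and the linear search is only an assumption made to keep the sketch short. It shows one way a non-`assert' creation function might return a null pointer on a duplicate name (compared without regard to case) while an `assert' variant treats a duplicate as a programming error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types; these are NOT PSPP's struct variable or
   struct dictionary. */
struct toy_var { char *name; int width; };
struct toy_dict { struct toy_var **vars; size_t n_vars; };

/* Returns nonzero if A and B are equal, ignoring case. */
int
toy_names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return 0;
  return *a == *b;
}

/* Returns the variable named NAME in DICT, or a null pointer if DICT
   has no such variable. */
struct toy_var *
toy_dict_lookup_var (const struct toy_dict *dict, const char *name)
{
  size_t i;

  for (i = 0; i < dict->n_vars; i++)
    if (toy_names_equal (dict->vars[i]->name, name))
      return dict->vars[i];
  return NULL;
}

/* Creates a variable named NAME, appends it to DICT, and returns it.
   Returns a null pointer, leaving DICT unchanged, if NAME is taken. */
struct toy_var *
toy_dict_create_var (struct toy_dict *dict, const char *name, int width)
{
  struct toy_var **vars;
  struct toy_var *var;

  if (toy_dict_lookup_var (dict, name) != NULL)
    return NULL;

  vars = realloc (dict->vars, (dict->n_vars + 1) * sizeof *vars);
  assert (vars != NULL);        /* Sketch only: no real OOM handling. */
  dict->vars = vars;

  var = malloc (sizeof *var);
  assert (var != NULL);
  var->name = malloc (strlen (name) + 1);
  assert (var->name != NULL);
  strcpy (var->name, name);
  var->width = width;

  dict->vars[dict->n_vars++] = var;
  return var;
}

/* Like toy_dict_create_var, but a duplicate NAME is a caller bug. */
struct toy_var *
toy_dict_create_var_assert (struct toy_dict *dict, const char *name,
                            int width)
{
  assert (toy_dict_lookup_var (dict, name) == NULL);
  return toy_dict_create_var (dict, name, width);
}
```

In this sketch, creating `Age' and then `AGE' in the same toy dictionary leaves one variable in place and yields a null pointer from the second call.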
 -- Function: struct variable * dict_create_var (struct dictionary *DICT, const char *NAME, int WIDTH)
 -- Function: struct variable * dict_create_var_assert (struct dictionary *DICT, const char *NAME, int WIDTH)
     Creates a new variable with the given NAME and WIDTH, as if through a call to `var_create' with those arguments (*note var_create::), appends the new variable to DICT's array of variables, and returns the new variable.

 -- Function: struct variable * dict_clone_var (struct dictionary *DICT, const struct variable *OLD_VAR, const char *NAME)
 -- Function: struct variable * dict_clone_var_assert (struct dictionary *DICT, const struct variable *OLD_VAR, const char *NAME)
     Creates a new variable as a clone of OLD_VAR, inserts the new variable into DICT, and returns the new variable. The new variable is named NAME. Other properties of the new variable are copied from OLD_VAR, except for those not copied by `var_clone' (*note var_clone::).

     OLD_VAR does not need to be a member of any dictionary.

2.6.3 Deleting Variables
------------------------

These functions remove variables from a dictionary's array of variables. They also destroy the removed variables and free their associated storage.

Deleting a variable to which there might be external pointers is a bad idea. In particular, deleting variables from the active file dictionary is a risky proposition, because transformations can retain references to arbitrary variables. Therefore, no variable should be deleted from the active file dictionary when any transformations are active, because those transformations might reference the variable to be deleted. The safest time to delete a variable is just after a procedure has been executed, as done by DELETE VARIABLES.

Deleting a variable automatically removes references to that variable from elsewhere in the dictionary as a weighting variable, filter variable, SPLIT FILE variable, or member of a vector.
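
As a rough illustration of what "removes references" means in the paragraph above, the following self-contained sketch clears a toy dictionary's weighting, filter, and split references to a variable that is about to be deleted. Everything here is hypothetical: the `toy_' types and function are invented for the example, the fixed-size split array is an arbitrary simplification, and vectors are left out entirely. It is not how PSPP itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SPLITS 8

/* Hypothetical toy types; NOT PSPP's real structures. */
struct toy_var { const char *name; };

struct toy_dict
  {
    struct toy_var *weight;                  /* Weighting variable or NULL. */
    struct toy_var *filter;                  /* Filter variable or NULL. */
    struct toy_var *splits[TOY_MAX_SPLITS];  /* Split variables. */
    size_t n_splits;                         /* Number of split variables. */
  };

/* Drops every reference that DICT holds to VAR, as a deletion routine
   would need to do before freeing VAR.  The remaining split variables
   keep their relative order. */
void
toy_dict_unref_var (struct toy_dict *dict, const struct toy_var *var)
{
  size_t i, n = 0;

  assert (dict != NULL && var != NULL);

  if (dict->weight == var)
    dict->weight = NULL;
  if (dict->filter == var)
    dict->filter = NULL;

  /* Compact the split array in place, skipping VAR. */
  for (i = 0; i < dict->n_splits; i++)
    if (dict->splits[i] != var)
      dict->splits[n++] = dict->splits[i];
  dict->n_splits = n;
}
```

The point of the sketch is only that no dangling pointer to the deleted variable may remain inside the dictionary; pointers held outside the dictionary, such as those kept by transformations, are not covered by any such cleanup.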
No functions are provided for removing a variable from a dictionary without destroying that variable. As with insertion of an existing variable, there is no reason that this could not be implemented, but so far there has been no need.

 -- Function: void dict_delete_var (struct dictionary *DICT, struct variable *VAR)
     Deletes VAR from DICT, of which it must be a member.

 -- Function: void dict_delete_vars (struct dictionary *DICT, struct variable *const *VARS, size_t COUNT)
     Deletes the COUNT variables in array VARS from DICT. All of the variables in VARS must be members of DICT. No variable may be included in VARS more than once.

 -- Function: void dict_delete_consecutive_vars (struct dictionary *DICT, size_t IDX, size_t COUNT)
     Deletes the COUNT variables in sequential positions IDX through IDX + COUNT - 1 from DICT, which must contain at least IDX + COUNT variables.

 -- Function: void dict_delete_scratch_vars (struct dictionary *DICT)
     Deletes all scratch variables from DICT.

2.6.4 Changing Variable Order
-----------------------------

The variables in a dictionary are stored in an array. These functions change the order of a dictionary's array of variables without changing which variables are in the dictionary.

 -- Function: void dict_reorder_var (struct dictionary *DICT, struct variable *VAR, size_t NEW_INDEX)
     Moves VAR, which must be in DICT, so that it is at position NEW_INDEX in DICT's array of variables. Other variables in DICT, if any, retain their relative positions. NEW_INDEX must be less than the number of variables in DICT.

 -- Function: void dict_reorder_vars (struct dictionary *DICT, struct variable *const *NEW_ORDER, size_t COUNT)
     Moves the COUNT variables in NEW_ORDER to the beginning of DICT's array of variables in the specified order. Other variables in DICT, if any, retain their relative positions.

     All of the variables in NEW_ORDER must be in DICT.
     No duplicates are allowed within NEW_ORDER, which means that COUNT must be no greater than the number of variables in DICT.

2.6.5 Renaming Variables
------------------------

These functions change the names of variables within a dictionary. The `var_set_name' function (*note var_set_name::) cannot be applied directly to a variable that is in a dictionary, because `struct dictionary' contains an index by name that `var_set_name' would not update. The following functions take care to update the index as well. They also ensure that variable renaming does not cause a dictionary to contain a duplicate variable name.

 -- Function: void dict_rename_var (struct dictionary *DICT, struct variable *VAR, const char *NEW_NAME)
     Changes the name of VAR, which must be in DICT, to NEW_NAME. A variable named NEW_NAME must not already be in DICT, unless NEW_NAME is the same as VAR's current name.

 -- Function: bool dict_rename_vars (struct dictionary *DICT, struct variable **VARS, char **NEW_NAMES, size_t COUNT, char **ERR_NAME)
     Renames each of the COUNT variables in VARS to the name in the corresponding position of NEW_NAMES. If the renaming would result in a duplicate variable name, returns false and stores one of the names that would be duplicated into `*ERR_NAME', if ERR_NAME is non-null. Otherwise, the renaming is successful, and true is returned.

2.6.6 Weight Variable
---------------------

A data set's cases may optionally be weighted by the value of a numeric variable. *Note WEIGHT: (pspp)WEIGHT, for a user view of weight variables.

The weight variable is written to and read from system and portable files.

The most commonly useful function related to weighting is a convenience function to retrieve a weighting value from a case.

 -- Function: double dict_get_case_weight (const struct dictionary *DICT, const struct ccase *CASE, bool *WARN_ON_INVALID)
     Retrieves and returns the value of the weighting variable specified by DICT from CASE.
     Returns 1.0 if DICT has no weighting variable.

     Returns 0.0 if CASE's weight value is user- or system-missing, zero, or negative. In such a case, if WARN_ON_INVALID is non-null and `*WARN_ON_INVALID' is true, `dict_get_case_weight' also issues an error message and sets `*WARN_ON_INVALID' to false. To disable error reporting, pass a null pointer or a pointer to false as WARN_ON_INVALID or use a `msg_disable'/`msg_enable' pair.

The dictionary also has a pair of functions for getting and setting the weight variable.

 -- Function: struct variable * dict_get_weight (const struct dictionary *DICT)
     Returns DICT's current weighting variable, or a null pointer if the dictionary does not have a weighting variable.

 -- Function: void dict_set_weight (struct dictionary *DICT, struct variable *VAR)
     Sets DICT's weighting variable to VAR. If VAR is non-null, it must be a numeric variable in DICT. If VAR is null, then DICT's weighting variable, if any, is cleared.

2.6.7 Filter Variable
---------------------

When the active file is read by a procedure, cases can be excluded from analysis based on the values of a "filter variable". *Note FILTER: (pspp)FILTER, for a user view of filtering.

These functions store and retrieve the filter variable. They are rarely useful, because the data analysis framework automatically excludes from analysis the cases that should be filtered.

 -- Function: struct variable * dict_get_filter (const struct dictionary *DICT)
     Returns DICT's current filter variable, or a null pointer if the dictionary does not have a filter variable.

 -- Function: void dict_set_filter (struct dictionary *DICT, struct variable *VAR)
     Sets DICT's filter variable to VAR. If VAR is non-null, it must be a numeric variable in DICT. If VAR is null, then DICT's filter variable, if any, is cleared.

2.6.8 Case Limit
----------------

The limit on cases analyzed by a procedure, set by the N OF CASES command (*note N OF CASES: (pspp)N OF CASES.), is stored as part of the dictionary.
The dictionary does not, on the other hand, play any role in enforcing the case limit (a job done by data analysis framework code).

A case limit of 0 means that the number of cases is not limited.

These functions are rarely useful, because the data analysis framework automatically excludes from analysis any cases beyond the limit.

 -- Function: casenumber dict_get_case_limit (const struct dictionary *DICT)
     Returns the current case limit for DICT.

 -- Function: void dict_set_case_limit (struct dictionary *DICT, casenumber LIMIT)
     Sets DICT's case limit to LIMIT.

2.6.9 Split Variables
---------------------

The user may use the SPLIT FILE command (*note SPLIT FILE: (pspp)SPLIT FILE.) to select a set of variables on which to split the active file into groups of cases to be analyzed independently in each statistical procedure. The set of split variables is stored as part of the dictionary, although the effect on data analysis is implemented by each individual statistical procedure.

Split variables may be numeric or short or long string variables.

The most useful functions for split variables are those to retrieve them. Even these functions are rarely useful directly: for the purpose of breaking cases into groups based on the values of the split variables, it is usually easier to use `casegrouper_create_splits'.

 -- Function: const struct variable *const * dict_get_split_vars (const struct dictionary *DICT)
     Returns a pointer to an array of pointers to split variables. If and only if there are no split variables, returns a null pointer. The caller must not modify or free the returned array.

 -- Function: size_t dict_get_split_cnt (const struct dictionary *DICT)
     Returns the number of split variables.

The following functions are also available for working with split variables.

 -- Function: void dict_set_split_vars (struct dictionary *DICT, struct variable *const *VARS, size_t CNT)
     Sets DICT's split variables to the CNT variables in VARS.
     If CNT is 0, then DICT will not have any split variables. The caller retains ownership of VARS.

 -- Function: void dict_unset_split_var (struct dictionary *DICT, struct variable *VAR)
     Removes VAR, which must be a variable in DICT, from DICT's set of split variables.

2.6.10 File Label
-----------------

A dictionary may optionally have an associated string that describes its contents, called its file label. The user may set the file label with the FILE LABEL command (*note FILE LABEL: (pspp)FILE LABEL.).

These functions set and retrieve the file label.

 -- Function: const char * dict_get_label (const struct dictionary *DICT)
     Returns DICT's file label. If DICT does not have a label, returns a null pointer.

 -- Function: void dict_set_label (struct dictionary *DICT, const char *LABEL)
     Sets DICT's label to LABEL. If LABEL is non-null, then its content, truncated to at most 60 bytes, becomes the new file label. If LABEL is null, then DICT's label is removed.

     The caller retains ownership of LABEL.

2.6.11 Documents
----------------

A dictionary may include an arbitrary number of lines of explanatory text, called the dictionary's documents. For compatibility, document lines have a fixed width, and lines that are not exactly this width are truncated or padded with spaces as necessary to bring them to the correct width.

PSPP users can use the DOCUMENT (*note DOCUMENT: (pspp)DOCUMENT.), ADD DOCUMENT (*note ADD DOCUMENT: (pspp)ADD DOCUMENT.), and DROP DOCUMENTS (*note DROP DOCUMENTS: (pspp)DROP DOCUMENTS.) commands to manipulate documents.

 -- Macro: int DOC_LINE_LENGTH
     The fixed length of a document line, in bytes, defined as 80.

The following functions work with whole sets of documents. They accept or return sets of documents formatted as null-terminated strings that are an exact multiple of `DOC_LINE_LENGTH' bytes in length.

 -- Function: const char * dict_get_documents (const struct dictionary *DICT)
     Returns the documents in DICT, or a null pointer if DICT has no documents.
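
The fixed-width representation described above, a single null-terminated string whose length is an exact multiple of the line length, can be sketched in a few lines of standalone C. The names `TOY_DOC_LINE_LENGTH', `toy_append_document_line' and `toy_document_line_cnt' are hypothetical and exist only for this illustration; the sketch assumes the 80-byte width given for `DOC_LINE_LENGTH' and is not taken from the PSPP sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the value the text gives for DOC_LINE_LENGTH.  The TOY_ name
   is hypothetical. */
#define TOY_DOC_LINE_LENGTH 80

/* Appends CONTENT to *DOCS as one fixed-width line, truncating it or
   padding it with spaces to exactly TOY_DOC_LINE_LENGTH bytes.
   *DOCS is either a null pointer (no documents yet) or a malloc'd,
   null-terminated string whose length is a multiple of
   TOY_DOC_LINE_LENGTH.  The caller keeps ownership of CONTENT. */
void
toy_append_document_line (char **docs, const char *content)
{
  size_t old_len = *docs != NULL ? strlen (*docs) : 0;
  size_t content_len = strlen (content);
  char *new_docs;

  if (content_len > TOY_DOC_LINE_LENGTH)
    content_len = TOY_DOC_LINE_LENGTH;        /* Truncate long lines. */

  new_docs = realloc (*docs, old_len + TOY_DOC_LINE_LENGTH + 1);
  assert (new_docs != NULL);    /* Sketch only: no real OOM handling. */

  memcpy (new_docs + old_len, content, content_len);
  memset (new_docs + old_len + content_len, ' ',
          TOY_DOC_LINE_LENGTH - content_len); /* Pad short lines. */
  new_docs[old_len + TOY_DOC_LINE_LENGTH] = '\0';
  *docs = new_docs;
}

/* Returns the number of fixed-width lines in DOCS, which may be a
   null pointer to indicate that there are no documents. */
size_t
toy_document_line_cnt (const char *docs)
{
  return docs != NULL ? strlen (docs) / TOY_DOC_LINE_LENGTH : 0;
}
```

Because every line occupies the same number of bytes, line IDX of such a string starts at byte offset IDX times the line length, so no separators or per-line length fields are needed.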
 -- Function: void dict_set_documents (struct dictionary *DICT, const char *NEW_DOCUMENTS)
     Sets DICT's documents to NEW_DOCUMENTS. If NEW_DOCUMENTS is a null pointer or an empty string, then DICT's documents are cleared. The caller retains ownership of NEW_DOCUMENTS.

 -- Function: void dict_clear_documents (struct dictionary *DICT)
     Clears the documents from DICT.

The following functions work with individual lines in a dictionary's set of documents.

 -- Function: void dict_add_document_line (struct dictionary *DICT, const char *CONTENT)
     Appends CONTENT to the documents in DICT. The text in CONTENT will be truncated or padded with spaces as necessary to make it exactly `DOC_LINE_LENGTH' bytes long. The caller retains ownership of CONTENT.

     If CONTENT is longer than `DOC_LINE_LENGTH' bytes, this function also issues a warning using `msg'. To suppress the warning, enclose the call to this function in a `msg_disable'/`msg_enable' pair.

 -- Function: size_t dict_get_document_line_cnt (const struct dictionary *DICT)
     Returns the number of lines of documents in DICT. If the dictionary contains no documents, returns 0.

 -- Function: void dict_get_document_line (const struct dictionary *DICT, size_t IDX, struct string *CONTENT)
     Replaces the text in CONTENT (which must already have been initialized by the caller) by the document line in DICT numbered IDX, which must be less than the number of lines of documents in DICT. Any trailing white space in the document line is trimmed, so that CONTENT will have a length between 0 and `DOC_LINE_LENGTH'.

2.7 Coding Conventions
======================

Every `.c' file should have `#include <config.h>' as its first non-comment line. No `.h' file should include `config.h'.

This section needs to be finished.

2.8 Cases
=========

This section needs to be written.

2.9 Data Sets
=============

This section needs to be written.

2.10 Pools
==========

This section needs to be written.
3 Parsing Command Syntax
************************

4 Processing Data
*****************

Developer's Guide

Proposed outline:

* Introduction
* Basic concepts
** Data sets
** Variables
** Dictionaries
** Coding conventions
** Pools
* Syntax parsing
* Data processing
** Reading data
*** Casereaders generalities
*** Casereaders from data files
*** Casereaders from the active file
*** Other casereaders
** Writing data
*** Casewriters generally
*** Casewriters to data files
*** Modifying the active file
**** Modifying cases obtained from active file casereaders has no real effect
**** Transformations; procedures that transform
** Transforming data
*** Sorting and merging
*** Filtering
*** Grouping
**** Ordering and interaction of filtering and grouping
*** Multiple passes over data
*** Counting cases and case weights
** Best practices
*** Multiple passes with filters versus single pass with loops
*** Sequential versus random access
*** Managing memory
*** Passing cases around
*** Renaming casereaders
*** Avoiding excessive buffering
*** Propagating errors
*** Avoid static/global data
*** Don't worry about null filters, groups, etc.
*** Be aware of reference counting semantics for cases

5 Presenting Output
*******************

6 Function Index
****************

*fmt_create: See 2.2.4. (line 623)
*fmt_dollar_template: See 2.2.3. (line 571)
*fmt_to_string: See 2.2.2. (line 407)
alignment: See 2.5.7. (line 1529)
alignment_is_valid: See 2.5.7. (line 1523)
char: See 2.2.3. (line 437)
data_in: See 2.2.5. (line 663)
data_out: See 2.2.5. (line 703)
data_out_legacy: See 2.2.5. (line 706)
dict_add_document_line: See 2.6.11. (line 2292)
dict_class_from_id: See 2.5.9. (line 1622)
dict_class_to_name: See 2.5.9. (line 1627)
dict_clear_documents: See 2.6.11. (line 2285)
dict_clone_var: See 2.6.2. (line 1999)
dict_clone_var_assert: See 2.6.2. (line 2002)
dict_contains_var: See 2.6.1. (line 1966)
dict_create_var: See 2.6.2. (line 1990)
dict_create_var_assert: See 2.6.2. (line 1992)
dict_delete_consecutive_vars: See 2.6.3. (line 2047)
dict_delete_scratch_vars: See 2.6.3. (line 2052)
dict_delete_var: See 2.6.3. (line 2037)
dict_delete_vars: See 2.6.3. (line 2041)
dict_get_case_limit: See 2.6.8. (line 2184)
dict_get_case_weight: See 2.6.6. (line 2120)
dict_get_document_line: See 2.6.11. (line 2308)
dict_get_document_line_cnt: See 2.6.11. (line 2303)
dict_get_documents: See 2.6.11. (line 2275)
dict_get_filter: See 2.6.7. (line 2158)
dict_get_label: See 2.6.10. (line 2241)
dict_get_split_cnt: See 2.6.9. (line 2214)
dict_get_split_vars: See 2.6.9. (line 2209)
dict_get_var: See 2.6.1. (line 1935)
dict_get_var_cnt: See 2.6.1. (line 1939)
dict_get_vars: See 2.6.1. (line 1946)
dict_get_vars_mutable: See 2.6.1. (line 1949)
dict_get_weight: See 2.6.6. (line 2136)
dict_lookup_var: See 2.6.1. (line 1924)
dict_lookup_var_assert: See 2.6.1. (line 1926)
dict_rename_var: See 2.6.5. (line 2091)
dict_rename_vars: See 2.6.5. (line 2098)
dict_reorder_var: See 2.6.4. (line 2063)
dict_reorder_vars: See 2.6.4. (line 2070)
dict_set_case_limit: See 2.6.8. (line 2188)
dict_set_documents: See 2.6.11. (line 2280)
dict_set_filter: See 2.6.7. (line 2163)
dict_set_label: See 2.6.10. (line 2246)
dict_set_split_vars: See 2.6.9. (line 2221)
dict_set_weight: See 2.6.6. (line 2141)
dict_unset_split_var: See 2.6.9. (line 2227)
DOC_LINE_LENGTH: See 2.6.11. (line 2267)
fmt_affix_width: See 2.2.4. (line 632)
fmt_category: See 2.2.3. (line 486)
fmt_check: See 2.2.1. (line 374)
fmt_check_input: See 2.2.1. (line 375)
fmt_check_output: See 2.2.1. (line 376)
fmt_check_style: See 2.2.4. (line 648)
fmt_check_type_compat: See 2.2.1. (line 382)
fmt_check_width_compat: See 2.2.1. (line 388)
fmt_default_for_width: See 2.2.1. (line 362)
fmt_done: See 2.2.4. (line 627)
fmt_equal: See 2.2.2. (line 417)
fmt_for_input: See 2.2.1. (line 348)
fmt_for_output: See 2.2.1. (line 350)
fmt_for_output_from_input: See 2.2.1. (line 356)
fmt_from_io: See 2.2.3. (line 540)
fmt_from_name: See 2.2.3. (line 441)
fmt_get_style: See 2.2.4. (line 644)
fmt_is_numeric: See 2.2.3. (line 482)
fmt_is_string: See 2.2.3. (line 481)
fmt_max_input_decimals: See 2.2.3. (line 462)
fmt_max_input_width: See 2.2.3. (line 455)
fmt_max_output_decimals: See 2.2.3. (line 464)
fmt_max_output_width: See 2.2.3. (line 457)
fmt_min_input_width: See 2.2.3. (line 454)
fmt_min_output_width: See 2.2.3. (line 456)
fmt_name: See 2.2.4. (line 651)
fmt_neg_affix_width: See 2.2.4. (line 636)
fmt_number_style_destroy: See 2.2.4. (line 620)
fmt_number_style_init: See 2.2.4. (line 614)
fmt_resize: See 2.2.2. (line 422)
fmt_step_width: See 2.2.3. (line 470)
fmt_takes_decimals: See 2.2.3. (line 449)
fmt_to_io: See 2.2.3. (line 536)
fmt_type: See 2.2.3. (line 548)
fmt_usable_for_input: See 2.2.3. (line 556)
fmt_var_width: See 2.2.2. (line 400)
HIGHEST: See 2.1.1. (line 198)
LOWEST: See 2.1.1. (line 197)
MAX_SHORT_STRING: See 2.1.3. (line 242)
MAX_STRING: See 2.1. (line 161)
measure: See 2.5.7. (line 1494)
measure_is_valid: See 2.5.7. (line 1489)
MIN_LONG_STRING: See 2.1.3. (line 247)
mv_add_num: See 2.3.5. (line 932)
mv_add_range: See 2.3.5. (line 956)
mv_add_str: See 2.3.5. (line 931)
mv_add_value: See 2.3.5. (line 929)
mv_clear: See 2.3.2. (line 839)
mv_copy: See 2.3.2. (line 835)
mv_get_range: See 2.3.4. (line 916)
mv_get_value: See 2.3.4. (line 904)
mv_get_width: See 2.3.4. (line 890)
mv_has_range: See 2.3.4. (line 912)
mv_has_value: See 2.3.4. (line 899)
mv_init: See 2.3.2. (line 829)
mv_is_empty: See 2.3.4. (line 885)
mv_is_num_missing: See 2.3.1. (line 792)
mv_is_resizable: See 2.3.3. (line 859)
mv_is_str_missing: See 2.3.1. (line 794)
mv_is_value_missing: See 2.3.1. (line 790)
mv_n_values: See 2.3.4. (line 893)
mv_pop_range: See 2.3.5. (line 964)
mv_pop_value: See 2.3.5. (line 943)
mv_replace_value: See 2.3.5. (line 948)
mv_resize: See 2.3.3. (line 868)
SHORT_NAME_LEN: See 2.5.11. (line 1743)
SYSMIS: See 2.1.1. (line 183)
val_labs_add: See 2.4.3. (line 1065)
val_labs_can_set_width: See 2.4.2. (line 1039)
val_labs_clear: See 2.4.1. (line 1023)
val_labs_clone: See 2.4.1. (line 1019)
val_labs_count: See 2.4.2. (line 1035)
val_labs_create: See 2.4.1. (line 1014)
val_labs_destroy: See 2.4.1. (line 1026)
val_labs_done: See 2.4.4. (line 1146)
val_labs_find: See 2.4. (line 1002)
val_labs_first: See 2.4.4. (line 1124)
val_labs_first_sorted: See 2.4.4. (line 1135)
val_labs_next: See 2.4.4. (line 1140)
val_labs_remove: See 2.4.3. (line 1079)
val_labs_replace: See 2.4.3. (line 1072)
val_labs_set_width: See 2.4.2. (line 1049)
val_type_from_width: See 2.1. (line 168)
val_type_is_valid: See 2.1. (line 164)
value_cnt_from_width: See 2.1.3. (line 264)
value_copy: See 2.1.3. (line 271)
value_is_resizable: See 2.1.3. (line 282)
value_resize: See 2.1.3. (line 294)
value_set_missing: See 2.1.3. (line 276)
var_add_value_label: See 2.5.4. (line 1364)
var_append_value_name: See 2.5.4. (line 1327)
var_attach_aux: See 2.5.13. (line 1827)
var_clear_aux: See 2.5.13. (line 1839)
var_clear_label: See 2.5.6. (line 1451)
var_clear_missing_values: See 2.5.3. (line 1305)
var_clear_short_name: See 2.5.11. (line 1759)
var_clear_value_labels: See 2.5.4. (line 1354)
var_clone: See 2.5.10. (line 1667)
var_create: See 2.5.10. (line 1645)
var_default_display_width: See 2.5.7. (line 1510)
var_destroy: See 2.5.10. (line 1682)
var_detach_aux: See 2.5.13. (line 1846)
var_get_aux: See 2.5.13. (line 1822)
var_get_case_index: See 2.5.12. (line 1780)
var_get_dict_class: See 2.5.1. (line 1219)
var_get_dict_index: See 2.5.12. (line 1773)
var_get_display_width: See 2.5.7. (line 1505)
var_get_label: See 2.5.6. (line 1438)
var_get_leave: See 2.5.8. (line 1572)
var_get_missing_values: See 2.5.3. (line 1291)
var_get_name: See 2.5.1. (line 1186)
var_get_obs_vals: See 2.5.14. (line 1876)
var_get_print_format: See 2.5.5. (line 1406)
var_get_short_name: See 2.5.11. (line 1747)
var_get_type: See 2.5.2. (line 1229)
var_get_value_cnt: See 2.5.2. (line 1256)
var_get_value_labels: See 2.5.4. (line 1340)
var_get_width: See 2.5.2. (line 1232)
var_get_write_format: See 2.5.5. (line 1408)
var_has_label: See 2.5.6. (line 1454)
var_has_missing_values: See 2.5.3. (line 1309)
var_has_obs_vals: See 2.5.14. (line 1886)
var_has_value_labels: See 2.5.4. (line 1336)
var_is_alpha: See 2.5.2. (line 1244)
var_is_long_string: See 2.5.2. (line 1252)
var_is_num_missing: See 2.5.3. (line 1274)
var_is_numeric: See 2.5.2. (line 1241)
var_is_plausible_name: See 2.5.1. (line 1204)
var_is_short_string: See 2.5.2. (line 1248)
var_is_str_missing: See 2.5.3. (line 1276)
var_is_valid_name: See 2.5.1. (line 1202)
var_is_value_missing: See 2.5.3. (line 1272)
var_lookup_value_label: See 2.5.4. (line 1322)
var_must_leave: See 2.5.8. (line 1582)
VAR_NAME_LEN: See 2.5.1. (line 1181)
var_replace_value_label: See 2.5.4. (line 1372)
var_set_alignment: See 2.5.7. (line 1531)
var_set_both_formats: See 2.5.5. (line 1416)
var_set_display_width: See 2.5.7. (line 1507)
var_set_label: See 2.5.6. (line 1445)
var_set_leave: See 2.5.8. (line 1576)
var_set_measure: See 2.5.7. (line 1496)
var_set_missing_values: See 2.5.3. (line 1297)
var_set_name: See 2.5.1. (line 1193)
var_set_obs_vals: See 2.5.14. (line 1881)
var_set_print_format: See 2.5.5. (line 1412)
var_set_short_name: See 2.5.11. (line 1752)
var_set_value_labels: See 2.5.4. (line 1349)
var_set_width: See 2.5.2. (line 1235)
var_set_write_format: See 2.5.5. (line 1414)
var_to_string: See 2.5.6. (line 1431)
void: See 2.5.13. (line 1836)

7 Concept Index
***************

FDL, GNU Free Documentation License: See Appendix D. (line 4008)
long string: See 2.1.3. (line 235)
MAX_SHORT_STRING: See 2.1.3. (line 235)
MAX_STRING <1>: See 2.1.2. (line 220)
MAX_STRING: See 2.1. (line 151)
numeric value: See 2.1. (line 151)
short string: See 2.1.3. (line 235)
string value <1>: See 2.1.3. (line 235)
string value: See 2.1. (line 151)
value: See 2.1. (line 149)
width: See 2.1.
(line 151) Appendix A Portable File Format ******************************* These days, most computers use the same internal data formats for integer and floating-point data, if one ignores little differences like big- versus little-endian byte ordering. However, occasionally it is necessary to exchange data between systems with incompatible data formats. This is what portable files are designed to do. *Please note:* This information is gleaned from examination of ASCII-formatted portable files only, so some of it may be incorrect for portable files formatted in EBCDIC or other character sets. A.1 Portable File Characters ============================ Portable files are arranged as a series of lines of 80 characters each. Each line is terminated by a carriage-return, line-feed sequence ("new-lines"). New-lines are only used to avoid line length limits imposed by some OSes; they are not meaningful. Most lines in portable files are exactly 80 characters long. The only exception is a line that ends in one or more spaces, in which the spaces may optionally be omitted. Thus, a portable file reader must act as though a line shorter than 80 characters is padded to that length with spaces. The file must be terminated with a `Z' character. In addition, if the final line in the file does not have exactly 80 characters, then it is padded on the right with `Z' characters. (The file contents may be in any character set; the file contains a description of its own character set, as explained in the next section. Therefore, the `Z' character is not necessarily an ASCII `Z'.) For the rest of the description of the portable file format, new-lines and the trailing `Z's will be ignored, as if they did not exist, because they are not an important part of understanding the file contents. A.2 Portable File Structure =========================== Every portable file consists of the following records, in sequence: * File header. * Version and date info. * Product identification. 
* Author identification (optional).
* Subproduct identification (optional).
* Variable count.
* Case weight variable (optional).
* Variables. Each variable record may optionally be followed by a missing value record and a variable label record.
* Value labels (optional).
* Documents (optional).
* Data.

Most records are identified by a single-character tag code. The file header and version info record do not have a tag.

Other than these single-character codes, there are three types of fields in a portable file: floating-point, integer, and string. Floating-point fields have the following format:

* Zero or more leading spaces.
* Optional asterisk (`*'), which indicates a missing value. The asterisk must be followed by a single character, generally a period (`.'), but it appears that other characters may also be possible. This completes the specification of a missing value.
* Optional minus sign (`-') to indicate a negative number.
* A whole number, consisting of one or more base-30 digits: `0' through `9' plus capital letters `A' through `T'.
* Optional fraction, consisting of a radix point (`.') followed by one or more base-30 digits.
* Optional exponent, consisting of a plus or minus sign (`+' or `-') followed by one or more base-30 digits.
* A forward slash (`/').

Integer fields take a form identical to floating-point fields, but they may not contain a fraction.

String fields take the form of an integer field having value N, followed by exactly N characters, which are the string content.

A.3 Portable File Header
========================

Every portable file begins with a 464-byte header, consisting of a 200-byte collection of vanity splash strings, followed by a 256-byte character set translation table, followed by an 8-byte tag string.

The 200-byte segment is divided into five 40-byte sections, each of which represents the string `CHARSET SPSS PORT FILE' in a different character set encoding, where CHARSET is the name of the character set used in the file, e.g.
`ASCII' or `EBCDIC'. Each string is padded on the right with spaces in its respective character set. It appears that these strings exist only to inform those who might view the file on a screen, and that they are not parsed by SPSS products. Thus, they can be safely ignored. For those interested, the strings are supposed to be in the following character sets, in the specified order: EBCDIC, 7-bit ASCII, CDC 6-bit ASCII, 6-bit ASCII, Honeywell 6-bit ASCII. The 256-byte segment describes a mapping from the character set used in the portable file to an arbitrary character set having characters at the following positions: 0-60 Control characters. Not important enough to describe in full here. 61-63 Reserved. 64-73 Digits `0' through `9'. 74-99 Capital letters `A' through `Z'. 100-125 Lowercase letters `a' through `z'. 126 Space. 127-130 Symbols `.<(+' 131 Solid vertical pipe. 132-142 Symbols `&[]!$*);^-/' 143 Broken vertical pipe. 144-150 Symbols `,%_>'?``:' 151 British pound symbol. 152-155 Symbols `@'="'. 156 Less than or equal symbol. 157 Empty box. 158 Plus or minus. 159 Filled box. 160 Degree symbol. 161 Dagger. 162 Symbol `~'. 163 En dash. 164 Lower left corner box draw. 165 Upper left corner box draw. 166 Greater than or equal symbol. 167-176 Superscript `0' through `9'. 177 Lower right corner box draw. 178 Upper right corner box draw. 179 Not equal symbol. 180 Em dash. 181 Superscript `('. 182 Superscript `)'. 183 Horizontal dagger (?). 184-186 Symbols `{}\'. 187 Cents symbol. 188 Centered dot, or bullet. 189-255 Reserved. Symbols that are not defined in a particular character set are set to the same value as symbol 64; i.e., to `0'. The 8-byte tag string consists of the exact characters `SPSSPORT' in the portable file's character set, which can be used to verify that the file is indeed a portable file. A.4 Version and Date Info Record ================================ This record does not have a tag code. 
It has the following structure: * A single character identifying the file format version. The letter A represents version 0, and so on. * An 8-character string field giving the file creation date in the format YYYYMMDD. * A 6-character string field giving the file creation time in the format HHMMSS. A.5 Identification Records ========================== The product identification record has tag code `1'. It consists of a single string field giving the name of the product that wrote the portable file. The author identification record has tag code `2'. It is optional. If present, it consists of a single string field giving the name of the person who caused the portable file to be written. The subproduct identification record has tag code `3'. It is optional. If present, it consists of a single string field giving additional information on the product that wrote the portable file. A.6 Variable Count Record ========================= The variable count record has tag code `4'. It consists of two integer fields. The first contains the number of variables in the file dictionary. The purpose of the second is unknown; it contains the value 161 in all portable files examined so far. A.7 Case Weight Variable Record =============================== The case weight variable record is optional. If it is present, it indicates the variable used for weighting cases; if it is absent, cases are unweighted. It has tag code `6'. It consists of a single string field that names the weighting variable. A.8 Variable Records ==================== Each variable record represents a single variable. Variable records have tag code `7'. They have the following structure: * Width (integer). This is 0 for a numeric variable, and a number between 1 and 255 for a string variable. * Name (string). 1-8 characters long. Must be in all capitals. A few portable files that contain duplicate variable names have been spotted in the wild. 
PSPP handles these by renaming the duplicates with numeric extensions: `VAR_1', `VAR_2', and so on. * Print format. This is a set of three integer fields: - Format type (*note Variable Record::). - Format width. 1-40. - Number of decimal places. 1-40. A few portable files with invalid format types or formats that are not of the appropriate width for their variables have been spotted in the wild. PSPP assigns a default F or A format to a variable with an invalid format. * Write format. Same structure as the print format described above. Each variable record can optionally be followed by a missing value record, which has tag code `8'. A missing value record has one field, the missing value itself (a floating-point or string, as appropriate). Up to three of these missing value records can be used. There is also a record for missing value ranges, which has tag code `B'. It is followed by two fields representing the range, which are floating-point or string as appropriate. If a missing value range is present, it may be followed by a single missing value record. Tag codes `9' and `A' represent `LO THRU X' and `X THRU HI' ranges, respectively. Each is followed by a single field representing X. If one of the ranges is present, it may be followed by a single missing value record. In addition, each variable record can optionally be followed by a variable label record, which has tag code `C'. A variable label record has one field, the variable label itself (string). A.9 Value Label Records ======================= Value label records have tag code `D'. They have the following format: * Variable count (integer). * List of variables (strings). The variable count specifies the number in the list. Variables are specified by their names. All variables must be of the same type (numeric or string), but string variables do not necessarily have the same width. * Label count (integer). * List of (value, label) tuples. The label count specifies the number of tuples. 
  Each tuple consists of a value, which is numeric or string as appropriate to the variables, followed by a label (string).

A few portable files that specify duplicate value labels, that is, two different labels for a single value of a single variable, have been spotted in the wild. PSPP uses the last value label specified in these cases.

A.10 Document Record
====================

One document record may optionally follow the value label record. The document record consists of tag code `E', followed by the number of document lines as an integer, followed by that number of strings, each of which represents one document line. Document lines must be 80 bytes long or shorter.

A.11 Portable File Data
=======================

The data record has tag code `F'. There is only one tag for all the data; thus, all the data must follow the dictionary. The data is terminated by the end-of-file marker `Z', which is not valid as the beginning of a data element.

Data elements are output in the same order as the variable records describing them. String variables are output as string fields, and numeric variables are output as floating-point fields.

Appendix B System File Format
*****************************

A system file encapsulates a set of cases and dictionary information that describes how they may be interpreted. This chapter describes the format of a system file.

System files use three data types: 8-bit characters, 32-bit integers, and 64-bit floating-point numbers, called here `char', `int32', and `flt64', respectively. Data is not necessarily aligned on a word or double-word boundary: the long variable name record (*note Long Variable Names Record::) and very long string records (*note Very Long String Record::) have arbitrary byte length and can therefore cause all data coming after them in the file to be misaligned.

Integer data in system files may be big-endian or little-endian.
A reader may detect the endianness of a system file by examining `layout_code' in the file header record (*note `layout_code': layout_code.).

Floating-point data in system files may nominally be in IEEE 754, IBM, or VAX formats. A reader may detect the floating-point format in use by examining `bias' in the file header record (*note `bias': bias.).

PSPP detects big-endian and little-endian integer formats in system files and translates as necessary. PSPP also detects the floating-point format in use, as well as the endianness of IEEE 754 floating-point numbers, and translates as needed. However, only IEEE 754 numbers with the same endianness as integer data in the same file have actually been observed in system files, and it is likely that other formats are obsolete or were never used.

The PSPP system-missing value is represented by the largest possible negative number in the floating-point format (`-DBL_MAX'). Two other values are important for use as missing values: `HIGHEST', represented by the largest possible positive number (`DBL_MAX'), and `LOWEST', represented by the second-largest negative number (in IEEE 754 format, `0xffeffffffffffffe').

System files are divided into records, each of which begins with a 4-byte record type, usually regarded as an `int32'.

The records must appear in the following order:

* File header record.
* Variable records.
* All pairs of value labels records and value label variables records, if present.
* Document record, if present.
* Any of the following records, if present, in any order:
  - Machine integer info record.
  - Machine floating-point info record.
  - Variable display parameter record.
  - Long variable names record.
  - Miscellaneous informational records.
* Dictionary termination record.
* Data record.

Each type of record is described separately below.

B.1 File Header Record
======================

The file header is always the first record in the file.
It has the following format:

     char                rec_type[4];
     char                prod_name[60];
     int32               layout_code;
     int32               nominal_case_size;
     int32               compressed;
     int32               weight_index;
     int32               ncases;
     flt64               bias;
     char                creation_date[9];
     char                creation_time[8];
     char                file_label[64];
     char                padding[3];

`char rec_type[4];'
     Record type code, set to `$FL2'.

`char prod_name[60];'
     Product identification string. This always begins with the characters `@(#) SPSS DATA FILE'. PSPP uses the remaining characters to give its version and the operating system name; for example, `GNU pspp 0.1.4 - sparc-sun-solaris2.5.2'. The string is truncated if it would be longer than 60 characters; otherwise it is padded on the right with spaces.

`int32 layout_code;'
     Normally set to 2, although a few system files have been spotted in the wild with a value of 3 here. PSPP uses this value to determine the file's integer endianness (*note System File Format::).

`int32 nominal_case_size;'
     Number of data elements per case. This is the number of variables, except that long string variables add extra data elements (one for every 8 characters after the first 8). However, string variables do not contribute to this value beyond the first 255 bytes. Further, system files written by some systems set this value to -1. In general, it is unsafe for systems reading system files to rely upon this value.

`int32 compressed;'
     Set to 1 if the data in the file is compressed, 0 otherwise.

`int32 weight_index;'
     If one of the variables in the data set is used as a weighting variable, set to the dictionary index of that variable, plus 1 (*note Dictionary Index::). Otherwise, set to 0.

`int32 ncases;'
     Set to the number of cases in the file if it is known, or -1 otherwise. In the general case it is not possible to determine the number of cases that will be output to a system file at the time that the header is written.
The way that this is dealt with is by writing the entire system file, including the header, then seeking back to the beginning of the file and writing just the `ncases' field. For `files' in which this is not valid, the seek operation fails. In this case, `ncases' remains -1. `flt64 bias;' Compression bias, ordinarily set to 100. Only integers between `1 - bias' and `251 - bias' can be compressed. By assuming that its value is 100, PSPP uses `bias' to determine the file's floating-point format and endianness (*note System File Format::). If the compression bias is not 100, PSPP cannot auto-detect the floating-point format and assumes that it is IEEE 754 format with the same endianness as the system file's integers, which is correct for all known system files. `char creation_date[9];' Date of creation of the system file, in `dd mmm yy' format, with the month as standard English abbreviations, using an initial capital letter and following with lowercase. If the date is not available then this field is arbitrarily set to `01 Jan 70'. `char creation_time[8];' Time of creation of the system file, in `hh:mm:ss' format and using 24-hour time. If the time is not available then this field is arbitrarily set to `00:00:00'. `char file_label[64];' File label declared by the user, if any (*note FILE LABEL: (pspp)FILE LABEL.). Padded on the right with spaces. `char padding[3];' Ignored padding bytes to make the structure a multiple of 32 bits in length. Set to zeros. B.2 Variable Record =================== There must be one variable record for each numeric variable and each string variable with width 8 bytes or less. String variables wider than 8 bytes have one variable record for each 8 bytes, rounding up. The first variable record for a long string specifies the variable's correct dictionary information. 
Subsequent variable records for a long string are filled with dummy information: a type of -1, no variable label or missing values, print and write formats that are ignored, and an empty string as name. A few system files have been encountered that include a variable label on dummy variable records, so readers should take care to parse dummy variable records in the same way as other variable records.

The "dictionary index" of a variable is its offset in the set of variable records, including dummy variable records for long string variables. The first variable record has a dictionary index of 0, the second has a dictionary index of 1, and so on.

The system file format does not directly support string variables wider than 255 bytes. Such very long string variables are represented by a number of narrower string variables. *Note Very Long String Record::, for details.

     int32               rec_type;
     int32               type;
     int32               has_var_label;
     int32               n_missing_values;
     int32               print;
     int32               write;
     char                name[8];

     /* Present only if `has_var_label' is 1. */
     int32               label_len;
     char                label[];

     /* Present only if `n_missing_values' is nonzero. */
     flt64               missing_values[];

`int32 rec_type;'
     Record type code. Always set to 2.

`int32 type;'
     Variable type code. Set to 0 for a numeric variable. For a short string variable or the first part of a long string variable, this is set to the width of the string. For the second and subsequent parts of a long string variable, set to -1, and the remaining fields in the structure are ignored.

`int32 has_var_label;'
     If this variable has a variable label, set to 1; otherwise, set to 0.

`int32 n_missing_values;'
     If the variable has no missing values, set to 0. If the variable has one, two, or three discrete missing values, set to 1, 2, or 3, respectively. If the variable has a range of missing values, set to -2; if the variable has a range of missing values plus a single discrete value, set to -3.

`int32 print;'
     Print format for this variable. See below.
`int32 write;' Write format for this variable. See below. `char name[8];' Variable name. The variable name must begin with a capital letter or the at-sign (`@'). Subsequent characters may also be digits, octothorpes (`#'), dollar signs (`$'), underscores (`_'), or full stops (`.'). The variable name is padded on the right with spaces. `int32 label_len;' This field is present only if `has_var_label' is set to 1. It is set to the length, in characters, of the variable label, which must be a number between 0 and 120. `char label[];' This field is present only if `has_var_label' is set to 1. It has length `label_len', rounded up to the nearest multiple of 32 bits. The first `label_len' characters are the variable's variable label. `flt64 missing_values[];' This field is present only if `n_missing_values' is not 0. It has the same number of elements as the absolute value of `n_missing_values'. For discrete missing values, each element represents one missing value. When a range is present, the first element denotes the minimum value in the range, and the second element denotes the maximum value in the range. When a range plus a value are present, the third element denotes the additional discrete missing value. HIGHEST and LOWEST are indicated as described in the chapter introduction. The `print' and `write' members of sysfile_variable are output formats coded into `int32' types. The least-significant byte of the `int32' represents the number of decimal places, and the next two bytes in order of increasing significance represent field width and format type, respectively. The most-significant byte is not used and should be set to zero. Format types are defined as follows: Value Meaning --------------------- 0 Not used. 1 `A' 2 `AHEX' 3 `COMMA' 4 `DOLLAR' 5 `F' 6 `IB' 7 `PIBHEX' 8 `P' 9 `PIB' 10 `PK' 11 `RB' 12 `RBHEX' 13 Not used. 14 Not used. 15 `Z' 16 `N' 17 `E' 18 Not used. 19 Not used. 
20 `DATE' 21 `TIME' 22 `DATETIME' 23 `ADATE' 24 `JDATE' 25 `DTIME' 26 `WKDAY' 27 `MONTH' 28 `MOYR' 29 `QYR' 30 `WKYR' 31 `PCT' 32 `DOT' 33 `CCA' 34 `CCB' 35 `CCC' 36 `CCD' 37 `CCE' 38 `EDATE' 39 `SDATE'

B.3 Value Labels Records
========================

The value label record has the following format:

     int32               rec_type;
     int32               label_count;

     /* Repeated `label_count' times. */
     char                value[8];
     char                label_len;
     char                label[];

`int32 rec_type;'
     Record type. Always set to 3.

`int32 label_count;'
     Number of value labels present in this record.

The remaining fields are repeated `label_count' times. Each repetition specifies one value label.

`char value[8];'
     A numeric value or a short string value padded as necessary to 8 bytes in length. Its type and width cannot be determined until the following value label variables record (see below) is read.

`char label_len;'
     The label's length, in bytes.

`char label[];'
     `label_len' bytes of the actual label, followed by up to 7 bytes of padding to bring `label' and `label_len' together to a multiple of 8 bytes in length.

The value label record is always immediately followed by a value label variables record with the following format:

     int32               rec_type;
     int32               var_count;
     int32               vars[];

`int32 rec_type;'
     Record type. Always set to 4.

`int32 var_count;'
     Number of variables to which the associated value labels from the value label record are to be applied.

`int32 vars[];'
     A list of dictionary indexes of variables to which to apply the value labels (*note Dictionary Index::). There are `var_count' elements.

     String variables wider than 8 bytes may not have value labels.

B.4 Document Record
===================

The document record, if present, has the following format:

     int32               rec_type;
     int32               n_lines;
     char                lines[][80];

`int32 rec_type;'
     Record type. Always set to 6.

`int32 n_lines;'
     Number of lines of documents present.

`char lines[][80];'
     Document lines. The number of elements is defined by `n_lines'.
     Lines shorter than 80 characters are padded on the right with spaces.

B.5 Machine Integer Info Record
===============================

The integer info record, if present, has the following format:

     /* Header. */
     int32               rec_type;
     int32               subtype;
     int32               size;
     int32               count;

     /* Data. */
     int32               version_major;
     int32               version_minor;
     int32               version_revision;
     int32               machine_code;
     int32               floating_point_rep;
     int32               compression_code;
     int32               endianness;
     int32               character_code;

`int32 rec_type;'
     Record type. Always set to 7.

`int32 subtype;'
     Record subtype. Always set to 3.

`int32 size;'
     Size of each piece of data in the data part, in bytes. Always set to 4.

`int32 count;'
     Number of pieces of data in the data part. Always set to 8.

`int32 version_major;'
     PSPP major version number. In version X.Y.Z, this is X.

`int32 version_minor;'
     PSPP minor version number. In version X.Y.Z, this is Y.

`int32 version_revision;'
     PSPP version revision number. In version X.Y.Z, this is Z.

`int32 machine_code;'
     Machine code. PSPP always sets this field to -1, but other values may appear.

`int32 floating_point_rep;'
     Floating point representation code. For IEEE 754 systems this is 1. IBM 370 sets this to 2, and DEC VAX E to 3.

`int32 compression_code;'
     Compression code. Always set to 1.

`int32 endianness;'
     Machine endianness. 1 indicates big-endian, 2 indicates little-endian.

`int32 character_code;'
     Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 indicates 8-bit ASCII, 4 indicates DEC Kanji. Windows code page numbers are also valid.

B.6 Machine Floating-Point Info Record
======================================

The floating-point info record, if present, has the following format:

     /* Header. */
     int32               rec_type;
     int32               subtype;
     int32               size;
     int32               count;

     /* Data. */
     flt64               sysmis;
     flt64               highest;
     flt64               lowest;

`int32 rec_type;'
     Record type. Always set to 7.

`int32 subtype;'
     Record subtype. Always set to 4.

`int32 size;'
     Size of each piece of data in the data part, in bytes. Always set to 8.
`int32 count;' Number of pieces of data in the data part. Always set to 3. `flt64 sysmis;' The system missing value. `flt64 highest;' The value used for HIGHEST in missing values. `flt64 lowest;' The value used for LOWEST in missing values. B.7 Variable Display Parameter Record ===================================== The variable display parameter record, if present, has the following format: /* Header. */ int32 rec_type; int32 subtype; int32 size; int32 count; /* Repeated `count' times. */ int32 measure; int32 width; /* Not always present. */ int32 alignment; `int32 rec_type;' Record type. Always set to 7. `int32 subtype;' Record subtype. Always set to 11. `int32 size;' The size of `int32'. Always set to 4. `int32 count;' The number of sets of variable display parameters (ordinarily the number of variables in the dictionary), times 2 or 3. The remaining members are repeated `count' times, in the same order as the variable records. No element corresponds to variable records that continue long string variables. The meanings of these members are as follows: `int32 measure;' The measurement type of the variable: 1 Nominal Scale 2 Ordinal Scale 3 Continuous Scale SPSS 14 sometimes writes a `measure' of 0. PSPP interprets this as nominal scale. `int32 width;' The width of the display column for the variable in characters. This field is present if COUNT is 3 times the number of variables in the dictionary. It is omitted if COUNT is 2 times the number of variables. `int32 alignment;' The alignment of the variable for display purposes: 0 Left aligned 1 Right aligned 2 Centre aligned B.8 Long Variable Names Record ============================== If present, the long variable names record has the following format: /* Header. */ int32 rec_type; int32 subtype; int32 size; int32 count; /* Exactly `count' bytes of data. */ char var_name_pairs[]; `int32 rec_type;' Record type. Always set to 7. `int32 subtype;' Record subtype. Always set to 13. 
`int32 size;' The size of each element in the `var_name_pairs' member. Always set to 1. `int32 count;' The total number of bytes in `var_name_pairs'. `char var_name_pairs[];' A list of KEY-VALUE tuples, where KEY is the name of a variable, and VALUE is its long variable name. The KEY field is at most 8 bytes long and must match the name of a variable which appears in the variable record (*note Variable Record::). The VALUE field is at most 64 bytes long. The KEY and VALUE fields are separated by a `=' byte. Each tuple is separated by a byte whose value is 09. There is no trailing separator following the last tuple. The total length is `count' bytes. B.9 Very Long String Record =========================== Old versions of SPSS limited string variables to a width of 255 bytes. For backward compatibility with these older versions, the system file format represents a string longer than 255 bytes, called a "very long string", as a collection of strings no longer than 255 bytes each. The strings concatenated to make a very long string are called its "segments"; for consistency, variables other than very long strings are considered to have a single segment. A very long string with a width of W has N = (W + 251) / 252 segments, that is, one segment for every 252 bytes of width, rounding up. It would be logical, then, for each of the segments except the last to have a width of 252 and the last segment to have the remainder, but this is not the case. In fact, each segment except the last has a width of 255 bytes. The last segment has width W - (N - 1) * 252; some versions of SPSS make it slightly wider, but not wide enough to make the last segment require another 8 bytes of data. Data is packed tightly into segments of a very long string, 255 bytes per segment. Because 255 bytes of segment data are allocated for every 252 bytes of the very long string's width (approximately), some unused space is left over at the end of the allocated segments. Data in unused space is ignored. 
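The segment arithmetic described above can be sketched in C as follows. This is only an illustration of the formulas in the text, not PSPP's own code: the function and constant names are hypothetical, and the sketch assumes that data is packed exactly 255 bytes per segment, as stated above.

```c
#include <assert.h>

/* Hypothetical names; the two constants come from the text above. */
enum
  {
    VLS_WIDTH_PER_SEGMENT = 252,  /* String width accounted to each segment. */
    VLS_SEGMENT_BYTES = 255       /* Declared width of every segment but the last. */
  };

/* Number of segments N = (W + 251) / 252, that is, W / 252 rounded up. */
int
vls_segment_count (int width)
{
  assert (width > 0);
  return (width + VLS_WIDTH_PER_SEGMENT - 1) / VLS_WIDTH_PER_SEGMENT;
}

/* Declared width of segment IDX (0-based): 255 for every segment except
   the last, which gets W - (N - 1) * 252. */
int
vls_segment_width (int width, int idx)
{
  int n = vls_segment_count (width);
  assert (idx >= 0 && idx < n);
  return (idx < n - 1
          ? VLS_SEGMENT_BYTES
          : width - (n - 1) * VLS_WIDTH_PER_SEGMENT);
}

/* Number of bytes of real string data held in segment IDX, given that
   data is packed tightly, 255 bytes per segment.  Segments past the end
   of the data hold only unused space, so the result may be 0. */
int
vls_data_bytes (int width, int idx)
{
  int left = width - idx * VLS_SEGMENT_BYTES;
  assert (idx >= 0 && idx < vls_segment_count (width));
  if (left <= 0)
    return 0;
  return left < VLS_SEGMENT_BYTES ? left : VLS_SEGMENT_BYTES;
}
```

For a width of 20,000 these functions give 80 segments, a last segment of width 92, 110 data bytes in the 79th segment, and no data bytes in the 80th.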
Example: Consider a very long string of width 20,000. Such a very long string has 20,000 / 252 = 80 (rounding up) segments. The first 79 segments have width 255; the last segment has width 20,000 - 79 * 252 = 92 or slightly wider (up to 96 bytes, the next multiple of 8). The very long string's data is actually stored in the 19,890 bytes in the first 78 segments, plus the first 110 bytes of the 79th segment (19,890 + 110 = 20,000). The remaining 145 bytes of the 79th segment and all 92 bytes of the 80th segment are unused. The very long string record explains how to stitch together segments to obtain very long string data. For each of the very long string variables in the dictionary, it specifies the name of its first segment's variable and the very long string variable's actual width. The remaining segments immediately follow the named variable in the system file's dictionary. The very long string record, which is present only if the system file contains very long string variables, has the following format: /* Header. */ int32 rec_type; int32 subtype; int32 size; int32 count; /* Exactly `count' bytes of data. */ char string_lengths[]; `int32 rec_type;' Record type. Always set to 7. `int32 subtype;' Record subtype. Always set to 14. `int32 size;' The size of each element in the `string_lengths' member. Always set to 1. `int32 count;' The total number of bytes in `string_lengths'. `char string_lengths[];' A list of KEY-VALUE tuples, where KEY is the name of a variable, and VALUE is its length. The KEY field is at most 8 bytes long and must match the name of a variable which appears in the variable record (*note Variable Record::). The VALUE field is exactly 5 bytes long. It is a zero-padded, ASCII-encoded string that is the length of the variable. The KEY and VALUE fields are separated by a `=' byte. Tuples are delimited by a two-byte sequence {00, 09}. After the last tuple, there may be a single byte 00, or {00, 09}. The total length is `count' bytes. 
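A reader might decode the `string_lengths' member along the following lines. This is a minimal sketch that assumes the record header has already been read and the `count' data bytes are in memory; `struct vls_entry' and `parse_string_lengths' are hypothetical names, not PSPP's own reader. The sketch is deliberately lenient about the delimiter: it accepts {00, 09}, a lone 00, or nothing after a tuple.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One decoded KEY-VALUE tuple (hypothetical type). */
struct vls_entry
  {
    char name[9];       /* Name of the first segment, plus a null byte. */
    int width;          /* Actual width of the very long string. */
  };

/* Parses the COUNT bytes at DATA as a list of KEY=VALUE tuples and
   stores at most MAX of them in OUT.  Returns the number of tuples
   parsed, or -1 if the data is malformed or OUT is too small. */
static int
parse_string_lengths (const char *data, size_t count,
                      struct vls_entry *out, int max)
{
  size_t pos = 0;
  int n = 0;

  assert (data != NULL && out != NULL);
  while (pos < count)
    {
      const char *key = data + pos;
      const char *eq = memchr (key, '=', count - pos);
      size_t key_len;
      int width = 0;
      int i;

      if (eq == NULL || n >= max)
        return -1;
      key_len = (size_t) (eq - key);
      if (key_len == 0 || key_len > 8)
        return -1;

      /* The value is exactly 5 zero-padded ASCII digits. */
      pos += key_len + 1;
      if (count - pos < 5)
        return -1;
      for (i = 0; i < 5; i++)
        {
          char c = data[pos + i];
          if (c < '0' || c > '9')
            return -1;
          width = width * 10 + (c - '0');
        }
      pos += 5;

      memcpy (out[n].name, key, key_len);
      out[n].name[key_len] = '\0';
      out[n].width = width;
      n++;

      /* Skip the delimiter if present: 00 then 09, or a lone 00. */
      if (pos < count && data[pos] == '\0')
        pos++;
      if (pos < count && data[pos] == '\t')
        pos++;
    }
  return n;
}
```

Each returned `width' could then be turned into a segment count with N = (W + 251) / 252 to find how many of the following dictionary variables belong to the same very long string.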
B.10 Miscellaneous Informational Records ======================================== Some specific types of miscellaneous informational records are documented here, but others are known to exist. PSPP ignores unknown miscellaneous informational records when reading system files. /* Header. */ int32 rec_type; int32 subtype; int32 size; int32 count; /* Exactly `size * count' bytes of data. */ char data[]; `int32 rec_type;' Record type. Always set to 7. `int32 subtype;' Record subtype. May take any value. According to Aapi Hämäläinen, value 5 indicates a set of grouped variables and 6 indicates date info (probably related to USE). `int32 size;' Size of each piece of data in the data part. Should have the value 1, 4, or 8, for `char', `int32', and `flt64' format data, respectively. `int32 count;' Number of pieces of data in the data part. `char data[];' Arbitrary data. There must be `size' times `count' bytes of data. B.11 Dictionary Termination Record ================================== The dictionary termination record separates all other records from the data records. int32 rec_type; int32 filler; `int32 rec_type;' Record type. Always set to 999. `int32 filler;' Ignored padding. Should be set to 0. B.12 Data Record ================ Data records must follow all other records in the system file. There must be at least one data record in every system file. The format of data records varies depending on whether the data is compressed. Regardless, the data is arranged in a series of 8-byte elements. When data is not compressed, each element corresponds to the variable declared in the respective variable record (*note Variable Record::). Numeric values are given in `flt64' format; string values are literal character strings, padded on the right when necessary to fill out 8-byte units. Compressed data is arranged in the following manner: the first 8 bytes in the data section are divided into a series of 1-byte command codes. 
These codes have meanings as described below: 0 Ignored. If the program writing the system file accumulates compressed data in blocks of fixed length, 0 bytes can be used to pad out extra bytes remaining at the end of a fixed-size block. 1 through 251 A number with value CODE - BIAS, where CODE is the value of the compression code and BIAS is the variable `bias' from the file header. For example, code 105 with bias 100.0 (the normal value) indicates a numeric variable of value 5. 252 End of file. This code may or may not appear at the end of the data stream. PSPP always outputs this code but its use is not required. 253 A numeric or string value that is not compressible. The value is stored in the 8 bytes following the current block of command bytes. If this value appears twice in a block of command bytes, then it indicates the second group of 8 bytes following the command bytes, and so on. 254 An 8-byte string value that is all spaces. 255 The system-missing value. When the end of an 8-byte group of command bytes is reached, any blocks of non-compressible values indicated by code 253 are skipped, and the next element of command bytes is read and interpreted, until the end of the file or a code with value 252 is reached. Appendix C `q2c' Input Format ***************************** PSPP statistical procedures have a bizarre and somewhat irregular syntax. Despite this, a parser generator has been written that adequately addresses many of the possibilities and tries to provide hooks for the exceptional cases. This parser generator is named `q2c'. C.1 Invoking q2c ================ q2c INPUT.Q OUTPUT.C `q2c' translates a `.q' file into a `.c' file. It takes exactly two command-line arguments, which are the input file name and output file name, respectively. `q2c' does not accept any command-line options. C.2 `q2c' Input Structure ========================= `q2c' input files are divided into two sections: the grammar rules and the supporting code. 
The "grammar rules", which make up the first part of the input, are used to define the syntax of the statistical procedure to be parsed. The "supporting code", following the grammar rules, is copied largely unchanged to the output file, except for certain escapes. The most important lines in the grammar rules are used for defining procedure syntax. These lines can be prefixed with a dollar sign (`$'), which prevents Emacs' CC-mode from munging them. Besides this, a bang (`!') at the beginning of a line causes the line, minus the bang, to be written verbatim to the output file (useful for comments). As a third special case, any line that begins with the exact characters `/* *INDENT' is ignored and not written to the output. This allows `.q' files to be processed through `indent' without being munged. The syntax of the grammar rules themselves is given in the following sections. The supporting code is passed into the output file largely unchanged. However, the following escapes are supported. Each escape must appear on a line by itself. `/* (header) */' Expands to a series of C `#include' directives which include the headers that are required for the parser generated by `q2c'. `/* (decls SCOPE) */' Expands to C variable and data type declarations for the variables and `enum's input and output by the `q2c' parser. SCOPE must be either `local' or `global'. `local' causes the declarations to be output as function locals. `global' causes them to be declared as `static' module variables; thus, `global' is a bit of a misnomer. `/* (parser) */' Expands to the entire parser. Must be enclosed within a C function. `/* (free) */' Expands to a set of calls to the `free' function for variables declared by the parser. Only needs to be invoked if subcommands of type `string' are used in the grammar rules. C.3 Grammar Rules ================= The grammar rules describe the format of the syntax that the parser generated by `q2c' will understand. 
The way that the grammar rules are included in a `q2c' input file is described above. The grammar rules are divided into tokens of the following types: Identifier (`ID') An identifier token is a sequence of letters, digits, and underscores (`_'). Identifiers are _not_ case-sensitive. String (`STRING') String tokens are initiated by a double-quote character (`"') and consist of all the characters between that double quote and the next double quote, which must be on the same line as the first. Within a string, a backslash can be used as a "literal escape". The only reasons to use a literal escape are to include a double quote or a backslash within a string. Special character Other characters, other than white space, constitute tokens in themselves. The syntax of the grammar rules is as follows: grammar-rules ::= command-name opt-prefix : subcommands . command-name ::= ID ::= STRING opt-prefix ::= ::= ( ID ) subcommands ::= subcommand ::= subcommands ; subcommand The syntax begins with an ID token that gives the name of the procedure to be parsed. For command names that contain multiple words, a STRING token may be used instead, e.g. `"FILE HANDLE"'. Optionally, an ID in parentheses specifies a prefix used for all file-scope identifiers declared by the emitted code. The rest of the syntax consists of subcommands separated by semicolons (`;') and terminated with a full stop (`.'). subcommand ::= default-opt arity-opt ID sbc-defn default-opt ::= ::= * arity-opt ::= ::= + ::= ^ sbc-defn ::= opt-prefix = specifiers ::= [ ID ] = array-sbc ::= opt-prefix = sbc-special-form A subcommand that begins with an asterisk (`*') is the default subcommand. The keyword used for the default subcommand can be omitted in the PSPP syntax file. A plus sign (`+') indicates that a subcommand can appear more than once. A caret (`^') indicates that a subcommand must appear exactly once. A subcommand marked with neither character may appear once or not at all, but not more than once. 
The subcommand name appears after the leading option characters. There are three forms of subcommands. The first and most common form simply gives an equals sign (`=') and a list of specifiers, which can each be set to a single setting. The second form declares an array, which is a set of flags that can be individually turned on by the user. There are also several special forms that do not take a list of specifiers. Arrays require an additional `ID' argument. This is used as a prefix, prepended to the variable names constructed from the specifiers. The other forms also allow an optional prefix to be specified. array-sbc ::= alternatives ::= array-sbc , alternatives alternatives ::= ID ::= alternatives | ID An array subcommand is a set of Boolean values that can independently be turned on by the user, listed separated by commas (`,'). If a value has more than one name then these names are separated by pipes (`|'). specifiers ::= specifier ::= specifiers , specifier specifier ::= opt-id : settings opt-id ::= ::= ID Ordinary subcommands (other than arrays and special forms) require a list of specifiers. Each specifier has an optional name and a list of settings. If the name is given then a correspondingly named variable will be used to store the user's choice of setting. If no name is given then there is no way to tell which setting the user picked; in this case the settings should probably have values attached. settings ::= setting ::= settings / setting setting ::= setting-options ID setting-value setting-options ::= ::= * ::= ! ::= * ! Individual settings are separated by forward slashes (`/'). Each setting can be as little as an `ID' token, but options and values can optionally be included. The `*' option means that, for this setting, the `ID' can be omitted. The `!' option means that this option is the default for its specifier. 
setting-value ::= ::= ( setting-value-2 ) ::= setting-value-2 setting-value-2 ::= setting-value-options setting-value-type : ID setting-value-restriction setting-value-options ::= ::= * setting-value-type ::= N ::= D ::= S setting-value-restriction ::= ::= , STRING Settings may have values. If the value must be enclosed in parentheses, then enclose the value declaration in parentheses. Declare the setting type as `n', `d', or `s' for integer, floating-point, or string type, respectively. The given `ID' is used to construct a variable name. If option `*' is given, then the value is optional; otherwise it must be specified whenever the corresponding setting is specified. A "restriction" can also be specified which is a string giving a C expression limiting the valid range of the value. The special escape `%s' should be used within the restriction to refer to the setting's value variable. sbc-special-form ::= VAR ::= VARLIST varlist-options ::= INTEGER opt-list ::= DOUBLE opt-list ::= PINT ::= STRING (the literal word STRING) string-options ::= CUSTOM varlist-options ::= ::= ( STRING ) opt-list ::= ::= LIST string-options ::= ::= ( STRING STRING ) The special forms are of the following types: `VAR' A single variable name. `VARLIST' A list of variables. If given, the string can be used to provide `PV_*' options to the call to `parse_variables'. `INTEGER' A single integer value. `INTEGER LIST' A list of integers separated by spaces or commas. `DOUBLE' A single floating-point value. `DOUBLE LIST' A list of floating-point values. `PINT' A single positive integer value. `STRING' A string value. If the options are given then the first string is an expression giving a restriction on the value of the string; the second string is an error message to display when the restriction is violated. `CUSTOM' A custom function is used to parse this subcommand. The function must have prototype `int custom_NAME (void)'. 
It should return 0 on failure (when it has already issued an appropriate diagnostic), 1 on success, or 2 if it fails and the calling function should issue a syntax error on behalf of the custom handler. Appendix D GNU Free Documentation License ***************************************** Version 1.2, November 2002 Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. 0. PREAMBLE The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. 1. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. 
Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law. A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. 
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque". Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only. The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text. 
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition. The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License. 2. VERBATIM COPYING You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. 3. COPYING IN QUANTITY If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. 
Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 4. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. 
In addition, you must do these things in the Modified Version: A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission. B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement. C. State on the Title page the name of the publisher of the Modified Version, as the publisher. D. Preserve all the copyright notices of the Document. E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below. G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. 
Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. O. Preserve any Warranty Disclaimers. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. 
Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. 5. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers. The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements." 6. 
COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 7. AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. 8. TRANSLATION Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. 
You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail. If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. 9. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 10. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See `http://www.gnu.org/copyleft/'. Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. 
D.1 ADDENDUM: How to use this License for your documents ======================================================== To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyright (C) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''. If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.