PSPP


GNU PSPP

This manual is for GNU PSPP version 0.4.3, software for statistical analysis.

Copyright © 1997, 1998, 2004, 2005 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being “A GNU Manual,” and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License.”

(a) The FSF's Back-Cover Text is: “You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development.”



1 Introduction

PSPP is a tool for statistical analysis of sampled data. It reads a syntax file and a data file, analyzes the data, and writes the results to a listing file or to standard output.

The language accepted by PSPP is similar to those accepted by SPSS statistical products. The details of PSPP's language are given later in this manual.

PSPP produces output in two forms: tables and charts. Both of these can be written in several formats; currently, ASCII, PostScript, and HTML are supported. In the future, more drivers, such as PCL and X Window System drivers, may be developed. For now, Ghostscript, available from the Free Software Foundation, may be used to convert PostScript chart output to other formats.

The current version of PSPP, 0.4.3, is woefully incomplete in terms of its statistical procedure support. PSPP is a work in progress. The author hopes eventually to fully support all features in the products that PSPP replaces. The author welcomes questions, comments, donations, and code submissions. See Submitting Bug Reports, for instructions on contacting the author.



2 Your rights and obligations

PSPP is not in the public domain. It is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of this program that they might get from you.

Specifically, we want to make sure that you have the right to give away copies of PSPP, that you receive source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of PSPP, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights.

Also, for our own protection, we must make certain that everyone finds out that there is no warranty for PSPP. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise conditions of the license for PSPP are found in the GNU General Public License. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. This manual specifically is covered by the GNU Free Documentation License (see GNU Free Documentation License).



3 Invoking PSPP

     pspp [ -B dir | --config-dir=dir ] [ -o device | --device=device ]
            [ -d var[=value] | --define=var[=value] ] [-u var | --undef=var ]
            [ -f file | --out-file=file ] [ -p | --pipe ] [ -I- | --no-include ]
            [ -I dir | --include=dir ] [ -i | --interactive ]
            [ -n | --edit | --dry-run | --just-print | --recon ]
            [ -r | --no-statrc ] [ -h | --help ] [ -l | --list ]
            [ -c command | --command command ] [ -s | --safer ]
            [ --testing-mode ] [ -V | --version ] [ -v | --verbose ]
            [ key=value ] file....



3.1 Non-option Arguments

Syntax files and output device substitutions can be specified on PSPP's command line:

file
A file by itself on the command line will be executed as a syntax file. Multiple files may be specified; they are executed in order, as if their contents had been given in a single file. PSPP terminates after the syntax files run, unless the -i or --interactive option is given (see Language control options).
key=value
Defines an output device macro key to expand to value, overriding any macro having the same key defined in the device configuration file. See Macro definitions.

There is one other way to specify a syntax file, if your operating system supports it. If you have a syntax file foobar.stat, put the notation

     #! /usr/local/bin/pspp

at the top, and mark the file as executable with chmod +x foobar.stat. (If PSPP is not installed in /usr/local/bin, then insert its actual installation directory into the syntax file instead.) Now you should be able to invoke the syntax file just by typing its name. You can include any options on the command line as usual. PSPP entirely ignores any lines beginning with `#!'.



3.2 Configuration Options

Configuration options are used to change PSPP's configuration for the current run. The configuration options are:

-a {compatible|enhanced}
--algorithm={compatible|enhanced}
If you choose compatible, then PSPP will use the same algorithms as used by some proprietary statistical analysis packages. This is not recommended, as these algorithms are inferior and in some cases completely broken. The default setting is enhanced. Certain commands have subcommands which allow you to override this setting on a per-command basis.
-B dir
--config-dir=dir
Sets the configuration directory to dir. See File locations.
-o device
--device=device
Selects the output device with name device. If this option is given more than once, then all devices mentioned are selected. This option disables all devices besides those mentioned on the command line.



3.3 Input and output options

Input and output options affect how PSPP reads input and writes output. These are the input and output options:

-f file
--out-file=file
This overrides the output file name for devices designated as listing devices. If a file named file already exists, it is overwritten.
-p
--pipe
Allows PSPP to be used as a filter by causing the syntax file to be read from stdin and output to be written to stdout. Conflicts with the -f file and --out-file=file options.
-I-
--no-include
Clears all directories from the include path. This includes all directories put in the include path by default. See Miscellaneous configuring.
-I dir
--include=dir
Appends directory dir to the path that is searched for include files in PSPP syntax files.
-c command
--command=command
Execute literal command command. The command is executed before startup syntax files, if any.
--testing-mode
Invoke heuristics to assist with testing PSPP. For use by make check and similar scripts.



3.4 Language control options

Language control options control how PSPP syntax files are parsed and interpreted. The available language control options are:

-i
--interactive
When a syntax file is specified on the command line, PSPP normally terminates after processing it. Giving this option will cause PSPP to bring up a command prompt after processing the syntax file.

In addition, this forces syntax files to be interpreted in interactive mode, rather than the default batch mode. See Tokenizing lines, for information on the differences between batch mode and interactive mode command interpretation.

-n
--edit
--dry-run
--just-print
--recon
Only the syntax of any syntax file specified or of commands entered at the command line is checked. Transformations are not performed and procedures are not executed. Not yet implemented.
-r
--no-statrc
Prevents the execution of the PSPP startup syntax file.
-s
--safer
Disables certain unsafe operations. This includes the ERASE and HOST commands, as well as use of pipes as input and output files.



3.5 Informational options

Informational options cause information about PSPP to be written to the terminal. Here are the available options:

-h
--help
Prints a message describing PSPP command-line syntax and the available device driver classes, then terminates.
-l
--list
Lists the available device driver classes, then terminates.
-x {compatible|enhanced}
--syntax={compatible|enhanced}
If you choose compatible, then PSPP will only accept command syntax that is compatible with the proprietary program SPSS. If you choose enhanced, then additional syntax will be available. The default is enhanced.
-V
--version
Prints a brief message listing PSPP's version, warranties you don't have, copying conditions and copyright, and e-mail address for bug reports, then terminates.
-v
--verbose
Increments PSPP's verbosity level. Higher verbosity levels cause PSPP to display greater amounts of information about what it is doing. Often useful for debugging PSPP's configuration.

This option can be given multiple times; each occurrence raises the verbosity level by one. The default verbosity level is 0, in which no informational messages will be displayed.

Higher verbosity levels cause messages to be displayed when the corresponding events take place.

1
Driver and subsystem initializations.
2
Completion of driver initializations. Beginning of driver closings.
3
Completion of driver closings.
4
Files searched for; success of searches.
5
Individual directories included in file searches.

Each verbosity level also includes messages from lower verbosity levels.



4 The PSPP language

Please note: PSPP is not even close to completion. Only a few statistical procedures are implemented. PSPP is a work in progress.

This chapter discusses elements common to many PSPP commands. Later chapters will describe individual commands in detail.



4.1 Tokens

PSPP divides most syntax file lines into a series of short chunks called tokens. Tokens are then grouped to form commands, each of which tells PSPP to take some action, such as reading in data, writing out data, or performing a statistical procedure. Each type of token is described below.

Identifiers
Identifiers are names that typically specify variables, commands, or subcommands. The first character in an identifier must be a letter, `#', or `@'. The remaining characters in the identifier must be letters, digits, or one of the following special characters:
          
. _ $ # @

Identifiers may be any length, but only the first 64 bytes are significant. Identifiers are not case-sensitive: foobar, Foobar, FooBar, FOOBAR, and FoObaR are different representations of the same identifier.

Some identifiers are reserved. Reserved identifiers may not be used in any context besides those explicitly described in this manual. The reserved identifiers are:

          
ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH

Keywords
Keywords are a subclass of identifiers that form a fixed part of command syntax. For example, command and subcommand names are keywords. Keywords may be abbreviated to their first 3 characters if this abbreviation is unambiguous. (Unique abbreviations of 3 or more characters are also accepted: `FRE', `FREQ', and `FREQUENCIES' are equivalent when the last is a keyword.)

Reserved identifiers are always used as keywords. Other identifiers may be used both as keywords and as user-defined identifiers, such as variable names.

Numbers
Numbers are expressed in decimal. A decimal point is optional. Numbers may be expressed in scientific notation by adding `e' and a base-10 exponent, so that `1.234e3' has the value 1234. Here are some more examples of valid numbers:
          -5  3.14159265359  1e100  -.707  8945.
     

Negative numbers are expressed with a `-' prefix. However, in situations where a literal `-' token is expected, what appears to be a negative number is treated as `-' followed by a positive number.

No white space is allowed within a number token, except for horizontal white space between `-' and the rest of the number.

The last example above, `8945.' will be interpreted as two tokens, `8945' and `.', if it is the last token on a line. See Forming commands of tokens.

Strings
Strings are literal sequences of characters enclosed in pairs of single quotes (`'') or double quotes (`"'). To include the character used for quoting in the string, double it, e.g. `'it''s an apostrophe''. White space and case of letters are significant inside strings.

Strings can be concatenated using `+', so that `"a" + 'b' + 'c'' is equivalent to `'abc''. Concatenation is useful for splitting a single string across multiple source lines. The maximum length of a string, after concatenation, is 255 characters.

Strings may also be expressed as hexadecimal, octal, or binary character values by prefixing the initial quote character by `X', `O', or `B' or their lowercase equivalents. Each pair, triplet, or octet of characters, according to the radix, is transformed into a single character with the given value. If there is an incomplete group of characters, the missing final digits are assumed to be `0'. These forms of strings are nonportable because numeric values are associated with different characters by different operating systems. Therefore, their use should be confined to syntax files that will not be widely distributed.

The character with value 00 is reserved for internal use by PSPP. Its use in strings causes an error and replacement by a space character.

Punctuators and Operators
These tokens are the punctuators and operators:
          
, / = ( ) + - * / ** < <= <> > >= ~= & | .

Most of these appear within the syntax of commands, but the period (`.') punctuator is used only at the end of a command. It is a punctuator only as the last character on a line (except white space). When it is the last non-space character on a line, a period is not treated as part of another token, even if it would otherwise be part of, e.g., an identifier or a floating-point number.

Actually, the character that ends a command can be changed with SET's ENDCMD subcommand (see SET), but we do not recommend doing so. Throughout the remainder of this manual we will assume that the default setting is in effect.
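
The identifier and keyword rules given earlier in this section can be illustrated with a short Python sketch. This is not PSPP's own tokenizer. The function names are hypothetical, and treating "letter" as an ASCII letter is an assumption, since the manual does not say which characters count as letters.

```python
import re

# Assumption: "letter" means an ASCII letter.
# First character: letter, '#', or '@'; the rest: letters, digits, . _ $ # @
IDENTIFIER_RE = re.compile(r"[A-Za-z#@][A-Za-z0-9._$#@]*")

RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def is_identifier(text):
    """Return True if text has the shape of an identifier."""
    return IDENTIFIER_RE.fullmatch(text) is not None


def same_identifier(a, b):
    """Compare two identifiers: case is ignored and only the first 64
    bytes are significant (with ASCII input, bytes equal characters)."""
    return a.upper()[:64] == b.upper()[:64]


def is_reserved(text):
    """Return True if text is one of the reserved identifiers."""
    return text.upper() in RESERVED


def matches_keyword(word, keyword):
    """Return True if word is keyword or a prefix of it that is at least
    3 characters long. Ambiguity between keywords is not checked here."""
    word, keyword = word.upper(), keyword.upper()
    if word == keyword:
        return True
    return len(word) >= 3 and keyword.startswith(word)
```

For instance, matches_keyword("FREQ", "FREQUENCIES") is true under this sketch, while matches_keyword("FR", "FREQUENCIES") is false because the abbreviation is shorter than three characters.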



4.2 Forming commands of tokens

Most PSPP commands share a common structure. A command begins with a command name, such as FREQUENCIES, DATA LIST, or N OF CASES. The command name may be abbreviated to its first word, and each word in the command name may be abbreviated to its first three or more characters, where these abbreviations are unambiguous.

The command name may be followed by one or more subcommands. Each subcommand begins with a subcommand name, which may be abbreviated to its first three letters. Some subcommands accept a series of one or more specifications, which follow the subcommand name, optionally separated from it by an equals sign (`='). Specifications may be separated from each other by commas or spaces. Each subcommand must be separated from the next (if any) by a forward slash (`/').

There are multiple ways to mark the end of a command. The most common way is to end the last line of the command with a period (`.') as described in the previous section (see Tokens). A blank line, or one that consists only of white space or comments, also ends a command by default, although you can use the NULLINE subcommand of SET to disable this feature (see SET).

In batch mode only, that is, when reading commands from a file instead of an interactive user, any line that contains a non-space character in the leftmost column begins a new command. Thus, each command consists of a flush-left line followed by any number of lines indented from the left margin. In this mode, a plus or minus sign (`+', `-') as the first character in a line is ignored and causes that line to begin a new command, which allows for visual indentation of a command without that command being considered part of the previous command.
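
The batch-mode rules above can be sketched in Python as a function that groups source lines into commands. This is only an illustration under simplifying assumptions: split_batch_commands is a hypothetical name, comment-only lines are not recognized, the default command terminator (a period) is assumed, and the NULLINE setting is assumed to be on.

```python
def split_batch_commands(lines):
    """Group syntax-file lines into commands using the batch-mode rules."""
    commands = []
    current = []          # pieces of the command being collected

    def flush():
        # Finish the command in progress, if there is one.
        if current:
            commands.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line:
            flush()                    # a blank line ends the command
            continue
        if line[0] in "+-":
            flush()                    # leading sign starts a new command...
            line = " " + line[1:]      # ...and is itself ignored
        elif not line[0].isspace():
            flush()                    # flush-left text starts a new command
        text = line.strip()
        ends_here = text.endswith(".")
        if ends_here:
            text = text[:-1].rstrip()  # drop the terminating period
        if text:
            current.append(text)
        if ends_here:
            flush()
    flush()
    return commands
```

Given the lines "FREQUENCIES", "  /VARIABLES=x" and "+  N OF CASES 10", the sketch returns two commands, because the indented line continues the first command and the line starting with `+' begins a second one.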



4.3 Types of Commands

Commands in PSPP are divided roughly into six categories:

Utility commands
Set or display various global options that affect PSPP operations. May appear anywhere in a syntax file. See Utility commands.
File definition commands
Give instructions for reading data from text files or from special binary “system files”. Most of these commands replace any previous data or variables with new data or variables. At least one file definition command must appear before the first command in any of the categories below. See Data Input and Output.
Input program commands
Though rarely used, these provide tools for reading data files in arbitrary textual or binary formats. See INPUT PROGRAM.
Transformations
Perform operations on data and write data to output files. Transformations are not carried out until a procedure is executed.
Restricted transformations
Transformations that cannot appear in certain contexts. See Order of Commands, for details.
Procedures
Analyze data, writing results of analyses to the listing file. Cause transformations specified earlier in the file to be performed. In a more general sense, a procedure is any command that causes the active file (the data) to be read.



4.4 Order of Commands

PSPP does not place many restrictions on ordering of commands. The main restriction is that variables must be defined before they are otherwise referenced. This section describes the details of command ordering, but most users will have no need to refer to them.

PSPP possesses five internal states, called initial, INPUT PROGRAM, FILE TYPE, transformation, and procedure states. (Please note the distinction between the INPUT PROGRAM and FILE TYPE commands and the INPUT PROGRAM and FILE TYPE states.)

PSPP starts in the initial state. Each successful completion of a command may cause a state transition. Each type of command has its own rules for state transitions:

Utility commands

DATA LIST

INPUT PROGRAM

FILE TYPE

Other file definition commands

Transformations

Restricted transformations

Procedures



4.5 Handling missing observations

PSPP includes special support for unknown numeric data values. Missing observations are assigned a special value, called the system-missing value. This “value” actually indicates the absence of a value; it means that the actual value is unknown. Procedures automatically exclude from analyses those observations or cases that have missing values. Details of missing value exclusion depend on the procedure and can often be controlled by the user; refer to descriptions of individual procedures for details.

The system-missing value exists only for numeric variables. String variables always have a defined value, even if it is only a string of spaces.

Variables, whether numeric or string, can have designated user-missing values. Every user-missing value is an actual value for that variable. However, most of the time user-missing values are treated in the same way as the system-missing value. String variables that are wider than a certain width, usually 8 characters (depending on computer architecture), cannot have user-missing values.

For more information on missing values, see the following sections: Variables, MISSING VALUES, Expressions. See also the documentation on individual procedures for information on how they handle missing values.



4.6 Variables

Variables are the basic unit of data storage in PSPP. All the variables in a file taken together, apart from any associated data, are said to form a dictionary. Some details of variables are described in the sections below.



4.6.1 Attributes of Variables

Each variable has a number of attributes, including:

Name
An identifier, up to 64 bytes long. Each variable must have a different name. See Tokens.

Some system variable names begin with `$', but user-defined variables' names may not begin with `$'.

The final character in a variable name should not be `.', because such an identifier will be misinterpreted when it is the final token on a line: FOO. will be divided into two separate tokens, `FOO' and `.', indicating end-of-command. See Tokens.

The final character in a variable name should not be `_', because some such identifiers are used for special purposes by PSPP procedures.

As with all PSPP identifiers, variable names are not case-sensitive. PSPP capitalizes variable names on output the same way they were capitalized at their point of definition in the input.


Type
Numeric or string.


Width
(string variables only) String variables with a width of 8 characters or fewer are called short string variables. Short string variables can be used in many procedures where long string variables (those with widths greater than 8) are not allowed.

Certain systems may consider strings longer than 8 characters to be short strings. Eight characters represents a minimum figure for the maximum length of a short string.

Position
Variables in the dictionary are arranged in a specific order. DISPLAY can be used to show this order: see DISPLAY.
Initialization
Either reinitialized to 0 or spaces for each case, or left at its existing value. See LEAVE.


Missing values
Optionally, up to three values, or a range of values, or a specific value plus a range, can be specified as user-missing values. There is also a system-missing value that is assigned to an observation when there is no other obvious value for that observation. Observations with missing values are automatically excluded from analyses. User-missing values are actual data values, while the system-missing value is not a value at all. See Missing Observations.


Variable label
A string that describes the variable. See VARIABLE LABELS.


Value label
Optionally, these associate each possible value of the variable with a string. See VALUE LABELS.


Print format
Display width, format, and (for numeric variables) number of decimal places. This attribute does not affect how data are stored, just how they are displayed. Example: a width of 8, with 2 decimal places. See Input and Output Formats.


Write format
Similar to print format, but used by the WRITE command (see WRITE).



4.6.2 Variables Automatically Defined by PSPP

There are seven system variables. These are not like ordinary variables because system variables are not always stored. They can be used only in expressions. These system variables, whose values and output formats cannot be modified, are described below.

$CASENUM
Case number of the case at the moment. This changes as cases are shuffled around.


$DATE
Date the PSPP process was started, in format A9, following the pattern DD MMM YY.


$JDATE
Number of days between 15 Oct 1582 and the time the PSPP process was started.


$LENGTH
Page length, in lines, in format F11.


$SYSMIS
System missing value, in format F1.


$TIME
Number of seconds between midnight 14 Oct 1582 and the time the active file was read, in format F20.


$WIDTH
Page width, in characters, in format F3.



4.6.3 Lists of variable names

To refer to a set of variables, list their names one after another. Optionally, their names may be separated by commas. To include a range of variables from the dictionary in the list, write the name of the first and last variable in the range, separated by TO. For instance, if the dictionary contains six variables with the names ID, X1, X2, GOAL, MET, and NEXTGOAL, in that order, then X2 TO MET would include variables X2, GOAL, and MET.

Commands that define variables, such as DATA LIST, give TO an alternate meaning. With these commands, TO defines sequences of variables whose names end in consecutive integers. The syntax is two identifiers that begin with the same root and end with numbers, separated by TO. The syntax X1 TO X5 defines 5 variables, named X1, X2, X3, X4, and X5. The syntax ITEM0008 TO ITEM0013 defines 6 variables, named ITEM0008, ITEM0009, ITEM0010, ITEM0011, ITEM0012, and ITEM0013. The syntaxes QUES001 TO QUES9 and QUES6 TO QUES3 are invalid.

After a set of variables has been defined with DATA LIST or another command with this method, the same set can be referenced on later commands using the same syntax.
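
Both meanings of TO can be sketched in Python. The function names are hypothetical and this is not PSPP's parser. In particular, the rule used to reject QUES001 TO QUES9 (when the first number has leading zeros, both numbers must have the same number of digits) is an assumption chosen so that the valid and invalid examples in this section behave as stated.

```python
import re

# Lazily match a root followed by a run of digits that ends the name.
NAME_RE = re.compile(r"(.*?)(\d+)")


def range_from_dictionary(dictionary, first, last):
    """Return the variables from first through last, in dictionary order."""
    upper = [name.upper() for name in dictionary]
    start = upper.index(first.upper())
    end = upper.index(last.upper())
    if start > end:
        raise ValueError("first variable must not follow the last one")
    return dictionary[start:end + 1]


def expand_to(first, last):
    """Expand a definition such as X1 TO X5 into a list of new names."""
    m1, m2 = NAME_RE.fullmatch(first), NAME_RE.fullmatch(last)
    if m1 is None or m2 is None:
        raise ValueError("both names must end in a number")
    (root1, num1), (root2, num2) = m1.groups(), m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("names must share the same root")
    if int(num1) > int(num2):
        raise ValueError("numbers must be in ascending order")
    # Assumption: a zero-padded first number fixes the width of both numbers.
    if num1.startswith("0") and len(num1) != len(num2):
        raise ValueError("zero-padded numbers must have equal widths")
    width = len(num1)
    return [f"{root1}{n:0{width}d}" for n in range(int(num1), int(num2) + 1)]
```

With the six-variable dictionary from the example above, range_from_dictionary(names, "X2", "MET") yields X2, GOAL, and MET, and expand_to("ITEM0008", "ITEM0013") yields the six names ITEM0008 through ITEM0013.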



4.6.4 Input and Output Formats

An input format describes how to interpret the contents of an input field as a number or a string. It might specify that the field contains an ordinary decimal number, a time or date, a number in binary or hexadecimal notation, or one of several other notations. Input formats are used by commands such as DATA LIST that read data or syntax files into the PSPP active file.

Every input format corresponds to a default output format that specifies the formatting used when the value is output later. It is always possible to explicitly specify an output format that resembles the input format. Usually, this is the default, but in cases where the input format is unfriendly to human readability, such as binary or hexadecimal formats, the default output format is an easier-to-read decimal format.

Every variable has two output formats, called its print format and write format. Print formats are used in most output contexts; write formats are used only by WRITE (see WRITE). Newly created variables have identical print and write formats, and FORMATS, the most commonly used command for changing formats (see FORMATS), sets both of them to the same value as well. Thus, most of the time, the distinction between print and write formats is unimportant.

Input and output formats are specified to PSPP with a format specification of the form TYPEw or TYPEw.d, where TYPE is one of the format types described later, w is a field width measured in columns, and d is an optional number of decimal places. If d is omitted, a value of 0 is assumed. Some formats do not allow a nonzero d to be specified.
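
As an illustration of the TYPEw and TYPEw.d notation, the following Python sketch splits a format specification into its three parts. The name parse_format_spec is hypothetical, the sketch assumes that format type names consist only of letters, and it does not check whether a given type permits a nonzero d.

```python
import re

# TYPE (letters), w (digits), and an optional ".d" (digits).
FORMAT_RE = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")


def parse_format_spec(spec):
    """Return (TYPE, w, d) for a specification such as F8.2 or A9."""
    match = FORMAT_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a format specification: {spec!r}")
    type_name, width, decimals = match.groups()
    # An omitted d is taken as 0.
    d = int(decimals) if decimals is not None else 0
    return type_name.upper(), int(width), d
```

For example, parse_format_spec("F8.2") gives ("F", 8, 2), and parse_format_spec("A9") gives ("A", 9, 0) because d defaults to 0.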

The following sections describe the input and output formats supported by PSPP.


4.6.4.1 Basic Numeric Formats

The basic numeric formats are used for input and output of real numbers in standard or scientific notation. The following table shows an example of how each format displays positive and negative numbers with the default decimal point setting:

Format      3141.59     -3141.59
F8.2        3141.59     -3141.59
COMMA9.2    3,141.59    -3,141.59
DOT9.2      3.141,59    -3.141,59
DOLLAR10.2  $3,141.59   -$3,141.59
PCT9.2      3141.59%    -3141.59%
E8.1        3.1E+003    -3.1E+003
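
The rows of the table above, other than E format, can be reproduced with a few lines of Python. This sketch only imitates the appearance of the output: format_basic is a hypothetical name, field widths and padding are ignored, and Python's own rounding is used, which may differ from PSPP's in edge cases.

```python
def format_basic(value, fmt, decimals=2):
    """Imitate F, COMMA, DOT, DOLLAR, and PCT output for one value."""
    sign = "-" if value < 0 else ""
    plain = f"{abs(value):.{decimals}f}"       # e.g. 3141.59
    grouped = f"{abs(value):,.{decimals}f}"    # e.g. 3,141.59
    if fmt == "F":
        return sign + plain
    if fmt == "COMMA":
        return sign + grouped
    if fmt == "DOT":
        # Swap the roles of comma and period.
        return sign + grouped.translate(str.maketrans(",.", ".,"))
    if fmt == "DOLLAR":
        return sign + "$" + grouped
    if fmt == "PCT":
        return sign + plain + "%"
    raise ValueError(f"format not covered by this sketch: {fmt}")
```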

On output, numbers in F format are expressed in standard decimal notation with the requested number of decimal places. The other formats output some variation on this style:

On input, the basic numeric formats accept positive and negative numbers in standard decimal notation or scientific notation. Leading and trailing spaces are allowed. An empty or all-spaces field, or one that contains only a single period, is treated as the system-missing value.

In scientific notation, the exponent may be introduced by a sign (`+' or `-'), or by one of the letters `e' or `d' (in uppercase or lowercase), or by a letter followed by a sign. A single space may follow the letter or the sign or both.

On fixed-format DATA LIST (see DATA LIST FIXED) and in a few other contexts, decimals are implied when the field does not contain a decimal point. In F6.5 format, for example, the field 314159 is taken as the value 3.14159 with implied decimals. Decimals are never implied if an explicit decimal point is present or if scientific notation is used.
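
The implied-decimals rule can be sketched in Python as follows. The name parse_f_field is hypothetical, None stands in for the system-missing value, and the sketch relies on Python's decimal module, so it accepts only exponents introduced by `e' and not the other exponent forms described above.

```python
import re
from decimal import Decimal


def parse_f_field(field, d):
    """Read an F-format field with d implied decimal places.

    Returns None (standing in for system-missing) for an empty field,
    an all-spaces field, or a field holding a single period.
    """
    text = field.strip()
    if text in ("", "."):
        return None
    if re.fullmatch(r"[+-]?\d+", text):
        # Digits only: shift the decimal point d places to the left.
        return Decimal(text).scaleb(-d)
    # An explicit decimal point or exponent suppresses implied decimals.
    return Decimal(text)
```

Under this sketch, parse_f_field("314159", 5) is the value 3.14159, while parse_f_field("3.5", 5) stays 3.5 because the field contains an explicit decimal point.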

E and F formats accept the basic syntax already described. The other formats allow some additional variations:

All of the basic number formats have a maximum field width of 40 and accept no more than 16 decimal places, on both input and output. Some additional restrictions apply:

More details of basic numeric output formatting are given below:


4.6.4.2 Custom Currency Formats

The custom currency formats are closely related to the basic numeric formats, but they allow users to customize the output format. The SET command configures custom currency formats, using the syntax

     SET CCx="string".

where x is A, B, C, D, or E, and string is no more than 16 characters long.

string must contain exactly three commas or exactly three periods (but not both), except that a single quote character may be used to “escape” a following comma, period, or single quote. If three commas are used, commas will be used for grouping in output, and a period will be used as the decimal point. Using periods reverses these roles.

The commas or periods divide string into four fields, called the negative prefix, prefix, suffix, and negative suffix, respectively. The prefix and suffix are added to output whenever space is available. The negative prefix and negative suffix are always added to a negative number when the output includes a nonzero digit.

The following syntax shows how custom currency formats could be used to reproduce basic numeric formats:

     SET CCA="-,,,".  /* Same as COMMA.
     SET CCB="-...".  /* Same as DOT.
     SET CCC="-,$,,". /* Same as DOLLAR.
     SET CCD="-,,%,". /* Like PCT, but groups with commas.

Here are some more examples of custom currency formats. The final example shows how to use a single quote to escape a delimiter:

     SET CCA=",EUR,,-".   /* Euro.
     SET CCB="(,USD ,,)". /* US dollar.
     SET CCC="-.R$..".    /* Brazilian real.
     SET CCD="-,, NIS,".  /* Israel shekel.
     SET CCE="-.Rp'. ..". /* Indonesia Rupiah.

These formats would yield the following output:

Format   3145.59        -3145.59
CCA12.2  EUR3,145.59    EUR3,145.59-
CCB14.2  USD 3,145.59   (USD 3,145.59)
CCC11.2  R$3.145,59     -R$3.145,59
CCD13.2  3,145.59 NIS   -3,145.59 NIS
CCE10.0  Rp. 3.146      -Rp. 3.146

The default for all the custom currency formats is `-,,,', equivalent to COMMA format.
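
The way a custom currency string is divided into fields, and how those fields produce the sample output above, can be sketched in Python. The names parse_cc and format_cc are hypothetical and this is not PSPP's implementation. The sketch ignores field widths, so the prefix and suffix are always emitted, and it does not enforce the 16-character limit.

```python
def parse_cc(spec):
    """Split a CC string into [negative prefix, prefix, suffix,
    negative suffix] and report which delimiter it uses."""
    tokens = []                      # (character, may_be_delimiter)
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "'" and i + 1 < len(spec) and spec[i + 1] in ",.'":
            tokens.append((spec[i + 1], False))   # escaped: literal text
            i += 2
        else:
            tokens.append((ch, ch in ",."))
            i += 1
    commas = sum(1 for ch, cand in tokens if cand and ch == ",")
    periods = sum(1 for ch, cand in tokens if cand and ch == ".")
    if commas == 3 and periods != 3:
        delim = ","
    elif periods == 3 and commas != 3:
        delim = "."
    else:
        raise ValueError("need exactly three commas or three periods")
    fields = [""]
    for ch, cand in tokens:
        if cand and ch == delim:
            fields.append("")        # delimiter: start the next field
        else:
            fields[-1] += ch
    return fields, delim


def format_cc(value, spec, decimals):
    """Format value using a CC string, ignoring field width."""
    (neg_prefix, prefix, suffix, neg_suffix), delim = parse_cc(spec)
    body = f"{abs(value):,.{decimals}f}"
    if delim == ".":
        # Periods as delimiters: period groups, comma is the decimal point.
        body = body.translate(str.maketrans(",.", ".,"))
    # Negative affixes appear only if a nonzero digit is output.
    negative = value < 0 and any(c in "123456789" for c in body)
    if negative:
        return neg_prefix + prefix + body + suffix + neg_suffix
    return prefix + body + suffix
```

For example, format_cc(-3145.59, "(,USD ,,)", 2) gives (USD 3,145.59), matching the CCB row of the table.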


4.6.4.3 Legacy Numeric Formats

The N and Z numeric formats provide compatibility with legacy file formats. They have much in common:

N Format

The N format supports input and output of fields that contain only digits. On input, leading or trailing spaces, a decimal point, or any other non-digit character causes the field to be read as the system-missing value. As a special exception, an N format used on DATA LIST FREE or DATA LIST LIST is treated as the equivalent F format.

On output, N pads the field on the left with zeros. Negative numbers are output like the system-missing value.

Z Format

The Z format is a “zoned decimal” format used on IBM mainframes. Z format encodes the sign as part of the final digit, which must be one of the following:

     0123456789
     {ABCDEFGHI
     }JKLMNOPQR

where the characters in each row represent digits 0 through 9 in order. Characters in the first two rows indicate a positive sign; those in the third indicate a negative sign.

On output, Z fields are padded on the left with spaces. On input, leading and trailing spaces are ignored. Any character in an input field other than spaces, the digit characters above, and `.' causes the field to be read as system-missing.

The decimal point character for input and output is always `.', even if the decimal point character is a comma (see SET DECIMAL).

Nonzero, negative values output in Z format are marked as negative even when no nonzero digits are output. For example, -0.2 is output in Z1.0 format as `J'. The “negative zero” value supported by most machines is output as positive.
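
The sign encoding used on Z format input can be sketched in Python as shown below. The name read_z is hypothetical, None stands in for the system-missing value, and two details are assumptions of the sketch rather than statements from this manual: an all-blank field is read as system-missing, and a field without an explicit `.' is scaled by the format's decimal places.

```python
DIGITS = "0123456789"
POSITIVE_ZONE = "{ABCDEFGHI"   # alternate encodings of +0 through +9
NEGATIVE_ZONE = "}JKLMNOPQR"   # encodings of -0 through -9


def read_z(field, decimals=0):
    """Decode a zoned decimal field; None means system-missing."""
    text = field.strip(" ")
    if not text:
        return None
    body, last = text[:-1], text[-1]
    sign = 1
    if last in DIGITS:
        digit = last
    elif last in POSITIVE_ZONE:
        digit = str(POSITIVE_ZONE.index(last))
    elif last in NEGATIVE_ZONE:
        digit = str(NEGATIVE_ZONE.index(last))
        sign = -1
    else:
        return None
    number = body + digit
    if number.count(".") > 1 or any(ch not in DIGITS + "." for ch in number):
        return None
    if "." in number:
        return sign * float(number)          # explicit decimal point
    if decimals:
        return sign * int(number) / 10 ** decimals
    return sign * int(number)
```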


4.6.4.4 Binary and Hexadecimal Numeric Formats

The binary and hexadecimal formats are primarily designed for compatibility with existing machine formats, not for human readability. All of them therefore have an F format as their default output format. Some of these formats are only portable between machines with compatible byte ordering (endianness) or floating-point format.

Binary formats use byte values that in text files are interpreted as special control functions, such as carriage return and line feed. Thus, data in binary formats should not be included in syntax files or read from data files with variable-length records, such as ordinary text files. They may be read from or written to data files with fixed-length records. See FILE HANDLE, for information on working with fixed-length records.

P and PK Formats

These are binary-coded decimal formats, in which every byte (except the last, in P format) represents two decimal digits. The most-significant 4 bits of the first byte are the most-significant decimal digit, the least-significant 4 bits of the first byte are the next decimal digit, and so on.

In P format, the most-significant 4 bits of the last byte are the least-significant decimal digit. The least-significant 4 bits represent the sign: decimal 15 indicates a positive value, decimal 13 indicates a negative value.

Numbers are rounded downward on output. The system-missing value and numbers outside representable range are output as zero.

The maximum field width is 16. Decimal places may range from 0 up to the number of decimal digits represented by the field.

The default output format is an F format with twice the input field width, plus one column for a decimal point (if decimal places were requested).
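
The digit layout of P and PK fields can be sketched in Python as follows. The names read_pk and read_p are hypothetical and decimal places are ignored. The sketch follows the usual packed-decimal convention in which a final sign nibble of 13 marks a negative value; treating every other sign nibble as positive, and returning None (system-missing) for a digit nibble above 9, are assumptions of the sketch.

```python
def _nibbles(data):
    """Yield the 4-bit halves of each byte, most-significant half first."""
    for byte in data:
        yield byte >> 4
        yield byte & 0x0F


def _to_int(digits):
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return value


def read_pk(data):
    """Decode an unsigned packed decimal (PK) field."""
    digits = list(_nibbles(data))
    if any(d > 9 for d in digits):
        return None
    return _to_int(digits)


def read_p(data):
    """Decode a signed packed decimal (P) field."""
    nibbles = list(_nibbles(data))
    if not nibbles:
        return None
    *digits, sign = nibbles            # last nibble carries the sign
    if any(d > 9 for d in digits):
        return None
    value = _to_int(digits)
    return -value if sign == 13 else value
```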

IB and PIB Formats

These are integer binary formats. IB reads and writes 2's complement binary integers, and PIB reads and writes unsigned binary integers. The byte ordering is by default the host machine's, but SET RIB may be used to select a specific byte ordering for reading (see SET RIB) and SET WIB, similarly, for writing (see SET WIB).

The maximum field width is 8. Decimal places may range from 0 up to the number of decimal digits in the largest value representable in the field width.

The default output format is an F format whose width is the number of decimal digits in the largest value representable in the field width, plus 1 if the format has decimal places.
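
As a rough illustration of the difference between IB and PIB, the Python sketch below decodes the same bytes as a 2's complement integer and as an unsigned integer. The names read_ib and read_pib are hypothetical, the byteorder argument merely stands in for the byte ordering selected with SET RIB, and decimal places are not modeled.

```python
import sys


def _check_width(data):
    if not 1 <= len(data) <= 8:
        raise ValueError("field width must be between 1 and 8 bytes")


def read_ib(data, byteorder=sys.byteorder):
    """Decode a 2's complement binary integer (IB)."""
    _check_width(data)
    return int.from_bytes(data, byteorder, signed=True)


def read_pib(data, byteorder=sys.byteorder):
    """Decode an unsigned binary integer (PIB)."""
    _check_width(data)
    return int.from_bytes(data, byteorder, signed=False)
```

The default byte order here is the host machine's, matching the default described above; pass "big" or "little" to force a specific ordering.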

RB Format

This is a binary format for real numbers. By default it reads and writes the host machine's floating-point format, but SET RRB may be used to select an alternate floating-point format for reading (see SET RRB) and SET WRB, similarly, for writing (see SET WRB).

The recommended field width depends on the floating-point format. NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use a field width of 8. ISL, ISB, VF, and ZS formats should use a field width of 4. Other field widths will not produce useful results. The maximum field width is 8. No decimal places may be specified.

The default output format is F8.2.

PIBHEX and RBHEX Formats

These are hexadecimal formats, for reading and writing binary formats where each byte has been recoded as a pair of hexadecimal digits.

A hexadecimal field consists solely of hexadecimal digits `0'...`9' and `A'...`F'. Uppercase and lowercase are accepted on input; output is in uppercase.

Other than the hexadecimal representation, these formats are equivalent to PIB and RB formats, respectively. However, bytes in PIBHEX format are always ordered with the most-significant byte first (big-endian order), regardless of the host machine's native byte order or PSPP settings.

Field widths must be even and between 2 and 16. RBHEX format allows no decimal places; PIBHEX allows as many decimal places as a PIB format with half the given width.
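
The hexadecimal formats can be sketched in Python as shown below. The function names are hypothetical. The PIBHEX functions rely only on the big-endian ordering stated above; read_rbhex additionally assumes a 16-digit field holding an IEEE 754 double in big-endian order, which is only one of the floating-point formats that RB may use.

```python
import struct

HEX_DIGITS = set("0123456789ABCDEFabcdef")


def _check(field):
    if len(field) % 2 or not 2 <= len(field) <= 16:
        raise ValueError("width must be even and between 2 and 16")
    if not set(field) <= HEX_DIGITS:
        raise ValueError("field may contain only hexadecimal digits")


def read_pibhex(field):
    """Decode PIBHEX: most-significant byte first, either letter case."""
    _check(field)
    return int(field, 16)


def write_pibhex(value, width):
    """Encode a nonnegative integer as uppercase hex digits in width columns."""
    text = format(value, f"0{width}X")
    _check(text)
    if len(text) != width:
        raise ValueError("value does not fit in the field")
    return text


def read_rbhex(field):
    """Decode RBHEX, assuming a big-endian IEEE 754 double (16 digits)."""
    _check(field)
    return struct.unpack(">d", bytes.fromhex(field))[0]
```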


4.6.4.5 Time and Date Formats

In PSPP, a time is an interval. The time formats translate between human-friendly descriptions of time intervals and PSPP's internal representation of time intervals, which is simply the number of seconds in the interval. PSPP has two time formats:

Time Format   Template         Example
TIME          hh:MM:SS.ss      04:31:17.01
DTIME         DD HH:MM:SS.ss   00 04:31:17.01

A date is a moment in the past or the future. Internally, PSPP represents a date as the number of seconds since the epoch, midnight, Oct. 14, 1582. The date formats translate between human-readable dates and PSPP's numeric representation of dates and times. PSPP has several date formats:

Date Format   Template                  Example
DATE          dd-mmm-yyyy               01-OCT-1978
ADATE         mm/dd/yyyy                10/01/1978
EDATE         dd.mm.yyyy                01.10.1978
JDATE         yyyyjjj                   1978274
SDATE         yyyy/mm/dd                1978/10/01
QYR           q Q yyyy                  4 Q 1978
MOYR          mmm yyyy                  OCT 1978
WKYR          ww WK yyyy                40 WK 1978
DATETIME      dd-mmm-yyyy HH:MM:SS.ss   01-OCT-1978 04:31:17.01

The templates in the preceding tables describe how the time and date formats are input and output:

dd
Day of month, from 1 to 31. Always output as two digits.
mm
mmm
Month. In output, mm is output as two digits, mmm as the first three letters of an English month name (January, February, ...). In input, both of these formats, plus Roman numerals, are accepted.
yyyy
Year. In output, DATETIME always produces a 4-digit year; other formats can produce a 2- or 4-digit year. The century assumed for 2-digit years depends on the EPOCH setting (see SET EPOCH). In output, a year outside the epoch causes the whole field to be filled with asterisks (`*').
jjj
Day of year (Julian day), from 1 to 366. This is exactly three digits giving the count of days from the start of the year. January 1 is considered day 1.
q
Quarter of year, from 1 to 4. Quarters start on January 1, April 1, July 1, and October 1.
ww
Week of year, from 1 to 53. Output as exactly two digits. January 1 is the first day of week 1.
DD
Count of days, which may be positive or negative. Output as at least two digits.
hh
Count of hours, which may be positive or negative. Output as at least two digits.
HH
Hour of day, from 0 to 23. Output as exactly two digits.
MM
Minute of hour, from 0 to 59. Output as exactly two digits.
SS.ss
Seconds within minute, from 0 to 59. The integer part is output as exactly two digits. On output, seconds and fractional seconds may or may not be included, depending on field width and decimal places. On input, seconds and fractional seconds are optional. The DECIMAL setting controls the character accepted and displayed as the decimal point (see SET DECIMAL).

For output, the date and time formats use the delimiters indicated in the table. For input, date components may be separated by spaces or by one of the characters `-', `/', `.', or `,', and time components may be separated by spaces, `:', or `.'. On input, the `Q' separating quarter from year and the `WK' separating week from year may be uppercase or lowercase, and the spaces around them are optional.

On input, all time and date formats accept any amount of leading and trailing white space.

The maximum width for time and date formats is 40 columns. Minimum input and output width for each of the time and date formats is shown below:

Format     Min. Input Width   Min. Output Width   Option
DATE       8                  9                   4-digit year
ADATE      8                  8                   4-digit year
EDATE      8                  8                   4-digit year
JDATE      5                  5                   4-digit year
SDATE      8                  8                   4-digit year
QYR        4                  6                   4-digit year
MOYR       6                  6                   4-digit year
WKYR       6                  8                   4-digit year
DATETIME   17                 17                  seconds
TIME       5                  5                   seconds
DTIME      8                  8                   seconds

In the table, “Option” describes what increased output width enables:
4-digit year
A field 2 columns wider than minimum will include a 4-digit year. (DATETIME format always includes a 4-digit year.)
seconds
A field 3 columns wider than minimum will include seconds as well as minutes. A field 5 columns wider than minimum, or more, can also include a decimal point and fractional seconds (but no more than allowed by the format's decimal places).
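
The effect of field width on TIME output, as described by the “seconds” option above, can be sketched in Python as follows. The name format_time is hypothetical, only nonnegative intervals are handled, and rounding the value to the displayed precision is an assumption of the sketch; this manual does not say whether PSPP rounds or truncates here.

```python
def format_time(seconds, width=8, decimals=0):
    """Format a nonnegative interval in seconds as hh:MM[:SS[.ss]]."""
    if seconds < 0 or not 5 <= width <= 40:
        raise ValueError("unsupported value or width")
    show_seconds = width >= 8                 # 3 columns wider than minimum
    # Fractional digits need the decimal point plus at least one digit,
    # that is, a field at least 5 columns wider than minimum.
    frac = min(decimals, width - 9) if width >= 10 else 0
    scale = 10 ** frac
    ticks = round(seconds * scale)            # units of 1/scale second
    whole, part = divmod(ticks, scale)
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    text = f"{hours:02d}:{minutes:02d}"
    if show_seconds:
        text += f":{secs:02d}"
    if frac:
        text += f".{part:0{frac}d}"
    return text.rjust(width)                  # right-justify within the field
```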

For the time and date formats, the default output format is the same as the input format, except that PSPP increases the field width, if necessary, to the minimum allowed for output.

Times or dates narrower than the field width are right-justified within the field.

When a time or date exceeds the field width, characters are trimmed from the end until it fits. This can occur in an unusual situation, e.g. with a year greater than 9999 (which adds an extra digit), or for a negative value on TIME or DTIME (which adds a leading minus sign).

The system-missing value is output as a period at the right end of the field.
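
The internal representation of dates described in this section, a count of seconds since midnight, Oct. 14, 1582, can be sketched in Python as shown below. The function names are hypothetical, and the sketch assumes that Python's proleptic Gregorian calendar agrees with PSPP's calendar for the dates of interest. The components function applies the rules given above for jjj, q, and ww.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1582, 10, 14)     # midnight, Oct. 14, 1582


def to_pspp_seconds(moment):
    """Convert a datetime to seconds since the epoch."""
    return (moment - EPOCH).total_seconds()


def from_pspp_seconds(seconds):
    """Convert seconds since the epoch back to a datetime."""
    return EPOCH + timedelta(seconds=seconds)


def components(moment):
    """Return day of year, quarter, and week of year for a date."""
    yday = moment.timetuple().tm_yday          # January 1 is day 1
    return {
        "jjj": yday,
        "q": (moment.month - 1) // 3 + 1,      # quarters start Jan, Apr, Jul, Oct
        "ww": (yday - 1) // 7 + 1,             # January 1 starts week 1
    }
```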


4.6.4.6 Date Component Formats

The WKDAY and MONTH formats provide input and output for the names of weekdays and months, respectively.

On output, these formats convert a number between 1 and 7, for WKDAY, or between 1 and 12, for MONTH, into the English name of a day or month, respectively. If the name is longer than the field, it is trimmed to fit. If the name is shorter than the field, it is padded on the right with spaces. Values outside the valid range, and the system-missing value, are output as all spaces.

On input, English weekday or month names (in uppercase or lowercase) are converted back to their corresponding numbers. Weekday and month names may be abbreviated to their first 2 or 3 letters, respectively.

The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for MONTH. No decimal places are allowed.

The default output format is the same as the input format.
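
The trimming and padding rules for MONTH can be sketched in Python as follows. The names write_month and read_month are hypothetical and None stands in for the system-missing value. Two details are assumptions of the sketch: names are output in uppercase, modeled on the month abbreviations in the date examples earlier, and input accepts only the full name or its first three letters.

```python
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def write_month(value, width):
    """Return the month name trimmed or space-padded to width."""
    if value is None or not 1 <= value <= 12:
        return " " * width                 # out of range or system-missing
    return MONTHS[int(value) - 1][:width].ljust(width)


def read_month(field):
    """Return the month number for a name or 3-letter abbreviation."""
    name = field.strip().upper()
    for number, month in enumerate(MONTHS, start=1):
        if name in (month, month[:3]):
            return number
    return None
```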


4.6.4.7 String Formats

The A and AHEX formats are the only ones that may be assigned to string variables. Neither format allows any decimal places.

In A format, the entire field is treated as a string value. The field width may range from 1 to 32,767, the maximum string width. The default output format is the same as the input format.

In AHEX format, the field is composed of characters in a string encoded as hex digit pairs. On output, hex digits are output in uppercase; on input, uppercase and lowercase are both accepted. The default output format is A format with half the input width.
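
The AHEX encoding can be illustrated with a short Python sketch. The names write_ahex and read_ahex are hypothetical, and the string is represented as Python bytes, which sidesteps the question of character encoding.

```python
def write_ahex(value):
    """Encode a byte string as uppercase hex digit pairs."""
    return value.hex().upper()


def read_ahex(field):
    """Decode hex digit pairs (either letter case) into a byte string."""
    return bytes.fromhex(field)
```

A string of width 3 therefore occupies an AHEX field of width 6, which is why the default output format is A format with half the input width.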


4.6.5 Scratch Variables

Most of the time, variables don't retain their values between cases. Instead, either they're being read from a data file or the active file, in which case they assume the value read, or, if created with COMPUTE or another transformation, they're initialized to the system-missing value or to blanks, depending on type.

However, sometimes it's useful to have a variable that keeps its value between cases. You can do this with LEAVE (see LEAVE), or you can use a scratch variable. Scratch variables are variables whose names begin with an octothorpe (`#').

Scratch variables have the same properties as variables left with LEAVE: they retain their values between cases, and for the first case they are initialized to 0 or blanks. They have the additional property that they are deleted before the execution of any procedure. For this reason, scratch variables can't be used for analysis. To use a scratch variable in an analysis, use COMPUTE (see COMPUTE) to copy its value into an ordinary variable, then use that ordinary variable in the analysis.


4.7 Files Used by PSPP

PSPP makes use of many files each time it runs. Some of these it reads, some it writes, some it creates. Here is a table listing the most important of these files:

command file
syntax file
These names (synonyms) refer to the file that contains instructions that tell PSPP what to do. The syntax file's name is specified on the PSPP command line. Syntax files can also be read with INCLUDE (see INCLUDE).


data file
Data files contain raw data in text or binary format. Data can also be embedded in a syntax file with BEGIN DATA and END DATA.


listing file
One or more output files are created by PSPP each time it is run. The output files receive the tables and charts produced by statistical procedures. The output files may be in any number of formats, depending on how PSPP is configured.


active file
The active file is the “file” on which all PSPP procedures are performed. The active file consists of a dictionary and a set of cases. The active file is not necessarily a disk file: it is stored in memory if there is room.


system file
System files are binary files that store a dictionary and a set of cases. GET and SAVE read and write system files.


portable file
Portable files are files in a text-based format that store a dictionary and a set of cases. IMPORT and EXPORT read and write portable files.


scratch file
Scratch files consist of a dictionary and cases and may be stored in memory or on disk. Most procedures that act on a system file or portable file can use a scratch file instead. The contents of scratch files persist within a single PSPP session only. GET and SAVE can be used to read and write scratch files. Scratch files are a PSPP extension.


4.8 File Handles

A file handle is a reference to a data file, system file, portable file, or scratch file. Most often, a file handle is specified as the name of a file as a string, that is, enclosed within `'' or `"'.

A file name string that begins or ends with `|' is treated as the name of a command to pipe data to or from. You can use this feature to read data over the network using a program such as `curl' (e.g. GET '|curl -s -S http://example.com/mydata.sav'), to read compressed data from a file using a program such as `zcat' (e.g. GET '|zcat mydata.sav.gz'), and for many other purposes.

PSPP also supports declaring named file handles with the FILE HANDLE command. This command associates an identifier of your choice (the file handle's name) with a file. Later, the file handle name can be substituted for the name of the file. When PSPP syntax accesses a file multiple times, declaring a named file handle simplifies updating the syntax later to use a different file. Use of FILE HANDLE is also required to read data files in binary formats. See FILE HANDLE, for more information.

PSPP assumes that a file handle name that begins with `#' refers to a scratch file, unless the name has already been declared on FILE HANDLE to refer to another kind of file. A scratch file is similar to a system file, except that it persists only for the duration of a given PSPP session. Most commands that read or write a system or portable file, such as GET and SAVE, also accept scratch file handles. Scratch file handles may also be declared explicitly with FILE HANDLE. Scratch files are a PSPP extension.

In some circumstances, PSPP must distinguish whether a file handle refers to a system file or a portable file. When this is necessary to read a file, e.g. as an input file for GET or MATCH FILES, PSPP uses the file's contents to decide. In the context of writing a file, e.g. as an output file for SAVE or AGGREGATE, PSPP decides based on the file's name: if it ends in `.por' (with any capitalization), then PSPP writes a portable file; otherwise, PSPP writes a system file.
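
The rule for choosing the kind of file to write can be expressed as a one-line check, sketched here in Python. The name output_file_kind is hypothetical; the sketch only restates the rule above and does not model PSPP's inspection of file contents when reading.

```python
def output_file_kind(file_name):
    """Return the kind of file written for the given output file name."""
    # A name ending in ".por", with any capitalization, selects a portable file.
    return "portable" if file_name.lower().endswith(".por") else "system"
```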

INLINE is reserved as a file handle name. It refers to the “data file” embedded into the syntax file between BEGIN DATA and END DATA. See BEGIN DATA, for more information.

The file to which a file handle refers may be reassigned on a later FILE HANDLE command if it is first closed using CLOSE FILE HANDLE. The CLOSE FILE HANDLE command is also useful to free the storage associated with a scratch file. See CLOSE FILE HANDLE, for more information.


4.9 Backus-Naur Form

The syntax of some parts of the PSPP language is presented in this manual using the formalism known as Backus-Naur Form, or BNF. The following table describes BNF:


5 Mathematical Expressions

Expressions share a common syntax each place they appear in PSPP commands. Expressions are made up of operands, which can be numbers, strings, or variable names, separated by operators. There are five types of operators: grouping, arithmetic, logical, relational, and functions.

Every operator takes one or more operands as input and yields exactly one result as output. Depending on the operator, operands may be strings or numbers. With few exceptions, operands may be full-fledged expressions in themselves.


5.1 Boolean Values

Some PSPP operators and expressions work with Boolean values, which represent true/false conditions. Booleans have only three possible values: 0 (false), 1 (true), and system-missing (unknown). System-missing is neither true nor false and indicates that the true value is unknown.

Boolean-typed operands or function arguments must take on one of these three values. Other values are considered false, but provoke a warning when the expression is evaluated.

Strings and Booleans are not compatible, and neither may be used in place of the other.


5.2 Missing Values in Expressions

Most numeric operators yield system-missing when given any system-missing operand. A string operator given any system-missing operand typically results in the empty string. Exceptions are listed under particular operator descriptions.

String user-missing values are not treated specially in expressions.

User-missing values for numeric variables are always transformed into the system-missing value, except inside the arguments to the VALUE and SYSMIS functions.

The missing-value functions can be used to precisely control how missing values are treated in expressions. See Missing Value Functions, for more details.


5.3 Grouping Operators

Parentheses (`()') are the grouping operators. Surround an expression with parentheses to force early evaluation.

Parentheses also surround the arguments to functions, but in that situation they act as punctuators, not as operators.


5.4 Arithmetic Operators

The arithmetic operators take numeric operands and produce numeric results.

a + b
Yields the sum of a and b.


a - b
Subtracts b from a and yields the difference.


a * b
Yields the product of a and b. If either a or b is 0, then the result is 0, even if the other operand is missing.


a / b
Divides a by b and yields the quotient. If a is 0, then the result is 0, even if b is missing. If b is zero, the result is system-missing.


a ** b
Yields the result of raising a to the power b. If a is negative and b is not an integer, the result is system-missing. The result of 0**0 is system-missing as well.


- a
Reverses the sign of a.
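
The special cases for missing operands listed above can be sketched in Python as follows, with None standing in for the system-missing value. The function names are hypothetical. The sketch makes two assumptions that the descriptions above do not settle: 0 / 0 is treated as system-missing, and a power that Python cannot compute, such as 0 raised to a negative power, also yields system-missing.

```python
def multiply(a, b):
    if a == 0 or b == 0:
        return 0                 # 0 wins even if the other operand is missing
    if a is None or b is None:
        return None
    return a * b


def divide(a, b):
    if b == 0:
        return None              # division by zero (assumed to include 0 / 0)
    if a == 0:
        return 0                 # 0 / missing is 0
    if a is None or b is None:
        return None
    return a / b


def power(a, b):
    if a is None or b is None:
        return None
    if a == 0 and b == 0:
        return None              # 0**0 is system-missing
    if a < 0 and not float(b).is_integer():
        return None              # negative base, non-integer exponent
    try:
        return a ** b
    except (ZeroDivisionError, OverflowError):
        return None              # assumption: uncomputable results are missing
```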


5.5 Logical Operators

The logical operators take logical operands and produce logical results, meaning “true or false.” Logical operators are not true Boolean operators because they may also result in a system-missing value. See Boolean Values, for more information.

a AND b
a & b
True if both a and b are true, false otherwise. If one operand is false, the result is false even if the other is missing. If both operands are missing, the result is missing.


a OR b
a | b
True if at least one of a and b is true. If one operand is true, the result is true even if the other operand is missing. If both operands are missing, the result is missing.


NOT a
~ a
True if a is false. If the operand is missing, then the result is missing.
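
The three-valued behavior of the logical operators can be sketched in Python, using True, False, and None for true, false, and system-missing. The function names are hypothetical. The descriptions above do not spell out the result of combining true with missing under AND, or false with missing under OR; the sketch treats both as missing, as in standard three-valued logic.

```python
def pspp_and(a, b):
    if a is False or b is False:
        return False             # false wins even if the other is missing
    if a is None or b is None:
        return None
    return True


def pspp_or(a, b):
    if a is True or b is True:
        return True              # true wins even if the other is missing
    if a is None or b is None:
        return None
    return False


def pspp_not(a):
    return None if a is None else not a
```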


5.6 Relational Operators

The relational operators take numeric or string operands and produce Boolean results.

Strings cannot be compared to numbers. When strings of different lengths are compared, the shorter string is right-padded with spaces to match the length of the longer string.

The results of string comparisons, other than tests for equality or inequality, depend on the character set in use. String comparisons are case-sensitive.

a EQ b
a = b
True if a is equal to b.


a LE b
a <= b
True if a is less than or equal to b.


a LT b
a < b
True if a is less than b.


a GE b
a >= b
True if a is greater than or equal to b.


a GT b
a > b
True if a is greater than b.


a NE b
a ~= b
a <> b
True if a is not equal to b.
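
The padding rule for string comparison described at the start of this section can be sketched in Python as follows. The name compare_strings is hypothetical, and Python's ordering by code point stands in for whatever character set is in use.

```python
def compare_strings(a, b):
    """Return -1, 0, or 1 after right-padding the shorter string with spaces."""
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return (a > b) - (a < b)
```

Under this rule a string compares equal to the same string followed by trailing spaces, while strings that differ only in letter case compare unequal.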


5.7 Functions

PSPP functions provide mathematical abilities above and beyond those possible using simple operators. Functions have a common syntax: each is composed of a function name followed by a left parenthesis, one or more arguments, and a right parenthesis.

Function names are not reserved. Their names are specially treated only when followed by a left parenthesis, so that EXP(10) refers to the constant value e raised to the 10th power, but EXP by itself refers to the value of variable EXP.

The sections below describe each function in detail.


5.7.1 Mathematical Functions

Advanced mathematical functions take numeric arguments and produce numeric results.

— Function: EXP (exponent)

Returns e (approximately 2.71828) raised to the power exponent.

— Function: LG10 (number)

Takes the base-10 logarithm of number. If number is not positive, the result is system-missing.

— Function: LN (number)

Takes the base-e logarithm of number. If number is not positive, the result is system-missing.

— Function: LNGAMMA (number)

Yields the base-e logarithm of the complete gamma of number. If number is a negative integer, the result is system-missing.

— Function: SQRT (number)

Takes the square root of number. If number is negative, the result is system-missing.


5.7.2 Miscellaneous Mathematical Functions

Miscellaneous mathematical functions take numeric arguments and produce numeric results.

— Function: ABS (number)

Results in the absolute value of number.

— Function: MOD (numerator, denominator)

Returns the remainder (modulus) of numerator divided by denominator. If numerator is 0, then the result is 0, even if denominator is missing. If denominator is 0, the result is system-missing.

— Function: MOD10 (number)

Returns the remainder when number is divided by 10. If number is negative, MOD10(number) is negative or zero.

— Function: RND (number)

Takes the absolute value of number and rounds it to an integer. Then, if number was negative originally, negates the result.

— Function: TRUNC (number)

Discards the fractional part of number; that is, rounds number towards zero.
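
The sign handling of MOD10, RND, and TRUNC can be sketched in Python as shown below. The function names are hypothetical and missing values are not modeled. The description of RND above does not say how ties are broken; rounding halves away from zero is an assumption of the sketch.

```python
import math


def mod10(number):
    """Remainder of number / 10, taking the sign of number."""
    return math.fmod(number, 10)


def rnd(number):
    """Round the absolute value to an integer, then restore the sign."""
    rounded = math.floor(abs(number) + 0.5)   # assumption: ties round up
    return -rounded if number < 0 else rounded


def trunc(number):
    """Discard the fractional part, rounding towards zero."""
    return math.trunc(number)
```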


5.7.3 Trigonometric Functions

Trigonometric functions take numeric arguments and produce numeric results.

— Function: ARCOS (number)
— Function: ACOS (number)

Takes the arccosine, in radians, of number. Results in system-missing if number is not between -1 and 1 inclusive. This function is a PSPP extension.

— Function: ARSIN (number)
— Function: ASIN (number)

Takes the arcsine, in radians, of number. Results in system-missing if