q2c Input Format
This manual is for GNU PSPP version 0.4.3, software for statistical analysis.
Copyright © 1997, 1998, 2004, 2005 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being “A GNU Manual,” and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License.”
(a) The FSF's Back-Cover Text is: “You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development.”
PSPP is a tool for statistical analysis of sampled data. It reads a syntax file and a data file, analyzes the data, and writes the results to a listing file or to standard output.
The language accepted by PSPP is similar to those accepted by SPSS statistical products. The details of PSPP's language are given later in this manual.
PSPP produces output in two forms: tables and charts. Both of these can be written in several formats; currently, ASCII, PostScript, and HTML are supported. In the future, more drivers, such as PCL and X Window System drivers, may be developed. For now, Ghostscript, available from the Free Software Foundation, may be used to convert PostScript chart output to other formats.
The current version of PSPP, 0.4.3, is woefully incomplete in terms of its statistical procedure support. PSPP is a work in progress. The author hopes eventually to fully support all features in the products that PSPP replaces. The author welcomes questions, comments, donations, and code submissions. See Submitting Bug Reports, for instructions on contacting the author.
PSPP is not in the public domain. It is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of this program that they might get from you.
Specifically, we want to make sure that you have the right to give away copies of PSPP, that you receive source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.
To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of PSPP, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights.
Also, for our own protection, we must make certain that everyone finds out that there is no warranty for PSPP. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise conditions of the license for PSPP are found in the GNU General Public License. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. This manual specifically is covered by the GNU Free Documentation License (see GNU Free Documentation License).
pspp [ -B dir | --config-dir=dir ] [ -o device | --device=device ]
[ -d var[=value] | --define=var[=value] ] [-u var | --undef=var ]
[ -f file | --out-file=file ] [ -p | --pipe ] [ -I- | --no-include ]
[ -I dir | --include=dir ] [ -i | --interactive ]
[ -n | --edit | --dry-run | --just-print | --recon ]
[ -r | --no-statrc ] [ -h | --help ] [ -l | --list ]
[ -c command | --command command ] [ -s | --safer ]
[ --testing-mode ] [ -V | --version ] [ -v | --verbose ]
[ key=value ] file....
Syntax files and output device substitutions can be specified on PSPP's command line:
-i or --interactive option is given (see Language control options).
=value
There is one other way to specify a syntax file, if your operating system supports it. If you have a syntax file foobar.stat, put the notation
#! /usr/local/bin/pspp
at the top, and mark the file as executable with chmod +x foobar.stat. (If PSPP is not installed in /usr/local/bin, then insert its actual installation directory into the syntax file instead.) Now you should be able to invoke the syntax file just by typing its name. You can include any options on the command line as usual. PSPP entirely ignores any lines beginning with `#!'.
Configuration options are used to change PSPP's configuration for the current run. The configuration options are:
-a {compatible|enhanced}
--algorithm={compatible|enhanced}
If you choose compatible, then PSPP will use the same algorithms as used by some proprietary statistical analysis packages. This is not recommended, as these algorithms are inferior and in some cases completely broken. The default setting is enhanced. Certain commands have subcommands which allow you to override this setting on a per-command basis.
-B dir
--config-dir=dir
-o device
--device=device
Input and output options affect how PSPP reads input and writes output. These are the input and output options:
-f file
--out-file=file
-p
--pipe
-f file and --file=file options.
-I-
--no-include
-I dir
--include=dir
-c command
--command=command
--testing-mode
make check and similar scripts.
Language control options control how PSPP syntax files are parsed and interpreted. The available language control options are:
-i
--interactive
In addition, this forces syntax files to be interpreted in interactive mode, rather than the default batch mode. See Tokenizing lines, for information on the differences between batch mode and interactive mode command interpretation.
-n
--edit
--dry-run
--just-print
--recon
-r
--no-statrc
-s
--safer
Informational options cause information about PSPP to be written to the terminal. Here are the available options:
-h
--help
-l
--list
-x {compatible|enhanced}
--syntax={compatible|enhanced}
If you choose compatible, then PSPP will only accept command syntax that is compatible with the proprietary program SPSS. If you choose enhanced then additional syntax will be available. The default is enhanced.
-V
--version
-v
--verbose
This option can be given multiple times to set the verbosity level to that value. The default verbosity level is 0, in which no informational messages will be displayed.
Higher verbosity levels cause messages to be displayed when the corresponding events take place.
Each verbosity level also includes messages from lower verbosity levels.
Please note: PSPP is not even close to completion. Only a few statistical procedures are implemented. PSPP is a work in progress.
This chapter discusses elements common to many PSPP commands. Later chapters will describe individual commands in detail.
PSPP divides most syntax file lines into series of short chunks called tokens. Tokens are then grouped to form commands, each of which tells PSPP to take some action—read in data, write out data, perform a statistical procedure, etc. Each type of token is described below.
. _ $ # @
Identifiers may be any length, but only the first 64 bytes are
significant. Identifiers are not case-sensitive: foobar,
Foobar, FooBar, FOOBAR, and FoObaR are
different representations of the same identifier.
Some identifiers are reserved. Reserved identifiers may not be used in any context besides those explicitly described in this manual. The reserved identifiers are:
ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
Reserved identifiers are always used as keywords. Other identifiers
may be used both as keywords and as user-defined identifiers, such as
variable names.
-5 3.14159265359 1e100 -.707 8945.
Negative numbers are expressed with a `-' prefix. However, in situations where a literal `-' token is expected, what appears to be a negative number is treated as `-' followed by a positive number.
No white space is allowed within a number token, except for horizontal white space between `-' and the rest of the number.
The last example above, `8945.' will be interpreted as two
tokens, `8945' and `.', if it is the last token on a line.
See Forming commands of tokens.
Strings can be concatenated using `+', so that `"a" + 'b' + 'c'' is equivalent to `'abc''. Concatenation is useful for splitting a single string across multiple source lines. The maximum length of a string, after concatenation, is 255 characters.
Strings may also be expressed as hexadecimal, octal, or binary character values by prefixing the initial quote character by `X', `O', or `B' or their lowercase equivalents. Each pair, triplet, or octet of characters, according to the radix, is transformed into a single character with the given value. If there is an incomplete group of characters, the missing final digits are assumed to be `0'. These forms of strings are nonportable because numeric values are associated with different characters by different operating systems. Therefore, their use should be confined to syntax files that will not be widely distributed.
The character with value 00 is reserved for
internal use by PSPP. Its use in strings causes an error and
replacement by a space character.
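As an illustration of the grouping rule for hexadecimal, octal, and binary strings, here is a minimal Python sketch. It is not PSPP's own code: the function name is hypothetical, it maps values to characters with Python's `chr` (so the results assume an ASCII-compatible character set), and it does not model the special handling of the value 00.

```python
# Radix and group size for each prefix letter: pairs for hexadecimal,
# triplets for octal, octets for binary.
GROUPS = {"X": (16, 2), "O": (8, 3), "B": (2, 8)}


def decode_radix_string(prefix, digits):
    """Turn the digits of an X'...', O'...' or B'...' string into characters."""
    base, size = GROUPS[prefix.upper()]
    chars = []
    for start in range(0, len(digits), size):
        # An incomplete final group is padded on the right with '0' digits.
        group = digits[start:start + size].ljust(size, "0")
        chars.append(chr(int(group, base)))
    return "".join(chars)


print(decode_radix_string("X", "414243"))
```

Under these assumptions, `X'4'` is read as the single group `40`, which is why an incomplete group changes the character produced rather than being dropped.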
, / = ( ) + - * / ** < <= <> > >= ~= & | .
Most of these appear within the syntax of commands, but the period (`.') punctuator is used only at the end of a command. It is a punctuator only as the last character on a line (except white space). When it is the last non-space character on a line, a period is not treated as part of another token, even if it would otherwise be part of, e.g., an identifier or a floating-point number.
Actually, the character that ends a command can be changed with SET's ENDCMD subcommand (see SET), but we do not recommend doing so. Throughout the remainder of this manual we will assume that the default setting is in effect.
Most PSPP commands share a common structure. A command begins with a command name, such as FREQUENCIES, DATA LIST, or N OF CASES. The command name may be abbreviated to its first word, and each word in the command name may be abbreviated to its first three or more characters, where these abbreviations are unambiguous.
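The abbreviation rule can be sketched in Python as follows. This is only an illustration, not PSPP's parser: the function names are hypothetical, the sketch accepts any leading run of words (not just the first word), and it treats a word shorter than three characters as matching only when typed in full. Those last two points are assumptions.

```python
def word_matches(abbrev, word):
    """True if abbrev is the whole word or a prefix of three or more characters."""
    abbrev, word = abbrev.upper(), word.upper()
    if abbrev == word:
        return True
    return len(abbrev) >= 3 and word.startswith(abbrev)


def name_matches(typed, full_name):
    """Compare a typed command name against a full name, word by word."""
    typed_words = typed.split()
    full_words = full_name.split()
    if not typed_words or len(typed_words) > len(full_words):
        return False
    return all(word_matches(a, w) for a, w in zip(typed_words, full_words))


def resolve(typed, command_names):
    """Return the single matching command name, or None if none or ambiguous."""
    candidates = [name for name in command_names if name_matches(typed, name)]
    return candidates[0] if len(candidates) == 1 else None


names = ["FREQUENCIES", "DATA LIST", "N OF CASES"]
print(resolve("freq", names))
print(resolve("dat lis", names))
```

Returning `None` for more than one candidate models the requirement that abbreviations be unambiguous.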
The command name may be followed by one or more subcommands. Each subcommand begins with a subcommand name, which may be abbreviated to its first three letters. Some subcommands accept a series of one or more specifications, which follow the subcommand name, optionally separated from it by an equals sign (`='). Specifications may be separated from each other by commas or spaces. Each subcommand must be separated from the next (if any) by a forward slash (`/').
There are multiple ways to mark the end of a command. The most common way is to end the last line of the command with a period (`.') as described in the previous section (see Tokens). A blank line, or one that consists only of white space or comments, also ends a command by default, although you can use the NULLINE subcommand of SET to disable this feature (see SET).
In batch mode only, that is, when reading commands from a file instead of an interactive user, any line that contains a non-space character in the leftmost column begins a new command. Thus, each command consists of a flush-left line followed by any number of lines indented from the left margin. In this mode, a plus or minus sign (`+', `−') as the first character in a line is ignored and causes that line to begin a new command, which allows for visual indentation of a command without that command being considered part of the previous command.
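The batch-mode rules above can be approximated with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, comments are not recognized, the default `.` terminator is assumed, and the sample command text is arbitrary.

```python
def split_batch_commands(lines):
    """Group source lines into commands using the batch-mode rules."""
    commands, current = [], []

    def flush():
        # Close the command being accumulated, if any.
        if current:
            commands.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line:
            flush()                      # a blank line ends a command
            continue
        if line[0] in "+-":
            flush()                      # leading +/- starts a new command...
            line = " " + line[1:]        # ...and is otherwise ignored
        elif not line[0].isspace():
            flush()                      # flush-left text starts a new command
        text = line.strip()
        ends = text.endswith(".")
        if ends:
            text = text[:-1].rstrip()
        if text:
            current.append(text)
        if ends:
            flush()                      # a trailing period ends a command
    flush()
    return commands


sample = [
    "FREQUENCIES",
    "    /VARIABLES=X",
    "+   N OF CASES 10",
    "FREQUENCIES /VARIABLES=Y.",
]
print(split_batch_commands(sample))
```

The third sample line shows the purpose of the leading `+`: the command is visually indented yet still starts a new command.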
Commands in PSPP are divided roughly into six categories:
PSPP does not place many restrictions on ordering of commands. The main restriction is that variables must be defined before they are otherwise referenced. This section describes the details of command ordering, but most users will have no need to refer to them.
PSPP possesses five internal states, called initial, INPUT PROGRAM, FILE TYPE, transformation, and procedure states. (Please note the distinction between the INPUT PROGRAM and FILE TYPE commands and the INPUT PROGRAM and FILE TYPE states.)
PSPP starts in the initial state. Each successful completion of a command may cause a state transition. Each type of command has its own rules for state transitions:
PSPP includes special support for unknown numeric data values. Missing observations are assigned a special value, called the system-missing value. This “value” actually indicates the absence of a value; it means that the actual value is unknown. Procedures automatically exclude from analyses those observations or cases that have missing values. Details of missing value exclusion depend on the procedure and can often be controlled by the user; refer to descriptions of individual procedures for details.
The system-missing value exists only for numeric variables. String variables always have a defined value, even if it is only a string of spaces.
Variables, whether numeric or string, can have designated user-missing values. Every user-missing value is an actual value for that variable. However, most of the time user-missing values are treated in the same way as the system-missing value. String variables that are wider than a certain width, usually 8 characters (depending on computer architecture), cannot have user-missing values.
For more information on missing values, see the following sections: Variables, MISSING VALUES, Expressions. See also the documentation on individual procedures for information on how they handle missing values.
Variables are the basic unit of data storage in PSPP. All the variables in a file taken together, apart from any associated data, are said to form a dictionary. Some details of variables are described in the sections below.
Each variable has a number of attributes, including:
Some system variable names begin with `$', but user-defined variables' names may not begin with `$'.
The final character in a variable name should not be `.', because
such an identifier will be misinterpreted when it is the final token
on a line: FOO. will be divided into two separate tokens,
`FOO' and `.', indicating end-of-command. See Tokens.
The final character in a variable name should not be `_', because some such identifiers are used for special purposes by PSPP procedures.
As with all PSPP identifiers, variable names are not case-sensitive. PSPP capitalizes variable names on output the same way they were capitalized at their point of definition in the input.
Certain systems may consider strings longer than 8
characters to be short strings. Eight characters represents a minimum
figure for the maximum length of a short string.
There are seven system variables. These are not like ordinary variables because system variables are not always stored. They can be used only in expressions. These system variables, whose values and output formats cannot be modified, are described below.
$CASENUM
$DATE
DD MMM YY.
$JDATE
$LENGTH
$SYSMIS
$TIME
$WIDTH
To refer to a set of variables, list their names one after another.
Optionally, their names may be separated by commas. To include a
range of variables from the dictionary in the list, write the name of
the first and last variable in the range, separated by TO. For
instance, if the dictionary contains six variables with the names
ID, X1, X2, GOAL, MET, and
NEXTGOAL, in that order, then X2 TO MET would include
variables X2, GOAL, and MET.
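The dictionary-order meaning of TO can be sketched in Python as below. The function name is hypothetical, and rejecting a range whose first variable comes after its last is an assumption rather than documented behavior.

```python
import re


def expand_variable_list(text, dictionary):
    """Expand names and 'A TO B' ranges against an ordered dictionary."""
    position = {name.upper(): i for i, name in enumerate(dictionary)}
    # Names may be separated by white space, commas, or both.
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    result = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1].upper() == "TO":
            first = position[tokens[i].upper()]
            last = position[tokens[i + 2].upper()]
            if first > last:
                raise ValueError("range runs backward in the dictionary")
            result.extend(dictionary[first:last + 1])
            i += 3
        else:
            result.append(dictionary[position[tokens[i].upper()]])
            i += 1
    return result


variables = ["ID", "X1", "X2", "GOAL", "MET", "NEXTGOAL"]
print(expand_variable_list("X2 TO MET", variables))
```

Lookups are case-insensitive, matching the rule for identifiers, and the result uses the capitalization stored in the dictionary.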
Commands that define variables, such as DATA LIST, give TO an alternate meaning. With these commands, TO defines sequences of variables whose names end in consecutive integers. The syntax is two identifiers that begin with the same root and end with numbers, separated by TO. The syntax X1 TO X5 defines 5 variables, named X1, X2, X3, X4, and X5. The syntax ITEM0008 TO ITEM0013 defines 6 variables, named ITEM0008, ITEM0009, ITEM0010, ITEM0011, ITEM0012, and ITEM0013. The syntaxes QUES001 TO QUES9 and QUES6 TO QUES3 are invalid.
After a set of variables has been defined with DATA LIST or another command with this method, the same set can be referenced on later commands using the same syntax.
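The numbered form of TO used by variable-defining commands can be sketched in Python like this. The function name is hypothetical. The sketch pads each number to the digit count of the first name, and it rejects a first name with more digits than the last; that second rule is a guess at why QUES001 TO QUES9 is invalid, not a documented rule.

```python
import re


def define_sequence(first, last):
    """Generate the names implied by 'first TO last' on a defining command."""
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if not m1 or not m2:
        raise ValueError("both names must end in a number")
    root1, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("names must share the same root")
    if len(digits1) > len(digits2):
        # Assumption: covers cases such as QUES001 TO QUES9.
        raise ValueError("first name has more digits than the last")
    low, high = int(digits1), int(digits2)
    if low > high:
        raise ValueError("numbers must not decrease")
    width = len(digits1)
    return [f"{root1}{n:0{width}d}" for n in range(low, high + 1)]


print(define_sequence("ITEM0008", "ITEM0013"))
```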
An input format describes how to interpret the contents of an input field as a number or a string. It might specify that the field contains an ordinary decimal number, a time or date, a number in binary or hexadecimal notation, or one of several other notations. Input formats are used by commands such as DATA LIST that read data or syntax files into the PSPP active file.
Every input format corresponds to a default output format that specifies the formatting used when the value is output later. It is always possible to explicitly specify an output format that resembles the input format. Usually, this is the default, but in cases where the input format is unfriendly to human readability, such as binary or hexadecimal formats, the default output format is an easier-to-read decimal format.
Every variable has two output formats, called its print format and write format. Print formats are used in most output contexts; write formats are used only by WRITE (see WRITE). Newly created variables have identical print and write formats, and FORMATS, the most commonly used command for changing formats (see FORMATS), sets both of them to the same value as well. Thus, most of the time, the distinction between print and write formats is unimportant.
Input and output formats are specified to PSPP with a format
specification of the form TYPEw or TYPEw.d, where
TYPE is one of the format types described later, w is a
field width measured in columns, and d is an optional number of
decimal places. If d is omitted, a value of 0 is assumed. Some
formats do not allow a nonzero d to be specified.
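A format specification of this shape is easy to take apart mechanically. The following Python sketch (hypothetical function name, not PSPP's parser) splits a specification into its type, width, and decimals, defaulting d to 0 when it is omitted. It does not check whether the type exists or whether the width and decimals are legal for it.

```python
import re

_SPEC = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")


def parse_format(spec):
    """Split 'TYPEw' or 'TYPEw.d' into (type, w, d)."""
    match = _SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a format specification: {spec!r}")
    type_name, width, decimals = match.groups()
    # An omitted d is taken as 0.
    return type_name.upper(), int(width), int(decimals or 0)


print(parse_format("F8.2"))
print(parse_format("COMMA9"))
```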
The following sections describe the input and output formats supported by PSPP.
The basic numeric formats are used for input and output of real numbers in standard or scientific notation. The following table shows an example of how each format displays positive and negative numbers with the default decimal point setting:
| Format | 3141.59 | -3141.59 |
|---|---|---|
| F8.2 | 3141.59 | -3141.59 |
| COMMA9.2 | 3,141.59 | -3,141.59 |
| DOT9.2 | 3.141,59 | -3.141,59 |
| DOLLAR10.2 | $3,141.59 | -$3,141.59 |
| PCT9.2 | 3141.59% | -3141.59% |
| E8.1 | 3.1E+003 | -3.1E+003 |
On output, numbers in F format are expressed in standard decimal notation with the requested number of decimal places. The other formats output some variation on this style:
On input, the basic numeric formats accept positive and negative numbers in standard decimal notation or scientific notation. Leading and trailing spaces are allowed. An empty or all-spaces field, or one that contains only a single period, is treated as the system-missing value.
In scientific notation, the exponent may be introduced by a sign (`+' or `-'), or by one of the letters `e' or `d' (in uppercase or lowercase), or by a letter followed by a sign. A single space may follow the letter or the sign or both.
On fixed-format DATA LIST (see DATA LIST FIXED) and in a few
other contexts, decimals are implied when the field does not contain a
decimal point. In F6.5 format, for example, the field 314159 is
taken as the value 3.14159 with implied decimals. Decimals are never
implied if an explicit decimal point is present or if scientific
notation is used.
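The implied-decimals rule can be illustrated with a small Python sketch. The function name is hypothetical, `None` stands in for the system-missing value, and only the exponent forms that Python's `float` understands (a letter `e` or `E`) are handled; the other exponent spellings described above are outside the sketch.

```python
def read_f_field(field, d):
    """Read an F-format field, applying d implied decimals when appropriate."""
    text = field.strip()
    if text in ("", "."):
        return None                      # stands in for system-missing
    if "." in text or "e" in text.lower():
        # Explicit decimal point or scientific notation: nothing is implied.
        return float(text)
    # Digits only: the last d digits are taken as decimals.
    return int(text) / 10 ** d


print(read_f_field("314159", 5))
```

With `d` equal to 5, the field `314159` is read as 3.14159, while `3.14` keeps its explicit decimal point unchanged.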
E and F formats accept the basic syntax already described. The other formats allow some additional variations:
All of the basic number formats have a maximum field width of 40 and accept no more than 16 decimal places, on both input and output. Some additional restrictions apply:
More details of basic numeric output formatting are given below:
3 in F1.0 format, and -1.125 as -1.13 in F5.1
format.
In scientific notation, the number always includes a decimal point, even if it is not followed by a digit.
-$9.99.
+Infinity or -Infinity for infinities, from
NaN for “not a number,” or from Unknown for other values
(if any are supported by the system). In fields under 3 columns wide,
special values are output as asterisks.
The custom currency formats are closely related to the basic numeric formats, but they allow users to customize the output format. The SET command configures custom currency formats, using the syntax
SET CCx="string".
where x is A, B, C, D, or E, and string is no more than 16 characters long.
string must contain exactly three commas or exactly three periods (but not both), except that a single quote character may be used to “escape” a following comma, period, or single quote. If three commas are used, commas will be used for grouping in output, and a period will be used as the decimal point. Use of periods reverses these roles.
The commas or periods divide string into four fields, called the negative prefix, prefix, suffix, and negative suffix, respectively. The prefix and suffix are added to output whenever space is available. The negative prefix and negative suffix are always added to a negative number when the output includes a nonzero digit.
The following syntax shows how custom currency formats could be used to reproduce basic numeric formats:
SET CCA="-,,,". /* Same as COMMA.
SET CCB="-...". /* Same as DOT.
SET CCC="-,$,,". /* Same as DOLLAR.
SET CCD="-,,%,". /* Like PCT, but groups with commas.
Here are some more examples of custom currency formats. The final example shows how to use a single quote to escape a delimiter:
SET CCA=",EUR,,-". /* Euro.
SET CCB="(,USD ,,)". /* US dollar.
SET CCC="-.R$..". /* Brazilian real.
SET CCD="-,, NIS,". /* Israel shekel.
SET CCE="-.Rp'. ..". /* Indonesia Rupiah.
These formats would yield the following output:
| Format | 3145.59 | -3145.59 |
|---|---|---|
| CCA12.2 | EUR3,145.59 | EUR3,145.59- |
| CCB14.2 | USD 3,145.59 | (USD 3,145.59) |
| CCC11.2 | R$3.145,59 | -R$3.145,59 |
| CCD13.2 | 3,145.59 NIS | -3,145.59 NIS |
| CCE10.0 | Rp. 3.146 | -Rp. 3.146 |
The default for all the custom currency formats is `-,,,', equivalent to COMMA format.
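To make the four-field structure of a custom currency string concrete, here is a minimal Python sketch. It is an illustration rather than PSPP's implementation: the function names are hypothetical, and field width, padding, and the "whenever space is available" rule are ignored.

```python
def parse_cc(spec):
    """Split a CC string into (neg_prefix, prefix, suffix, neg_suffix, grouping)."""
    tokens = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "'" and i + 1 < len(spec) and spec[i + 1] in ",.'":
            tokens.append(("lit", spec[i + 1]))   # escaped character
            i += 2
        else:
            tokens.append(("sep" if ch in ",." else "lit", ch))
            i += 1
    commas = sum(1 for kind, ch in tokens if kind == "sep" and ch == ",")
    periods = sum(1 for kind, ch in tokens if kind == "sep" and ch == ".")
    if (commas == 3) == (periods == 3):
        raise ValueError("need exactly three commas or three periods, not both")
    grouping = "," if commas == 3 else "."
    fields = [""]
    for kind, ch in tokens:
        if kind == "sep" and ch == grouping:
            fields.append("")
        else:
            fields[-1] += ch
    neg_prefix, prefix, suffix, neg_suffix = fields
    return neg_prefix, prefix, suffix, neg_suffix, grouping


def format_cc(value, spec, decimals):
    """Format value with the given CC string, ignoring field width."""
    neg_prefix, prefix, suffix, neg_suffix, grouping = parse_cc(spec)
    digits = f"{abs(value):,.{decimals}f}"
    if grouping == ".":
        # Periods group and a comma marks the decimal point.
        digits = digits.translate(str.maketrans(",.", ".,"))
    body = prefix + digits + suffix
    if value < 0 and any(c in "123456789" for c in digits):
        return neg_prefix + body + neg_suffix
    return body


print(format_cc(-3145.59, "(,USD ,,)", 2))
```

Run against the example settings above, this sketch reproduces the table entries, for instance `(USD 3,145.59)` for CCB and `-Rp. 3.146` for CCE with no decimals.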
The N and Z numeric formats provide compatibility with legacy file formats. They have much in common:
The N format supports input and output of fields that contain only digits. On input, leading or trailing spaces, a decimal point, or any other non-digit character causes the field to be read as the system-missing value. As a special exception, an N format used on DATA LIST FREE or DATA LIST LIST is treated as the equivalent F format.
On output, N pads the field on the left with zeros. Negative numbers are output like the system-missing value.
The Z format is a “zoned decimal” format used on IBM mainframes. Z format encodes the sign as part of the final digit, which must be one of the following:
0123456789
{ABCDEFGHI
}JKLMNOPQR
where the characters in each row represent digits 0 through 9 in order. Characters in the first two rows indicate a positive sign; those in the third indicate a negative sign.
On output, Z fields are padded on the left with spaces. On input, leading and trailing spaces are ignored. Any character in an input field other than spaces, the digit characters above, and `.' causes the field to be read as system-missing.
The decimal point character for input and output is always `.', even if the decimal point character is a comma (see SET DECIMAL).
Nonzero, negative values output in Z format are marked as negative even when no nonzero digits are output. For example, -0.2 is output in Z1.0 format as `J'. The “negative zero” value supported by most machines is output as positive.
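The sign-in-final-digit encoding of Z format can be sketched in Python as follows. The function name is hypothetical, `None` stands in for the system-missing value, and treating an all-spaces field as missing is an assumption. Implied decimals are not modeled.

```python
POSITIVE_ROWS = ("0123456789", "{ABCDEFGHI")
NEGATIVE_ROW = "}JKLMNOPQR"


def decode_zoned(field):
    """Decode a zoned decimal field; None stands in for system-missing."""
    text = field.strip(" ")
    if not text:
        return None                      # assumption: empty field is missing
    body, last = text[:-1], text[-1]
    sign = 1
    for row in POSITIVE_ROWS:
        if last in row:
            digit = row.index(last)
            break
    else:
        if last not in NEGATIVE_ROW:
            return None                  # final character is not a valid digit
        digit = NEGATIVE_ROW.index(last)
        sign = -1
    if any(c not in "0123456789." for c in body) or body.count(".") > 1:
        return None
    return sign * float(body + str(digit))


print(decode_zoned("12L"))
```

Here `12L` decodes to a negative value because `L` sits in the third row at the position of digit 3.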
The binary and hexadecimal formats are primarily designed for compatibility with existing machine formats, not for human readability. All of them therefore have an F format as default output format. Some of these formats are only portable between machines with compatible byte ordering (endianness) or floating-point format.
Binary formats use byte values that in text files are interpreted as special control functions, such as carriage return and line feed. Thus, data in binary formats should not be included in syntax files or read from data files with variable-length records, such as ordinary text files. They may be read from or written to data files with fixed-length records. See FILE HANDLE, for information on working with fixed-length records.
These are binary-coded decimal formats, in which every byte (except the last, in P format) represents two decimal digits. The most-significant 4 bits of the first byte is the most-significant decimal digit, the least-significant 4 bits of the first byte is the next decimal digit, and so on.
In P format, the most-significant 4 bits of the last byte are the least-significant decimal digit. The least-significant 4 bits represent the sign: decimal 15 indicates a negative value, decimal 13 indicates a positive value.
Numbers are rounded downward on output. The system-missing value and numbers outside representable range are output as zero.
The maximum field width is 16. Decimal places may range from 0 up to the number of decimal digits represented by the field.
The default output format is an F format with twice the input field width, plus one column for a decimal point (if decimal places were requested).
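The two-digits-per-byte layout can be illustrated with a short Python sketch for the unsigned variant, in which every byte holds two digits (assumed here to be the PK format). The function name is hypothetical, raising an error for a non-decimal half-byte is an assumption, and so is the use of d as a count of implied decimals. The signed P variant is not covered.

```python
def decode_pk(raw, d=0):
    """Decode unsigned binary-coded decimal, two digits per byte."""
    value = 0
    for byte in raw:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            # Assumption: a half-byte above 9 is treated as invalid input.
            raise ValueError("not a binary-coded decimal digit")
        # The high 4 bits are the more significant digit of the pair.
        value = value * 100 + high * 10 + low
    # Assumption: d gives a number of implied decimal places.
    return value / 10 ** d if d else value


print(decode_pk(bytes([0x12, 0x34])))
```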
These are integer binary formats. IB reads and writes 2's complement binary integers, and PIB reads and writes unsigned binary integers. The byte ordering is by default the host machine's, but SET RIB may be used to select a specific byte ordering for reading (see SET RIB) and SET WIB, similarly, for writing (see SET WIB).
The maximum field width is 8. Decimal places may range from 0 up to the number of decimal digits in the largest value representable in the field width.
The default output format is an F format whose width is the number of decimal digits in the largest value representable in the field width, plus 1 if the format has decimal places.
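The difference between the two integer formats comes down to how the same bytes are interpreted. A minimal Python sketch, using the standard `int.from_bytes` call and hypothetical function names, with the host byte order as the default and decimal places left out:

```python
import sys


def decode_ib(raw, byteorder=sys.byteorder):
    """Read bytes as a 2's complement (signed) binary integer."""
    if not 1 <= len(raw) <= 8:
        raise ValueError("field width must be between 1 and 8 bytes")
    return int.from_bytes(raw, byteorder, signed=True)


def decode_pib(raw, byteorder=sys.byteorder):
    """Read bytes as an unsigned binary integer."""
    if not 1 <= len(raw) <= 8:
        raise ValueError("field width must be between 1 and 8 bytes")
    return int.from_bytes(raw, byteorder, signed=False)


field = b"\xff\xfe"
print(decode_ib(field, "big"), decode_pib(field, "big"))
```

The same two bytes yield -2 when read as signed and 65534 when read as unsigned, and a different byte order gives different values again, which is why the byte-order settings matter when exchanging files between machines.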
This is a binary format for real numbers. By default it reads and writes the host machine's floating-point format, but SET RRB may be used to select an alternate floating-point format for reading (see SET RRB) and SET WRB, similarly, for writing (see SET WRB).
The recommended field width depends on the floating-point format. NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use a field width of 8. ISL, ISB, VF, and ZS formats should use a field width of 4. Other field widths will not produce useful results. The maximum field width is 8. No decimal places may be specified.
The default output format is F8.2.
These are hexadecimal formats, for reading and writing binary formats where each byte has been recoded as a pair of hexadecimal digits.
A hexadecimal field consists solely of hexadecimal digits `0'...`9' and `A'...`F'. Uppercase and lowercase are accepted on input; output is in uppercase.
Other than the hexadecimal representation, these formats are equivalent to PIB and RB formats, respectively. However, bytes in PIBHEX format are always ordered with the most-significant byte first (big-endian order), regardless of the host machine's native byte order or PSPP settings.
Field widths must be even and between 2 and 16. RBHEX format allows no decimal places; PIBHEX allows as many decimal places as a PIB format with half the given width.
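A PIBHEX field can be read with a few lines of Python. This is a sketch with a hypothetical function name; it applies the width rule and the fixed big-endian order stated above and leaves decimal places out.

```python
import string


def decode_pibhex(field):
    """Read a PIBHEX field: hex digit pairs, most-significant byte first."""
    if len(field) % 2 or not 2 <= len(field) <= 16:
        raise ValueError("width must be even and between 2 and 16")
    if any(c not in string.hexdigits for c in field):
        raise ValueError("field must contain only hexadecimal digits")
    # Big-endian order makes the field read like an ordinary hex number.
    return int(field, 16)


print(decode_pibhex("00FF"))
```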
In PSPP, a time is an interval. The time formats translate between human-friendly descriptions of time intervals and PSPP's internal representation of time intervals, which is simply the number of seconds in the interval. PSPP has two time formats:
| Time Format | Template | Example |
|---|---|---|
| TIME | hh:MM:SS.ss | 04:31:17.01 |
| DTIME | DD HH:MM:SS.ss | 00 04:31:17.01 |
A date is a moment in the past or the future. Internally, PSPP represents a date as the number of seconds since the epoch, midnight, Oct. 14, 1582. The date formats translate between human-readable dates and PSPP's numeric representation of dates and times. PSPP has several date formats:
| Date Format | Template | Example |
|---|---|---|
| DATE | dd-mmm-yyyy | 01-OCT-1978 |
| ADATE | mm/dd/yyyy | 10/01/1978 |
| EDATE | dd.mm.yyyy | 01.10.1978 |
| JDATE | yyyyjjj | 1978274 |
| SDATE | yyyy/mm/dd | 1978/10/01 |
| QYR | q Q yyyy | 3 Q 1978 |
| MOYR | mmm yyyy | OCT 1978 |
| WKYR | ww WK yyyy | 40 WK 1978 |
| DATETIME | dd-mmm-yyyy HH:MM:SS.ss | 01-OCT-1978 04:31:17.01 |
The templates in the preceding tables describe how the time and date formats are input and output:
dd
mm
mmm
mm is output as two digits, mmm as the first three letters of an English month name (January, February, ...). In input, both of these formats, plus Roman numerals, are accepted.
yyyy
jjj
q
ww
DD
hh
HH
MM
SS.ss
For output, the date and time formats use the delimiters indicated in the table. For input, date components may be separated by spaces or by one of the characters `-', `/', `.', or `,', and time components may be separated by spaces, `:', or `.'. On input, the `Q' separating quarter from year and the `WK' separating week from year may be uppercase or lowercase, and the spaces around them are optional.
On input, all time and date formats accept any amount of leading and trailing white space.
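The internal representation of dates, a count of seconds since midnight on Oct. 14, 1582, can be reproduced with Python's `datetime` module, whose calendar agrees with the Gregorian calendar for all dates after that epoch. The function names below are hypothetical, and only two of the output templates (DATE and JDATE) are sketched, without field-width handling.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1582, 10, 14)           # midnight, Oct. 14, 1582
MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()


def to_pspp_seconds(moment):
    """Whole seconds from the epoch to the given datetime."""
    delta = moment - EPOCH
    return delta.days * 86400 + delta.seconds


def from_pspp_seconds(seconds):
    """Inverse of to_pspp_seconds."""
    return EPOCH + timedelta(seconds=seconds)


def format_date(seconds):
    """DATE template: dd-mmm-yyyy."""
    m = from_pspp_seconds(seconds)
    return f"{m.day:02d}-{MONTHS[m.month - 1]}-{m.year:04d}"


def format_jdate(seconds):
    """JDATE template: yyyyjjj, where jjj is the day of the year."""
    m = from_pspp_seconds(seconds)
    return f"{m.year:04d}{m.timetuple().tm_yday:03d}"


value = to_pspp_seconds(datetime(1978, 10, 1))
print(value, format_date(value), format_jdate(value))
```

For October 1, 1978 the sketch gives 12495427200 seconds, which formats back to the DATE and JDATE examples in the table above.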
The maximum width for time and date formats is 40 columns. Minimum input and output width for each of the time and date formats is shown below:
| Format | Min. Input Width | Min. Output Width | Option |
|---|---|---|---|
| DATE | 8 | 9 | 4-digit year |
| ADATE | 8 | 8 | 4-digit year |
| EDATE | 8 | 8 | 4-digit year |
| JDATE | 5 | 5 | 4-digit year |
| SDATE | 8 | 8 | 4-digit year |
| QYR | 4 | 6 | 4-digit year |
| MOYR | 6 | 6 | 4-digit year |
| WKYR | 6 | 8 | 4-digit year |
| DATETIME | 17 | 17 | seconds |
| TIME | 5 | 5 | seconds |
| DTIME | 8 | 8 | seconds |
For the time and date formats, the default output format is the same as the input format, except that PSPP increases the field width, if necessary, to the minimum allowed for output.
Times or dates narrower than the field width are right-justified within the field.
When a time or date exceeds the field width, characters are trimmed from the end until it fits. This can occur in an unusual situation, e.g. with a year greater than 9999 (which adds an extra digit), or for a negative value on TIME or DTIME (which adds a leading minus sign).
The system-missing value is output as a period at the right end of the field.
The WKDAY and MONTH formats provide input and output for the names of weekdays and months, respectively.
On output, these formats convert a number between 1 and 7, for WKDAY, or between 1 and 12, for MONTH, into the English name of a day or month, respectively. If the name is longer than the field, it is trimmed to fit. If the name is shorter than the field, it is padded on the right with spaces. Values outside the valid range, and the system-missing value, are output as all spaces.
On input, English weekday or month names (in uppercase or lowercase) are converted back to their corresponding numbers. Weekday and month names may be abbreviated to their first 2 or 3 letters, respectively.
The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for MONTH. No decimal places are allowed.
The default output format is the same as the input format.
The A and AHEX formats are the only ones that may be assigned to string variables. Neither format allows any decimal places.
In A format, the entire field is treated as a string value. The field width may range from 1 to 32,767, the maximum string width. The default output format is the same as the input format.
In AHEX format, the field is composed of characters in a string encoded as hex digit pairs. On output, hex digits are output in uppercase; on input, uppercase and lowercase are both accepted. The default output format is A format with half the input width.
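The AHEX encoding can be illustrated in Python with the standard `bytes.fromhex` and `bytes.hex` calls. The function names are hypothetical, and mapping bytes to characters with the Latin-1 codec is an assumption made only so that every byte value has a character.

```python
def decode_ahex(field):
    """Read hex digit pairs as the characters of a string."""
    # bytes.fromhex accepts uppercase and lowercase digits.
    return bytes.fromhex(field).decode("latin-1")


def encode_ahex(text):
    """Write a string as uppercase hex digit pairs."""
    return text.encode("latin-1").hex().upper()


print(decode_ahex("414243"), encode_ahex("jk"))
```

Each character takes two columns in AHEX, which is why the default output format is an A format with half the input width.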
Most of the time, variables don't retain their values between cases. Instead, each variable either is read from a data file or the active file, in which case it assumes the value read, or, if it was created with COMPUTE or another transformation, is initialized to the system-missing value or to blanks, depending on its type.
However, sometimes it's useful to have a variable that keeps its value between cases. You can do this with LEAVE (see LEAVE), or you can use a scratch variable. Scratch variables are variables whose names begin with an octothorpe (`#').
Scratch variables have the same properties as variables left with LEAVE: they retain their values between cases, and for the first case they are initialized to 0 or blanks. They have the additional property that they are deleted before the execution of any procedure. For this reason, scratch variables can't be used for analysis. To use a scratch variable in an analysis, use COMPUTE (see COMPUTE) to copy its value into an ordinary variable, then use that ordinary variable in the analysis.
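The difference between an ordinary computed variable and a scratch variable can be modeled as a loop over cases. The following Python sketch is only an analogy, not PSPP syntax: running_total, scratch_total, and running are hypothetical names, and None stands in for the system-missing value.

```python
# Illustrative model of per-case variable initialization; not PSPP source code.
SYSMIS = None  # stand-in for the system-missing value


def running_total(values):
    """Accumulate a total across cases the way a numeric scratch variable would."""
    scratch_total = 0            # scratch variable: 0 for the first case
    results = []
    for x in values:             # one iteration per case
        running = SYSMIS         # ordinary computed variable: reset every case
        scratch_total += x       # keeps the value left by the previous case
        running = scratch_total  # copy into an ordinary variable for analysis
        results.append(running)
    return results
```

The copy into running mirrors the advice above: the scratch variable itself is deleted before any procedure runs, so only the ordinary variable is available for analysis.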
PSPP makes use of many files each time it runs. Some of these it reads, some it writes, some it creates. Here is a table listing the most important of these files:
A file handle is a reference to a data file, system file, portable file, or scratch file. Most often, a file handle is specified as the name of a file as a string, that is, enclosed within `'' or `"'.
A file name string that begins or ends with `|' is treated as the
name of a command to pipe data to or from. You can use this feature
to read data over the network using a program such as `curl'
(e.g. GET '|curl -s -S http://example.com/mydata.sav'), to
read compressed data from a file using a program such as `zcat'
(e.g. GET '|zcat mydata.sav.gz'), and for many other
purposes.
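The rule for recognizing a piped command in a file name string can be sketched in Python. The function name parse_file_name and the returned tuples are hypothetical, and whether the data flows to or from the command is assumed to depend on how the calling command uses the file, so the sketch does not model it.

```python
# Illustrative model of pipe detection in file name strings; not PSPP source code.

def parse_file_name(name: str):
    """Classify a file name string as a plain file or a piped command."""
    if name.startswith("|"):
        return ("command", name[1:])   # drop the leading `|'
    if name.endswith("|"):
        return ("command", name[:-1])  # drop the trailing `|'
    return ("file", name)
```

For the example above, parse_file_name("|zcat mydata.sav.gz") yields the command zcat mydata.sav.gz.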
PSPP also supports declaring named file handles with the FILE HANDLE command. This command associates an identifier of your choice (the file handle's name) with a file. Later, the file handle name can be substituted for the name of the file. When PSPP syntax accesses a file multiple times, declaring a named file handle simplifies updating the syntax later to use a different file. Use of FILE HANDLE is also required to read data files in binary formats. See FILE HANDLE, for more information.
PSPP assumes that a file handle name that begins with `#' refers to a scratch file, unless the name has already been declared on FILE HANDLE to refer to another kind of file. A scratch file is similar to a system file, except that it persists only for the duration of a given PSPP session. Most commands that read or write a system or portable file, such as GET and SAVE, also accept scratch file handles. Scratch file handles may also be declared explicitly with FILE HANDLE. Scratch files are a PSPP extension.
In some circumstances, PSPP must distinguish whether a file handle refers to a system file or a portable file. When this is necessary to read a file, e.g. as an input file for GET or MATCH FILES, PSPP uses the file's contents to decide. In the context of writing a file, e.g. as an output file for SAVE or AGGREGATE, PSPP decides based on the file's name: if it ends in `.por' (with any capitalization), then PSPP writes a portable file; otherwise, PSPP writes a system file.
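The rule for choosing the output file type can be expressed as a one-line test on the file name. This Python sketch uses a hypothetical function name, output_file_kind, and covers only the writing case described above.

```python
# Illustrative model of the output file type decision; not PSPP source code.

def output_file_kind(file_name: str) -> str:
    """Return "portable" if the name ends in .por (any capitalization), else "system"."""
    return "portable" if file_name.lower().endswith(".por") else "system"
```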
INLINE is reserved as a file handle name. It refers to the “data file” embedded into the syntax file between BEGIN DATA and END DATA. See BEGIN DATA, for more information.
The file to which a file handle refers may be reassigned on a later FILE HANDLE command if it is first closed using CLOSE FILE HANDLE. The CLOSE FILE HANDLE command is also useful to free the storage associated with a scratch file. See CLOSE FILE HANDLE, for more information.
The syntax of some parts of the PSPP language is presented in this manual using the formalism known as Backus-Naur Form, or BNF. The following table describes BNF:
number
integer
string
var-name
=, /, +, -, etc.
.
var-list (or the keyword ALL)
expression
Expressions share a common syntax each place they appear in PSPP commands. Expressions are made up of operands, which can be numbers, strings, or variable names, separated by operators. There are five types of operators: grouping, arithmetic, logical, relational, and functions.
Every operator takes one or more operands as input and yields exactly one result as output. Depending on the operator, operands may be strings or numbers. With few exceptions, operands may be full-fledged expressions in themselves.
Some PSPP operators and expressions work with Boolean values, which represent true/false conditions. Booleans have only three possible values: 0 (false), 1 (true), and system-missing (unknown). System-missing is neither true nor false and indicates that the true value is unknown.
Boolean-typed operands or function arguments must take on one of these three values. Other values are considered false, but provoke a warning when the expression is evaluated.
Strings and Booleans are not compatible, and neither may be used in place of the other.
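The treatment of Boolean operands can be modeled as a small coercion step. In this Python sketch the name to_boolean is hypothetical, None stands in for the system-missing value, and a Python warning stands in for the warning PSPP issues. The wording of that warning is invented for illustration.

```python
# Illustrative model of Boolean operand handling; not PSPP source code.
import warnings

SYSMIS = None  # stand-in for the system-missing value


def to_boolean(value):
    """Map a numeric operand to 0 (false), 1 (true), or system-missing."""
    if value is SYSMIS:
        return SYSMIS
    if value == 0 or value == 1:
        return int(value)
    # Any other value is considered false, but provokes a warning.
    warnings.warn("operand is not 0, 1, or system-missing; treated as false")
    return 0
```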
Most numeric operators yield system-missing when given any system-missing operand. A string operator given any system-missing operand typically results in the empty string. Exceptions are listed under particular operator descriptions.
String user-missing values are not treated specially in expressions.
User-missing values for numeric variables are always transformed into
the system-missing value, except inside the arguments to the
VALUE and SYSMIS functions.
The missing-value functions can be used to precisely control how missing values are treated in expressions. See Missing Value Functions, for more details.
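The default behavior of numeric operators, where any system-missing operand makes the result system-missing, can be sketched as a wrapper around an ordinary operation. The names propagate_missing, add, and subtract are hypothetical, None stands in for the system-missing value, and the sketch deliberately ignores the exceptions listed under particular operators.

```python
# Illustrative model of system-missing propagation; not PSPP source code.
import operator

SYSMIS = None  # stand-in for the system-missing value


def propagate_missing(op):
    """Wrap `op` so that any system-missing operand yields system-missing."""
    def wrapped(*operands):
        if any(x is SYSMIS for x in operands):
            return SYSMIS
        return op(*operands)
    return wrapped


add = propagate_missing(operator.add)
subtract = propagate_missing(operator.sub)
```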
Parentheses (`()') are the grouping operators. Surround an expression with parentheses to force early evaluation.
Parentheses also surround the arguments to functions, but in that situation they act as punctuators, not as operators.
The arithmetic operators take numeric operands and produce numeric results.
a + b
a - b
a * b
a / b
a ** b: the result of 0**0 is system-missing.
- a
The logical operators take logical operands and produce logical results, meaning “true or false.” Logical operators are not true Boolean operators because they may also result in a system-missing value. See Boolean Values, for more information.
a AND b, a & b
a OR b, a | b
NOT a, ~ a
The relational operators take numeric or string operands and produce Boolean results.
Strings cannot be compared to numbers. When strings of different lengths are compared, the shorter string is right-padded with spaces to match the length of the longer string.
The results of string comparisons, other than tests for equality or inequality, depend on the character set in use. String comparisons are case-sensitive.
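The padding rule for comparing strings of different lengths can be sketched in Python. The names pad_pair, str_eq, and str_lt are hypothetical. Python orders strings by code point, which is only one possible character set ordering, so the result of str_lt is an assumption in the sense described above. Equality does not depend on the character set.

```python
# Illustrative model of string comparison with padding; not PSPP source code.

def pad_pair(a: str, b: str):
    """Right-pad the shorter string with spaces to the length of the longer."""
    width = max(len(a), len(b))
    return a.ljust(width), b.ljust(width)


def str_eq(a: str, b: str) -> bool:
    """Case-sensitive equality after padding."""
    x, y = pad_pair(a, b)
    return x == y


def str_lt(a: str, b: str) -> bool:
    """Less-than after padding; ordering follows Python's code point order."""
    x, y = pad_pair(a, b)
    return x < y
```

Because of the padding, a string compares equal to the same string followed by trailing spaces.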
a EQ b, a = b
a LE b, a <= b
a LT b, a < b
a GE b, a >= b
a GT b, a > b
a NE b, a ~= b, a <> b
PSPP functions provide mathematical abilities above and beyond those possible using simple operators. Functions have a common syntax: each is composed of a function name followed by a left parenthesis, one or more arguments, and a right parenthesis.
Function names are not reserved. Their names are specially treated
only when followed by a left parenthesis, so that EXP(10)
refers to the constant value e raised to the 10th power, but
EXP by itself refers to the value of variable EXP.
The sections below describe each function in detail.
Advanced mathematical functions take numeric arguments and produce numeric results.
LG10(number): Takes the base-10 logarithm of number. If number is not positive, the result is system-missing.
LN(number): Takes the base-e logarithm of number. If number is not positive, the result is system-missing.
LNGAMMA(number): Yields the base-e logarithm of the complete gamma function of number. If number is a negative integer, the result is system-missing.
SQRT(number): Takes the square root of number. If number is negative, the result is system-missing.
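The domain rules for the base-10 logarithm, base-e logarithm, and square root described above can be modeled in Python. The helper names are hypothetical, None stands in for the system-missing value, and returning system-missing for a system-missing argument is an assumption based on the general rule for numeric operators.

```python
# Illustrative model of domain checks in the advanced math functions;
# not PSPP source code.
import math

SYSMIS = None  # stand-in for the system-missing value


def log10_or_missing(number):
    """Base-10 logarithm; system-missing unless number is positive."""
    if number is SYSMIS or number <= 0:
        return SYSMIS
    return math.log10(number)


def ln_or_missing(number):
    """Base-e logarithm; system-missing unless number is positive."""
    if number is SYSMIS or number <= 0:
        return SYSMIS
    return math.log(number)


def sqrt_or_missing(number):
    """Square root; system-missing if number is negative."""
    if number is SYSMIS or number < 0:
        return SYSMIS
    return math.sqrt(number)
```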
Miscellaneous mathematical functions take numeric arguments and produce numeric results.
MOD(numerator, denominator): Returns the remainder (modulus) of numerator divided by denominator. If numerator is 0, then the result is 0, even if denominator is missing. If denominator is 0, the result is system-missing.
MOD10(number): Returns the remainder when number is divided by 10. If number is negative, MOD10(number) is negative or zero.
RND(number): Takes the absolute value of number and rounds it to an integer. Then, if number was negative originally, negates the result.
TRUNC(number): Discards the fractional part of number; that is, rounds number towards zero.
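The four miscellaneous functions described above can be modeled in Python as follows. This is a sketch under stated assumptions, not PSPP's implementation: the lowercase helper names are hypothetical, None stands in for the system-missing value, the remainder is assumed to take the sign of the numerator (consistent with the rule for negative numbers and remainders by 10), and halves are assumed to round up in magnitude.

```python
# Illustrative model of the remainder and rounding functions; not PSPP source code.
import math

SYSMIS = None  # stand-in for the system-missing value


def mod(numerator, denominator):
    """Remainder of numerator divided by denominator."""
    if numerator == 0:
        return 0  # holds even if denominator is missing
    if numerator is SYSMIS or denominator is SYSMIS or denominator == 0:
        return SYSMIS
    return math.fmod(numerator, denominator)  # sign follows the numerator


def mod10(number):
    """Remainder of number divided by 10; negative or zero for negative input."""
    return SYSMIS if number is SYSMIS else math.fmod(number, 10)


def rnd(number):
    """Round the absolute value to an integer, then restore the sign."""
    if number is SYSMIS:
        return SYSMIS
    magnitude = math.floor(abs(number) + 0.5)  # halves round up (assumption)
    return -magnitude if number < 0 else magnitude


def trunc(number):
    """Discard the fractional part, rounding towards zero."""
    return SYSMIS if number is SYSMIS else math.trunc(number)
```

Rounding the absolute value first makes the result symmetric around zero, so rnd(-2.5) and rnd(2.5) differ only in sign.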
Trigonometric functions take numeric arguments and produce numeric results.