Data from real sources is rarely error free. PSPP has a number of procedures which can be used to help identify data which might be incorrect.

The `DESCRIPTIVES` command (see DESCRIPTIVES) is used to generate
simple linear statistics for a dataset. It is also useful for
identifying potential problems in the data.
The example file `physiology.sav` contains a number of physiological
measurements of a sample of healthy adults selected at random.
However, the data entry clerk made a number of mistakes when entering
the data.
Example 5.2 illustrates the use of `DESCRIPTIVES` to screen this
data and identify the erroneous values.

    PSPP> get file='/usr/local/share/pspp/examples/physiology.sav'.
    PSPP> descriptives sex, weight, height.

    Output:

    DESCRIPTIVES.  Valid cases = 40; cases with missing value(s) = 0.
    +--------#--+-------+-------+-------+-------+
    |Variable# N|  Mean |Std Dev|Minimum|Maximum|
    #========#==#=======#=======#=======#=======#
    |sex     #40|    .45|    .50|    .00|   1.00|
    |height  #40|1677.12| 262.87| 179.00|1903.00|
    |weight  #40|  72.12|  26.70| -55.60|  92.07|
    +--------#--+-------+-------+-------+-------+

In the output of Example 5.2,
the most interesting column is the minimum value.
The `weight` variable has a minimum value of less than zero,
which is clearly erroneous.
Similarly, the `height` variable’s minimum value seems to be very low.
In fact, it is more than 5 standard deviations from the mean, and is a
seemingly bizarre height for an adult person.
We can examine the data in more detail with the `EXAMINE` command
(see EXAMINE):

In Example 5.3 you can see that the lowest value of `height` is
179 (which we suspect to be erroneous), but the second lowest is 1598,
which we know from the output of the `DESCRIPTIVES` command
is within 1 standard deviation of the mean.
Similarly, the lowest value of the `weight` variable is negative,
but its second lowest value is plausible.
This suggests that the two extreme values are outliers and probably
represent data entry errors.

[… continue from Example 5.2]

```
PSPP> examine height, weight /statistics=extreme(3).

Output:

#===============================#===========#=======#
#                               #Case Number| Value #
#===============================#===========#=======#
#Height in millimetres Highest 1#         14|1903.00#
#                              2#         15|1884.00#
#                              3#         12|1801.65#
#                     ----------#-----------+-------#
#                      Lowest  1#         30| 179.00#
#                              2#         31|1598.00#
#                              3#         28|1601.00#
#                     ----------#-----------+-------#
#Weight in kilograms   Highest 1#         13|  92.07#
#                              2#          5|  92.07#
#                              3#         17|  91.74#
#                     ----------#-----------+-------#
#                      Lowest  1#         38| -55.60#
#                              2#         39|  54.48#
#                              3#         33|  55.45#
#===============================#===========#=======#
```
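
The standard-deviation reasoning used above can be checked by hand. The following is a minimal sketch in Python, run outside PSPP, that assumes only the means and standard deviations printed in Example 5.2 and the two lowest values of each variable printed in Example 5.3. The names `SUMMARY`, `LOWEST`, `z_score` and `screen` are hypothetical and chosen for illustration; they are not part of PSPP.

```python
# Means and standard deviations copied from the DESCRIPTIVES output
# (Example 5.2).
SUMMARY = {
    "height": {"mean": 1677.12, "std_dev": 262.87},
    "weight": {"mean": 72.12, "std_dev": 26.70},
}

# Lowest and second-lowest values copied from the EXAMINE output
# (Example 5.3).
LOWEST = {
    "height": [179.00, 1598.00],
    "weight": [-55.60, 54.48],
}


def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev


def screen(summary, lowest):
    """Map each variable name to the rounded z-scores of its listed values."""
    return {
        name: [round(z_score(v, **summary[name]), 2) for v in values]
        for name, values in lowest.items()
    }


if __name__ == "__main__":
    for name, scores in screen(SUMMARY, LOWEST).items():
        print(name, scores)
```

Under these assumptions the lowest `height` lies about 5.7 standard deviations below the mean and the lowest `weight` about 4.8 below, while the second-lowest value of each variable is within 1 standard deviation. This matches the conclusion that only the two extreme values look like data entry errors.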