
#### 5.2.5 Testing for normality

Many statistical tests rely upon certain properties of the data. One common property, upon which many linear tests depend, is normality: the data must have been drawn from a normal distribution. It is therefore necessary to check for normality before deciding which test procedure to use. One way to do this uses the `EXAMINE` command.

In Example 5.5, a researcher was examining the failure rates of equipment produced by an engineering company. The file repairs.sav contains the mean time between failures (mtbf) of some items of equipment included in the study. Before performing linear analysis on the data, the researcher wanted to check whether the data are normally distributed.

A normal distribution has a skewness of zero and a kurtosis of zero (kurtosis here is measured relative to the normal distribution, that is, excess kurtosis). Looking at the skewness of mtbf in Example 5.5, which is 1.85 against a standard error of .58, it is clear that the mtbf figures have substantial positive skew and are therefore unlikely to have been drawn from a normally distributed variable. Positive skew can often be compensated for by applying a logarithmic transformation. This is done with the `COMPUTE` command in the line

```
compute mtbf_ln = ln (mtbf).
```

Rather than redefining the existing variable, this use of `COMPUTE` defines a new variable mtbf_ln which is the natural logarithm of mtbf. The final command in this example calls `EXAMINE` on this new variable, and it can be seen from the results that both the skewness (-.16) and the kurtosis (-.09) for mtbf_ln are very close to zero. This provides some confidence that the mtbf_ln variable is approximately normally distributed and thus safe for linear analysis. If no suitable transformation can be found, it would be worth considering an appropriate non-parametric test instead of a linear one. See NPAR TESTS for information about non-parametric tests.
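To illustrate why a logarithm can compensate for positive skew, here is a minimal Python sketch, run outside PSPP. It uses a small made-up sample, not the repairs.sav data. The helper name `sample_skewness` is hypothetical; it implements the usual bias-adjusted sample skewness, which is assumed (not confirmed here) to be comparable to the figure `EXAMINE` reports.

```python
import math

def sample_skewness(values):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Made-up right-skewed sample (NOT the repairs.sav data): the values are
# evenly spaced on a log scale, so they bunch up at the low end when raw.
raw = [math.exp(x / 2) for x in range(1, 8)]

# Same role as "compute mtbf_ln = ln (mtbf)." in the PSPP example.
logged = [math.log(v) for v in raw]

print(f"skewness before transformation: {sample_skewness(raw):.2f}")
print(f"skewness after transformation:  {sample_skewness(logged):.2f}")
```

Because the made-up values are symmetric on the log scale, the transformed skewness is essentially zero, while the raw values show clear positive skew. Real data will rarely behave this neatly.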

```
PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
PSPP> examine mtbf /statistics=descriptives.
PSPP> compute mtbf_ln = ln (mtbf).
PSPP> examine mtbf_ln /statistics=descriptives.
```

Output:

```
                       Case Processing Summary
+-----------------------------------+-------------------------------+
|                                   |             Cases             |
|                                   +----------+---------+----------+
|                                   |   Valid  | Missing |   Total  |
|                                   | N|Percent|N|Percent| N|Percent|
+-----------------------------------+--+-------+-+-------+--+-------+
|Mean time between failures (months)|15| 100.0%|0|    .0%|15| 100.0%|
+-----------------------------------+--+-------+-+-------+--+-------+

                                 Descriptives
+----------------------------------------------------------+---------+--------+
|                                                          |         |  Std.  |
|                                                          |Statistic|  Error |
+----------------------------------------------------------+---------+--------+
|Mean time between Mean                                    |     8.32|    1.62|
|failures (months) 95% Confidence Interval Lower           |     4.85|        |
|                  for Mean                Bound           |         |        |
|                                          Upper           |    11.79|        |
|                                          Bound           |         |        |
|                  5% Trimmed Mean                         |     7.69|        |
|                  Median                                  |     8.12|        |
|                  Variance                                |    39.21|        |
|                  Std. Deviation                          |     6.26|        |
|                  Minimum                                 |     1.63|        |
|                  Maximum                                 |    26.47|        |
|                  Range                                   |    24.84|        |
|                  Interquartile Range                     |     5.83|        |
|                  Skewness                                |     1.85|     .58|
|                  Kurtosis                                |     4.49|    1.12|
+----------------------------------------------------------+---------+--------+

         Case Processing Summary
+-------+-------------------------------+
|       |             Cases             |
|       +----------+---------+----------+
|       |   Valid  | Missing |   Total  |
|       | N|Percent|N|Percent| N|Percent|
+-------+--+-------+-+-------+--+-------+
|mtbf_ln|15| 100.0%|0|    .0%|15| 100.0%|
+-------+--+-------+-+-------+--+-------+

                               Descriptives
+----------------------------------------------------+---------+----------+
|                                                    |Statistic|Std. Error|
+----------------------------------------------------+---------+----------+
|mtbf_ln Mean                                        |     1.88|       .19|
|        95% Confidence Interval for Mean Lower Bound|     1.47|          |
|                                         Upper Bound|     2.29|          |
|        5% Trimmed Mean                             |     1.88|          |
|        Median                                      |     2.09|          |
|        Variance                                    |      .54|          |
|        Std. Deviation                              |      .74|          |
|        Minimum                                     |      .49|          |
|        Maximum                                     |     3.28|          |
|        Range                                       |     2.79|          |
|        Interquartile Range                         |      .92|          |
|        Skewness                                    |     -.16|       .58|
|        Kurtosis                                    |     -.09|      1.12|
+----------------------------------------------------+---------+----------+
```

Example 5.5: Testing for normality using the `EXAMINE` command and applying a logarithmic transformation. The mtbf variable has a large positive skew and is therefore unsuitable for linear statistical analysis. However, the transformed variable (mtbf_ln) is close to normal and would appear to be more suitable.
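One way to make "close to zero" more concrete is to compare each statistic with its standard error. The Python sketch below, run outside PSPP, uses the standard large-sample formulas for the standard errors of skewness and excess kurtosis under normality. For the 15 valid cases they give values that round to the .58 and 1.12 shown in the output, although that PSPP uses exactly these formulas is an assumption here. The function names are hypothetical. The "about twice the standard error" threshold is a common rule of thumb and does not come from PSPP.

```python
import math

def skewness_std_error(n):
    """Standard error of sample skewness under normality, for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def kurtosis_std_error(n):
    """Standard error of sample excess kurtosis under normality, for n cases."""
    return 2.0 * skewness_std_error(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 15  # valid cases, from the Case Processing Summary
ses = skewness_std_error(n)
sek = kurtosis_std_error(n)

# Statistics copied from the EXAMINE output in Example 5.5.
reported = {
    "mtbf": {"skewness": 1.85, "kurtosis": 4.49},
    "mtbf_ln": {"skewness": -0.16, "kurtosis": -0.09},
}

# Ratio of each statistic to its standard error; by the rule of thumb
# (an assumption, see above) a magnitude above about 2 suggests non-normality.
ratios = {}
for name, stats in reported.items():
    ratios[name] = (stats["skewness"] / ses, stats["kurtosis"] / sek)
    print(f"{name}: skewness/SE = {ratios[name][0]:.2f}, "
          f"kurtosis/SE = {ratios[name][1]:.2f}")
```

Under that rule of thumb, both ratios for mtbf exceed 2 in magnitude, while both ratios for mtbf_ln are well below 1. This agrees with the reading of the output given above. With only 15 cases, such checks are indicative rather than conclusive.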
