5.2.5 Testing for normality

Many statistical tests rely upon certain properties of the data. One common property, upon which many linear tests depend, is normality: the data must have been drawn from a normal distribution. It is therefore necessary to check for normality before deciding which test procedure to use. One way to do this is with the EXAMINE command.

In Example 5.5, a researcher was examining the failure rates of equipment produced by an engineering company. The file repairs.sav contains the mean time between failures (mtbf) of some items of equipment subject to the study. Before performing linear analysis on the data, the researcher wanted to ascertain that the data is normally distributed.

A normal distribution has a skewness of zero and, on the scale reported here, a kurtosis of zero (that is, excess kurtosis). Looking at the skewness of mtbf in Example 5.5, it is clear that the mtbf figures have considerable positive skew (1.85, with a standard error of .58) and are therefore unlikely to have been drawn from a normally distributed variable. Positive skew can often be compensated for by applying a logarithmic transformation. This is done with the COMPUTE command in the line

     compute mtbf_ln = ln (mtbf).

Rather than redefining the existing variable, this use of COMPUTE defines a new variable, mtbf_ln, which is the natural logarithm of mtbf. The final command in this example calls EXAMINE on this new variable, and the results show that both the skewness and the kurtosis of mtbf_ln are very close to zero. This provides some confidence that the mtbf_ln variable is normally distributed and thus suitable for linear analysis. If no suitable transformation can be found, it is worth considering an appropriate non-parametric test instead of a linear one. See NPAR TESTS for information about non-parametric tests.
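To illustrate why a logarithmic transformation can reduce positive skew, the following is a minimal sketch in Python rather than PSPP syntax. It uses the bias-adjusted skewness and excess kurtosis estimators common to many statistics packages; that PSPP computes exactly these forms is an assumption, and the data values are hypothetical, not the contents of repairs.sav.

```python
import math

def _central_moment(xs, k):
    """k-th central moment of xs, using divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Bias-adjusted sample skewness (G1); zero for a symmetric sample."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2); near zero for normal data."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

# Hypothetical right-skewed measurements (NOT the repairs.sav data).
raw = [1.6, 2.9, 3.4, 4.1, 5.0, 5.8, 6.7, 7.5,
       8.1, 8.9, 9.6, 11.2, 13.4, 17.8, 26.5]

# Same role as "compute mtbf_ln = ln (mtbf)." in the PSPP example.
logged = [math.log(x) for x in raw]

for name, xs in (("raw", raw), ("logged", logged)):
    print(f"{name:7s} skewness={skewness(xs):6.2f}  "
          f"kurtosis={excess_kurtosis(xs):6.2f}")
```

The long right tail dominates the third moment of the raw values; taking logarithms compresses large values more than small ones, which pulls that tail in and moves the skewness toward zero.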

     PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
     PSPP> examine mtbf
                     /statistics=descriptives.
     PSPP> compute mtbf_ln = ln (mtbf).
     PSPP> examine mtbf_ln
                     /statistics=descriptives.

Output:

     1.2 EXAMINE.  Descriptives
     #====================================================#=========#==========#
     #                                                    #Statistic|Std. Error#
     #====================================================#=========#==========#
     #mtbf    Mean                                        #   8.32  |   1.62   #
     #        95% Confidence Interval for Mean Lower Bound#   4.85  |          #
     #                                         Upper Bound#  11.79  |          #
     #        5% Trimmed Mean                             #   7.69  |          #
     #        Median                                      #   8.12  |          #
     #        Variance                                    #  39.21  |          #
     #        Std. Deviation                              #   6.26  |          #
     #        Minimum                                     #   1.63  |          #
     #        Maximum                                     #  26.47  |          #
     #        Range                                       #  24.84  |          #
     #        Interquartile Range                         #   5.83  |          #
     #        Skewness                                    #   1.85  |    .58   #
     #        Kurtosis                                    #   4.49  |   1.12   #
     #====================================================#=========#==========#
     
     2.2 EXAMINE.  Descriptives
     #====================================================#=========#==========#
     #                                                    #Statistic|Std. Error#
     #====================================================#=========#==========#
     #mtbf_ln Mean                                        #   1.88  |    .19   #
     #        95% Confidence Interval for Mean Lower Bound#   1.47  |          #
     #                                         Upper Bound#   2.29  |          #
     #        5% Trimmed Mean                             #   1.88  |          #
     #        Median                                      #   2.09  |          #
     #        Variance                                    #   .54   |          #
     #        Std. Deviation                              #   .74   |          #
     #        Minimum                                     #   .49   |          #
     #        Maximum                                     #   3.28  |          #
     #        Range                                       #   2.79  |          #
     #        Interquartile Range                         #   .92   |          #
     #        Skewness                                    #   -.16  |    .58   #
     #        Kurtosis                                    #   -.09  |   1.12   #
     #====================================================#=========#==========#

Example 5.5: Testing for normality using the EXAMINE command and applying a logarithmic transformation. The mtbf variable has a large positive skew and is therefore unsuitable for linear statistical analysis. However, the transformed variable (mtbf_ln) is close to normal and would appear to be more suitable.
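The Std. Error column in the output can help judge how far a skewness or kurtosis value is from zero. The sketch below, in Python rather than PSPP syntax, uses the usual large-sample formulas for these standard errors. The case count n = 15 is an assumption: it is not stated in the example, but it is consistent with the reported standard errors (for instance 6.26 / 1.62 is close to the square root of 15). Treating a ratio beyond about 2 in absolute value as notable is a common rule of thumb, not a formal test.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis for n cases."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

N_CASES = 15  # assumption inferred from the table, not stated in the source

# Statistics copied from the EXAMINE output in Example 5.5.
reported = {
    "mtbf":    {"skewness": 1.85,  "kurtosis": 4.49},
    "mtbf_ln": {"skewness": -0.16, "kurtosis": -0.09},
}

# Ratio of each statistic to its standard error.
ratios = {
    var: {
        "skewness": stats["skewness"] / se_skewness(N_CASES),
        "kurtosis": stats["kurtosis"] / se_kurtosis(N_CASES),
    }
    for var, stats in reported.items()
}

for var, r in ratios.items():
    print(f"{var:8s} skew/SE={r['skewness']:6.2f}  kurt/SE={r['kurtosis']:6.2f}")
```

Under these assumptions both ratios for mtbf exceed 2 in absolute value, while both ratios for mtbf_ln are well inside that range, which matches the conclusion drawn in the caption.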