
5.3.2 Linear Regression

Linear regression is a technique used to investigate whether, and how, a variable is linearly related to others. If such a linear relationship is found, it can be used to predict future values of that variable.

In Example 5.7, the service department of the company wanted to be able to predict the time to repair equipment, in order to improve the accuracy of their quotations. It was suggested that the time to repair might be related to the time between failures and the duty cycle of the equipment. A significance level of 0.1 was chosen for this investigation. To investigate this hypothesis, the REGRESSION command was used. This command not only tests whether the variables are related, but also identifies the potential linear relationship. See REGRESSION.

     PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
     PSPP> regression /variables = mtbf duty_cycle /dependent = mttr.
     PSPP> regression /variables = mtbf /dependent = mttr.

Output:

     1.3(1) REGRESSION.  Coefficients
     #=============================================#====#==========#====#=====#
     #                                             #  B |Std. Error|Beta|  t  #
     #========#====================================#====#==========#====#=====#
     #        |(Constant)                          #9.81|      1.50| .00| 6.54#
     #        |Mean time between failures (months) #3.10|       .10| .99|32.43#
     #        |Ratio of working to non-working time#1.09|      1.78| .02|  .61#
     #        |                                    #    |          |    |     #
     #========#====================================#====#==========#====#=====#
     
     1.3(2) REGRESSION.  Coefficients
     #=============================================#============#
     #                                             #Significance#
     #========#====================================#============#
     #        |(Constant)                          #         .10#
     #        |Mean time between failures (months) #         .00#
     #        |Ratio of working to non-working time#         .55#
     #        |                                    #            #
     #========#====================================#============#
     
     2.3(1) REGRESSION.  Coefficients
     #============================================#=====#==========#====#=====#
     #                                            #  B  |Std. Error|Beta|  t  #
     #========#===================================#=====#==========#====#=====#
     #        |(Constant)                         #10.50|       .96| .00|10.96#
     #        |Mean time between failures (months)# 3.11|       .09| .99|33.39#
     #        |                                   #     |          |    |     #
     #========#===================================#=====#==========#====#=====#
     
     2.3(2) REGRESSION.  Coefficients
     #============================================#============#
     #                                            #Significance#
     #========#===================================#============#
     #        |(Constant)                         #         .06#
     #        |Mean time between failures (months)#         .00#
     #        |                                   #            #
     #========#===================================#============#

Example 5.7: Linear regression analysis to find a predictor for mttr. The first attempt, including duty_cycle, produces some unacceptably high significance values. However, the second attempt, which excludes duty_cycle, produces significance values no higher than 0.06. This suggests that mtbf alone may be a suitable predictor for mttr.

The coefficients in the first table suggest that the formula mttr = 9.81 + 3.10 × mtbf + 1.09 × duty_cycle can be used to predict the time to repair. However, the significance value for the duty_cycle coefficient is very high (0.55), which would make this an unsafe predictor. For this reason, the test was repeated with the duty_cycle variable omitted. This time, the significance of every coefficient is no higher than 0.06, suggesting that, at the 0.06 level, the formula mttr = 10.5 + 3.11 × mtbf is a reliable predictor of the time to repair.
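
As a rough sketch of how the second formula might then be applied, the predicted time to repair can be computed for each case and listed beside the observed value. The variable name mttr_pred is hypothetical, and the coefficients are simply those read from the second table, rounded as shown there. For instance, an mtbf of 10 would give a predicted mttr of 10.5 + 3.11 × 10 = 41.6.

     PSPP> * mttr_pred is a hypothetical name; coefficients come from the second table.
     PSPP> compute mttr_pred = 10.5 + 3.11 * mtbf.
     PSPP> list /variables = mtbf mttr mttr_pred.

Comparing the mttr and mttr_pred columns gives an informal impression of how closely the fitted line follows the recorded repair times.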