Lab 4: Demonstration SAS Program


Sequential and Partial Sum of Squares  

   
Data Weather;
 Title 'Lows and Highs from N&O Jan 28,29,30 1992';
 Title2 'using actual numbers (yesterday values)';
 input city $ hi2 lo2 yhi ylo thi tlo;

*             Mon        Tues       Wed  ;
cards;
seattle      51 44      52 44      59 47
.
.
.
 ;
proc reg; model thi = yhi hi2 tlo ylo lo2/ss1 ss2;
   test tlo=0, ylo=0, lo2=0;
run;

 /*-----------------------------------------------
 |  Showing sequential and partial sums of squares|
 | Note t**2 = F relationship for partial F.  By  |
 | hand, construct F to leave out lows.  Compare  |
 | to test statement.                             |
  -----------------------------------------------*/

Partial Sums of Squares
The SS that assesses the value of adding a covariate to a model that already contains all of the other (k-1) covariates (i.e., the effect of putting that particular covariate into the model last). This is referred to as either Type II or Type III SS in SAS.

Sequential Sums of Squares
The SS that assesses the value of adding each covariate to the model one at a time, in sequence. These SS sequentially build up the model SS, and they can depend on the order in which the covariates are placed into the model. (In SAS, that order is the left-to-right order of the covariates in the MODEL statement.) This is referred to as Type I SS in SAS.
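
The two definitions above can be written compactly as differences of model sums of squares. The notation below is introduced here for illustration and does not appear in the handout: x_1, ..., x_k are the covariates in the order listed in the MODEL statement, and SS_model(...) is the model SS from a regression on the listed covariates.

```latex
\begin{align*}
\text{Type I SS}(x_j)  &= SS_{\text{model}}(x_1,\dots,x_j) - SS_{\text{model}}(x_1,\dots,x_{j-1}) \\
\text{Type II SS}(x_j) &= SS_{\text{model}}(x_1,\dots,x_k) - SS_{\text{model}}(x_1,\dots,x_{j-1},x_{j+1},\dots,x_k)
\end{align*}
```

For j = k the two right-hand sides coincide, so the last covariate listed in a MODEL statement always has equal Type I and Type II SS.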

Corresponding output from the code above:


                        Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square      F Value       Prob>F

Model            5   2663.64223    532.72845       32.516       0.0001
Error           13    212.98935     16.38380
C Total         18   2876.63158

    Root MSE       4.04769     R-square       0.9260
    Dep Mean      52.42105     Adj R-sq       0.8975
    C.V.           7.72150

                         Parameter Estimates

                  Parameter      Standard    T for H0:
 Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

 INTERCEP   1     11.643690    4.81973748         2.416        0.0311
 YHI        1      0.583386    0.15311384         3.810        0.0022
 HI2        1      0.123691    0.18436293         0.671        0.5140
 TLO        1      0.310760    0.20102242         1.546        0.1461
 YLO        1     -0.160200    0.19240146        -0.833        0.4201
 LO2        1      0.029428    0.20778837         0.142        0.8895


 Variable  DF     Type I SS    Type II SS

 INTERCEP   1         52211     95.620069
 YHI        1   2572.837368    237.847216
 HI2        1     20.675979      7.374655
 TLO        1     55.808236     39.154006
 YLO        1     13.992024     11.358568
 LO2        1      0.328621      0.328621
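
The t**2 = F relationship mentioned in the program comment can be checked directly from this output. The following is a minimal sketch, not part of the original lab program: it copies the t statistics, Type II SS, and error mean square from the listing above into a DATA step (the variable names are made up for illustration) and compares t squared with the partial F, computed as Type II SS divided by MSE.

```sas
data _null_;
   mse = 16.38380;                  /* Error mean square from the ANOVA table   */

   /* YHI: t statistic and Type II (partial) SS copied from the listing */
   t_yhi   = 3.810;
   ss2_yhi = 237.847216;
   f_yhi   = ss2_yhi / mse;         /* partial F, 1 numerator df                */
   t2_yhi  = t_yhi**2;

   /* TLO: same comparison */
   t_tlo   = 1.546;
   ss2_tlo = 39.154006;
   f_tlo   = ss2_tlo / mse;
   t2_tlo  = t_tlo**2;

   put f_yhi= 8.4 t2_yhi= 8.4;      /* written to the SAS log                   */
   put f_tlo= 8.4 t2_tlo= 8.4;
run;
```

The two values in each pair should agree up to the rounding of the printed t statistics (roughly 14.5 for YHI and 2.39 for TLO).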

Note that summing the Type I SS for YHI through LO2 gives the model SS from the ANOVA table above. If we wanted to test whether we could omit the "Lo" information (TLO, YLO, and LO2) from the model, we could use these Type I SS. First, by hand, construct the appropriate full vs. reduced F statistic using the results of the following code:

proc reg; model thi = yhi hi2;
run;

 /*--------------------------------------------------
 |  Use this to construct full vs. reduced model F   |
 | Compare to previous results.                      |
 | Note decrease in model SS = increase in error SS. |
  --------------------------------------------------*/

As you form the ratio, note the difference in sums of squares that appears in the numerator (the model SS from the full model minus the model SS from the reduced model). Now sum the Type I SS for TLO, YLO, and LO2 from the first regression. You should find that these two numbers are the same; this works because the three lows were listed last in that MODEL statement. Since each Type I SS has 1 df, divide this sum by 3, then divide the result by the MSE from the full model to arrive at the very same F statistic. This shows that the SAS output above provides sufficient information to construct the full vs. reduced F test (that is, you don't need to see the ANOVA table from the reduced model).
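
The arithmetic described above can be sketched in a short DATA step. This is an illustration only, not part of the original lab program, and the variable names are hypothetical. All numbers are copied from the first PROC REG listing; the reduced-model SS is taken as the Type I SS for YHI plus HI2, which assumes those two variables are entered first, as they are in that MODEL statement.

```sas
data _null_;
   /* Route 1: sum the Type I SS for the three lows (TLO, YLO, LO2) */
   ss_lows = 55.808236 + 13.992024 + 0.328621;

   /* Route 2: drop in model SS when the lows are removed           */
   ss_model_full    = 2663.64223;              /* ANOVA table, full model   */
   ss_model_reduced = 2572.837368 + 20.675979; /* Type I SS for YHI + HI2   */
   ss_drop          = ss_model_full - ss_model_reduced;

   /* Full vs. reduced F: 3 numerator df, 13 error df in the full model */
   mse_full = 16.38380;
   f = (ss_lows / 3) / mse_full;
   p = 1 - probf(f, 3, 13);                    /* upper-tail F probability  */

   put ss_lows= ss_drop=;
   put f= p=;
run;
```

The two routes should give the same numerator SS (about 70.13) up to rounding, and the resulting F (about 1.43 on 3 and 13 df) can be compared with the output of the TEST statement in the first regression.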


proc reg; model thi = hi2 tlo ylo lo2 yhi/ss1 ss2;
run;

 /*---------------------------------------------------
 |  How do sequential and partial for yhi here compare |
 | to first regression?  Note that partial SS for any  |
 | variable is sequential SS I WOULD have gotten if    |
 | that variable were fitted last.                     |
  ----------------------------------------------------*/

Seq. and Partial SS from first model.

Var DF    Type I SS  Type II SS

 INT 1         52211   95.620069
 YHI 1   2572.837368  237.847216
 HI2 1     20.675979    7.374655
 TLO 1     55.808236   39.154006
 YLO 1     13.992024   11.358568
 LO2 1      0.328621    0.328621

Seq. and Partial SS from second model.

Var DF    Type I SS  Type II SS

 INT 1         52211   95.620069
 HI2 1   2067.098437    7.374655
 TLO 1    262.446657   39.154006
 YLO 1     29.257813   11.358568
 LO2 1     66.992104    0.328621
 YHI 1    237.847216  237.847216

Since the two models enter the covariates in a different order, it makes sense that the Type I SS values differ (there are times, as we'll see, when corresponding Type I SS remain the same regardless of order). Notice, however, that the Type II SS values from the first model reappear in the second model, just listed in a different order (convince yourself why that should make sense).
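
One quick check on the two listings is that the Type I SS, although they differ term by term, add up to the same model SS under either ordering. The DATA step below is a minimal sketch of that check, with values copied from the two tables and variable names chosen only for illustration.

```sas
data _null_;
   /* Type I SS in the first order: yhi hi2 tlo ylo lo2  */
   order1 = sum(2572.837368, 20.675979, 55.808236, 13.992024, 0.328621);

   /* Type I SS in the second order: hi2 tlo ylo lo2 yhi */
   order2 = sum(2067.098437, 262.446657, 29.257813, 66.992104, 237.847216);

   put order1= order2=;   /* both should match the model SS, about 2663.64 */
run;
```

Note also that in each table the variable fitted last (LO2 in the first model, YHI in the second) has identical Type I and Type II SS, as the program comment predicts.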