Homework

I. Work these problems from the G&G book:

Read section 5.4.7 to remind yourself about the idea of subsampling, then work

   Exercise 5.4 (page 100) [just do the comparison for the CO2 sum of squares] (by "algebraic form" they mean the hand formulas for sums of squares).
   Exercise 5.5 (page 100)
   Exercise 5.6 (page 105)

Hint: For exercise 5.5, you may recall from St 512 that you use the "method of moments" here, that is, you set the mean squares (the numbers) for Block, CO2, and error equal to their expected values (symbols) and solve the resulting equations.

II. Cut and paste this program into SAS.

   data GG_pg97;
     input block @;
     Paddy+1;
     do sample = 1 to 3; CO2="C"; input Y @; output; end;
     Paddy+1;
     do sample = 4 to 6; CO2="T"; input Y @; output; end;
   cards;
   1 6.184 6.225 6.220 6.406 6.422 6.406
   2 6.302 6.218 6.241 6.352 6.456 6.451
   3 6.077 6.144 6.065 6.198 6.246 6.323
   4 6.086 6.201 6.132 6.422 6.206 6.287
   ;
   proc print; run;

Now run PROC GLM to get the analysis of table 5.6 (there may be a slight numerical discrepancy). Add the statement

   RANDOM BLOCK BLOCK*CO2;

and verify that it adds the expected mean squares from table 5.6 to your output.

Why should you put PADDY or its equivalent in the model? Does CO2*BLOCK in your model have the same effect as PADDY? (Note: If you tell me what CO2 is and what block you are in, that tells me what paddy you are in.)

Now add the test options as follows in GLM:

   RANDOM BLOCK BLOCK*CO2/TEST;

Did the ANOVA F tests change in the first part of the printout? Look at the additional tests, especially the one for CO2, the treatment. Is it different than the one in the ANOVA table?

[Note: It is unexpected that there is more variation among samples from the same paddy than there is from paddy to paddy. This causes the "extra" F test to reject more strongly than the incorrect one in the original ANOVA table. Usually it goes the other way, so this example is atypical, though it does happen every so often.]
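The GLM steps described above can be sketched as follows. This is only one plausible way to write the step, not necessarily the form intended in class: it assumes the data set GG_pg97 created by the program above and uses BLOCK*CO2 as the stand-in for PADDY, which is exactly the equivalence you are asked to check.

```sas
* Sketch only: assumes GG_pg97 from the DATA step above is in the WORK library;
proc glm data=GG_pg97;
  class block CO2;
  * Assumption: block*CO2 is used here as the equivalent of PADDY;
  model Y = block CO2 block*CO2;
  * Effects named in RANDOM must also appear in MODEL;
  * The TEST option adds F tests based on the expected mean squares;
  random block block*CO2 / test;
run;
quit;
```

Dropping "/ test" from the RANDOM statement gives the first version asked for, which prints the expected mean squares without the additional tests.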
Now run the model in PROC MIXED, which is just like GLM except that you put only the fixed effects in the MODEL statement and you put the random effects in the RANDOM statement. Page 61 of our class notes shows an example. Are the variance component estimates the same as those you solved for in exercise 5.5? If not, did one of them come out to 0? (If so, this can affect some of the other estimates.)

[Note: Not only that, this will also cause the F test here not to match the corrected "extra" test added to the GLM output by the RANDOM statement. In a typical example, the corrected F in GLM and the MIXED default F test will be the same in a relatively simple design like this.]

III. Work Exercise 5.1 from the review exercises, page 112, treating the Friedman-Kendall part as optional (not graded).

For exercise 5.1, also list the estimated block and error variance components. Explain why the variance of a single Y is the sum of these two variances. (Hint: think about your model.) Using this idea, compute the standard error of a single machine mean, noting that day and error are both random and contribute to the variance.

Using PROC MIXED with DAY random, compute the machine 1 LSMEAN and its standard error. Check this against your previous hand calculation.

   PROC MIXED;
     CLASS MACHINE DAY;
     MODEL Y=MACHINE;
     RANDOM DAY;
     LSMEANS MACHINE;
   RUN;

==================optional (not graded) =======================

For the exercise 5.1 data you could string the numbers out into a column of 27 Y values with a machine and day column, then

   PROC PLOT; PLOT Y*MACHINE=DAY;

On a color machine,

   PROC GPLOT; PLOT Y*MACHINE=DAY; SYMBOL1 V=DOT I=NONE;

gives a nice graph. With no color specified, GPLOT will run through your device's color list.

You could try

   PROC SORT; BY DAY;
   PROC UNIVARIATE NORMAL PLOT; BY DAY;

This gives a test of H0: normality, and a side-by-side box and whisker plot known as a "schematic diagram". The option V=BOXT can also be used in the GPLOT SYMBOL statement to give a schematic diagram.
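For the optional plots, the statements above can be collected into complete steps along the following lines. This is a sketch under stated assumptions: the data set name ex51 is hypothetical (the handout does not name one), and it is assumed to hold one row per observation with the columns Y, MACHINE, and DAY described above. No data are supplied here; you would create ex51 yourself from the exercise 5.1 numbers.

```sas
* Hypothetical data set ex51 with columns Y, machine, day (not defined in the handout);

* Scatter plot of Y against machine with a separate plotting group for each day;
symbol1 v=dot i=none;
proc gplot data=ex51;
  plot Y*machine=day;
run;
quit;

* BY-group processing requires the data to be sorted by day first;
proc sort data=ex51;
  by day;
run;

* NORMAL requests tests of normality and PLOT requests the plots, one set per day;
proc univariate data=ex51 normal plot;
  by day;
  var Y;
run;
```

Naming the data set on each PROC with DATA= avoids relying on whichever data set happened to be created last, which matters once PROC SORT and other steps are in the same program.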