Homework 2

1. Work problem 2.2 from G&G, omitting the randomization of the positions in the greenhouse. Briefly describe how you did the randomization of treatments to the cuttings (e.g. show relevant steps of SAS code along with your printout).

   For problem 2.2, suppose the rooting responses Y for the 5 treatments were:

      Control:   2  5  2  3        Auxin3:  20 24 18 22
      Auxin1:    5  7  5 11        Auxin4:  15 13 10 14
      Auxin2:   22  9 15 18

   (a) Compute the analysis of variance table using PROC GLM (see prob. 2 below).

   (b) Estimate the difference in rooting between the control and the lowest (auxin1) level of auxin using an ESTIMATE statement.

   (c) Use an ESTIMATE statement to construct a 95% confidence interval for the difference in mean rooting between auxin3 and the average of auxin levels 1, 2 and 4.

   (d) Try the statement

          ESTIMATE "3 vs rest" TRT 0 -.33 -.33 1 -.33 ;

       as a possible solution to (c). Why did it fail? (Hint: what do the coefficients really sum to?)

   SAS note: You can use ESTIMATE .... /DIVISOR=7, for example, to divide integer coefficients by 7 to machine-level accuracy. This may help you in part (c) if you change the 7 to something appropriate.

2. An experiment is done on batches of donuts. Y is the amount (grams) of fat absorbed in the cooking process and the treatment (Fat) is the type of fat used in deep frying the donuts. The design is completely randomized (CRD). Here are the data (source: Snedecor and Cochran):

      Fat 1  Fat 2  Fat 3  Fat 4
      164 178 175 155 172 191 193 166 168 197 178 149 177 182 164 156 185 170 177 168

   As you see, there are different numbers of replicates.

   (a) Compute the 4 means, 4 totals and 4 corrected sums of squares within the fats (1 for each fat). As in our notes, the corrected sum of squares within each fat is its contribution to the error sum of squares.

   (b) Show how the totals (or means) are used to compute the treatment sum of squares by hand.
       (Recall: the formulas are like those of the balanced case, except that the varying replicate numbers are used as multipliers or divisors.)

   (c) Estimate the difference between the mean fat absorbed with fat 1 and the average of the mean absorptions for fats 2 and 3. Do this by hand. Use this code:

          PROC GLM; CLASS FAT; MODEL Y=FAT/SOLUTION;
          ESTIMATE "1 vs 2&3" FAT 1 -.5 -.5 0 ;

       to verify your answer. Also show how your corrected sums of squares in (a) are related to the error sum of squares in the GLM output.

   (d) Show how the 4 treatment means are computed from the "solutions" produced by PROC GLM. Write down the linear model for these data and explain, in terms of your model parameters, what the Fat 2 coefficient in the GLM output really represents if no arbitrary assumptions are made.

   (e) In the GLM solutions, there is a number and a t test associated with each level of FAT. By hand, compute the t test for comparing Fat 1 to Fat 4 (incorporate the pooled MSE from all fats here). How is your t test related to the solutions printout?

   (f) Create variables X1 X2 X3 that would be used to do the reference cell coding of our G&G text. Regress Y on X1 X2 X3 using PROC REG (or PROC GLM with no CLASS statement). Explain how the regression coefficients and t tests are related to the PROC GLM output in part (c), where the CLASS statement was used.

   (g) Produce a new set of columns C1 C2 C3 that would be used in the effects coding. Regress Y on C1 C2 C3, and explain how the 4 coefficients in this regression are related to the 4 fat means from part (a). Compute the mean of all 20 data values and the average of the 4 sample means from part (a). Is either of these the intercept in this regression?

============================================================

OPTIONAL PROBLEMS (not graded):

   (h) Is the comparison of Fat 2 to Fat 4 (Tau 2 - Tau 4 in linear model terms) an estimable function?
       If so, show how to calculate this estimate using the (f) and then the (g) solutions above, thus verifying that they give the same result. You can compute the answer to (f) using only the answer to (g) and vice versa. Try to do so.

   For problem 1, graph the data and, assuming that the treatments are equally spaced levels of auxin 0 1 2 3 and 4 in some units, fit a quadratic polynomial in auxin level and find the auxin level that gives the highest predicted rooting. This was the experimenter's goal.

**** SAS Challenge #1 (optional - not graded) *****

Explain why this program gives the data it does. Note that "put _all_" writes the contents of the PDV to the log window at the time it's executed. You may want to interlace some more puts in the data step.

   data next;
      put _all_;
      input Y;
      X=X+1;
      Z=Z+1;
      retain X 30;
      drop X;
      xy = x*y;
   cards;
   10
   20
   30
   ;
   proc print; run;

   Obs    Y    Z     xy
    1    10    .    310
    2    20    .    640
    3    30    .    990
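
One possible way to follow the suggestion about interlacing more puts is sketched below. It is the same challenge program with extra PUT statements added. The quoted label strings are only illustrative choices, not part of the original program, and the sketch assumes each data value sits on its own line after CARDS.

```sas
data next;
   put "top of step:   " _all_;   /* PDV before anything else runs   */
   input Y;
   put "after INPUT:   " _all_;   /* PDV right after Y is read       */
   X=X+1;
   Z=Z+1;
   put "after sums:    " _all_;   /* PDV after the two assignments   */
   retain X 30;
   drop X;
   xy = x*y;
   put "end of step:   " _all_;   /* PDV just before output          */
cards;
10
20
30
;
proc print; run;
```

Comparing the labeled lines in the log from one pass of the data step to the next shows which variables keep their values and which do not.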