FINAL EXAM                    ST 445                    30 April 2008
ONE PAGE (ONE SIDE) OF NOTES

NAME ____________________________

For most of the questions on this exam, I am asking what the output will be
from the SAS code.

*** For each dataset created, be sure to indicate the number of variables
    and the number of observations. ***

Note that the line numbers are given with the code, and remember that
there's a blank column between the line-number field and any code or data.

1. a) How many observations, variables?

     00001 data a ;
     00002 input name $ midterm final ;
     00003 score = midterm + final ;
     00004 if( final eq . ) then put 'missing ' name ;
     00005 label score='midterm and final' ;
     00006 keep name score ;
     00007 cards ;
     00008 Matthew 41 55 pre-med
     00009 Mark 44 .
     00010 John 46 50.2 lead g
     00011 Paul 34 drop lead s
     00012 George 45 60 star student
     00013 ;
     00014 proc print data=a label ;
     00015 run ;

   b) What would be the result of the PROC PRINT?

   c) What would the "put" statement produce and where would you see its
      results?

   d) What would change if line 00003 were replaced by

     00003 score = sum(midterm,final) ;

2. a) How many observations, variables?

     00001 data b ;
     00002 input name $ h1 h2 h3 h4 h5 ;
     00003 hsum = sum( of h1-h5 ) ;
     00004 diligent = ( nmiss( of h1-h5 ) > 0 ) ;
     00005 label hsum='homework' ;
     00006 keep name hsum diligent ;
     00007 cards ;
     00008 Matthew 10 10 10 9 10
     00009 George 10 10 10 10 10
     00010 John 8 7 . . .
     00011 Paul 10 10 . . .
     00012 Mark 10 10 9 10 10
     00013 ;
     00014 proc print data=b label ;
     00015 run ;

   b) What would be the result of the PROC PRINT?

3. Now write the code to put the datasets a and b together, and compute the
   total score for the course as

     total = midterm + final + 2*(homework sum)

4. Write the code to read the data at the bottom and create a dataset named
   'gasbill' that has 12 observations with the values

     YEAR  QTR  GBILL
     1985   1    344
     1985   2    246
     1985   3    131
     1985   4    289
     1986   1    313
     ...   ...   ...
     1987   3    253
     1987   4    426

     data gasbill ;
     datalines ;
     1985 344 246 131 289
     1986 313 189 179 221
     1987 267 230 253 426
     ;
     run;

5. a) How many observations, variables?

     00001 data new ;
     00002 retain count 0 ;
     00003 input name $ 3-8 id 9-12 bd $ 13-20 ;
     00004 when = input(bd, mmddyy8.) ; * use format to translate ;
     00005 if( when < 0 ) then do ;
     00006 count = count + 1 ;
     00007 output ;
     00008 end ;
     00009 if( count eq 2 ) then stop ;
     00010 datalines ;
     00011   Jolene217908211955
     00012   Margie228407171956
     00013   Evelyn188303301970
     00014   Carole455102211952
     00015   Angela838311221962
     00016 12345678901234567890
     00017 run ;
     00018 proc print data=new ;
     00019 var name id count ;
     00020 run ;

   b) What is the result of the PROC PRINT?

6. a) How many observations, variables?

     00001 data wolves ;
     00002 infile 'druids.dat' firstobs=2 missover ;
     00003 /* missover means missing values given to */
     00004 /* remaining variables when input list is */
     00005 /* longer than data record */
     00006 array p(10) ; * declare array ;
     00007 keep year npups ;
     00008 input year alphmale alphfem p1-p10 ;
     00009 npups = 10 - nmiss( of p1-p10) ; * how many? ;
     00010 run ;
     00011 proc print data=wolves ;
     00012 title 'Druid Peak pack pups' ;
     00013 run ;
     00014 proc means data=wolves sum max ; * get stats ;
     00015 class year ;
     00016 var npups ;
     00017 title2 'number of pups and largest litter' ;
     00018 run ;

   The file 'druids.dat' looks like

     year amal afem pups
     1997 10 9 40 41 42 43 44
     1998 21 9 65 66
     1999 21 40 106 107
     2000 21 40 133
     2000 21 41 134
     2001 21 40 155 156 157 158 159
     2001 21 41 160 161 162
     2001 21 42 163 163 163
     2002 21 42 191 192

   b) What is the output from the PROC PRINT?

   c) What is the output from the PROC MEANS?

   d) Write a DROP statement that has the same effect as

     00007 keep year npups ;

8. Draw the plot that SAS would make for the following data and code.
   (Don't worry about matching the color.)

     /* do taller people have bigger feet? */
     data reg ;
       input shoesz height gender ;
       label shoesz = 'shoe size' ;
       cards ;
     9.0 65 1
     8.5 70 1
     10.5 71 1
     6.5 61 2
     7.0 63 2
     6.0 66 2
     8.0 67 2
     ;
     run ;

     proc format ;
       value gend 1='male' 2='female' ;
     run ;

     proc gplot data=reg ;
       plot shoesz*height=gender ;
       title 'Regression plot with two groups' ;
       symbol1 c=black i=join v=diamond ;
       symbol2 c=black i=none v=star ;
       format gender gend. ;
     run ;

9. Now what about the following code? Draw what it would produce.

     proc sort data=reg ;
       by gender ;
     run ;

     proc gplot data=reg ;
       plot shoesz*height ;
       title 'Regression plot with two groups' ;
       symbol1 c=black i=join v='x' ;
       by gender ;
       format gender gend. ;
     run ;