FINAL EXAM                    ST 445                    05 May 2011
ONE PAGE (ONE SIDE) OF NOTES

NAME ____________________________

For most of the questions on this exam, I am asking what the output of the SAS code will be.

*** For each dataset created, be sure to indicate the number of variables and the number of observations. ***

Note that the line numbers are given with the code, and remember that there is a blank column between the line-number field and any code or data.

1. a) How many observations, variables?

00001 data a ;
00002 input name $ score ;
00003 if( final eq . ) then put 'missing ' name ;
00004 label score='final' ;
00005 keep name score ;
00006 cards ;
00007 Matthew 55 pre-med
00008 Mark 44 .
00009 John 46 lead g
00010 Paul drop lead s
00011 George 60 star student
00012 Peter 50 drummer
00013 ;
00014 proc print data=a label ;
00015 title 'course results' ;
00016 run ;

   b) What would be the result of the PROC PRINT?

   c) What would the "put" statement produce, and where would you see its results?

2. a) How many observations, variables?

00001 data b ;
00002 input name $ h1 h2 h3 h4 ;
00003 hsum = sum( of h1-h4 ) ;
00004 diligent = ( nmiss( of h1-h4 ) > 0 ) ;
00005 label hsum='homework' ;
00006 keep name hsum diligent ;
00007 cards ;
00008 Matthew 10 10 10 9 10
00009 George 10 10 10 10 .
00010 John 8 7 . .
00011 Paul 10 10 . .
00012 Mark 10 10 9 10 8
00013 ;
00014 run ;
00015 proc print data=b label ;
00016 title2 'and homework' ;
00017 run ;

   b) What would be the result of the PROC PRINT?

   c) Would anything be different if we rewrote 00003 as follows?

00003 hsum = h1+h2+h3+h4 ;

3. Now write the code to put the datasets a and b together and compute the total score for the course as

      total = final + 2*(homework sum)

4. a) How many observations, variables?

00001 data new ;
00002 retain count 0 ;
00003 input first ? 3-8 @ ; * hold ;
00004 if( first ne . ) then stop ; * finish cleanly ;
00005 input name $ 3-8 id 9-12 bd $ 13-20 ;
00006 when = input(bd, mmddyy8.) ; * use format to translate ;
00007 if( when < 0 ) then count = count + 1 ;
00008 drop first bd ;
00009 datalines ;
00010   Jolene217908211955
00011   Margie228407171956
00012   Evelyn188303301970
00013   Carole455102211952
00014   Angela838311221962
00015 12345678901234567890
00016 run ;
00017 proc format ;
00018 value newage low-'06Aug45'd = 'Hiroshima'
00019 '07Aug45'd-'22Nov63'd = 'boomer'
00020 '23Nov63'd-high = 'child' ;
00021 run ;
00022 proc print data=new ;
00023 var name count when ;
00024 format when newage. ;
00025 run ;

   b) What is the result of the PROC PRINT?

   c) I had originally coded 00004 as given below. Would this coding make any difference in the results? Why or why not?

00004 if( first ne missing ) then stop ;

   d) What is the question mark for in the following statement?

00003 input first ? 3-8 @ ; * hold ;

5. Write the code to read the file 'address.dat' and create a dataset named 'info' with four variables whose values are as below.

name                    strnum   strtype   zip
---------------------   ------   -------   -----
Mr. Joseph Smith        231      Trail     27607
Ms. Jane Ellen Archer   1401     Street    27612
Dr. Rosa Aguirre        442      Road      25893
Ms. Elaine Marbury      833      Avenue    28564

The file 'address.dat' looks like:

123456789012345678901234567890
Mr. Joseph Smith
231 Dixie Trail
Raleigh, NC 27607
Ms. Jane Ellen Archer
1401 Saint Mary's Street
Raleigh, NC 27612-5235
Dr. Rosa Aguirre
442 Chapel Hill Road
Hillsborough, NC 25893
Ms. Elaine Marbury
833 New Bern Avenue
Graham, NC 28564

6. a) How many observations, variables?

00001 data wolves ;
00002 infile 'druids.dat' firstobs=2 missover ;
00003 /* missover means missing values given to */
00004 /* remaining variables when input list is */
00005 /* longer than data record */
00006 array p(10) ; * declare array ;
00007 keep year npups ;
00008 input year alphmale alphfem p1-p10 ;
00009 npups = 10 - nmiss( of p1-p10) ; * how many? ;
00010 run ;
00011 proc print data=wolves ;
00012 title 'Druid Peak pack pups' ;
00013 run ;
00014 proc means data=wolves sum max ; * get stats ;
00015 class year ;
00016 var npups ;
00017 title2 'number of pups and largest litter' ;
00018 run ;

The file 'druids.dat' looks like:

year amal afem pups
1997 10 9 40 41 42 43 44
1998 21 9 65 66
1999 21 40 106 107
2000 21 40 133
2000 21 41 134
2001 21 40 155 156 157 158 159
2001 21 41 160 161 162
2001 21 42 163 163 163
2002 21 42 191 192

   b) What is the output from the PROC PRINT?

   c) What is the output from the PROC MEANS?

7. Recall the question on the second quiz about reading the file 'consump.dat' to produce a dataset with 26 observations. Below is code written to create a dataset that looks like:

YEAR   QUART   USCNSMP
----   -----   -------
1955   1       248.7
1955   2       253.7
1955   3       259.9
1955   4       261.8
1956   1       263.2
...    ...     ...
1961   1       297.0
1961   2       301.6

where the file 'consump.dat' looks like:

1955 248.7 253.7 259.9 261.8
1956 263.2 263.7 263.4 266.9
1957 268.9 270.4 273.4 272.1
1958 268.9 270.9 274.4 278.7
1959 283.8 289.7 290.8 292.8
1960 295.4 299.5 298.6 299.6
1961 297.0 301.6 . .

Correct the following code. (In addition to some comments,) I think it needs at least 3 fixes.

data try ;
infile 'consump.dat' ;
input uscnsmp @ ;
quart = mod(_n_,4) ;
if( quart eq 1 ) then year = uscnsmp ;
if( uscnsmp ne . ) then output ;
run ;

8. Draw the plot that SAS would make for the following data and code. (Don't worry about matching the color.)

/* do taller people have bigger feet? */
data reg ;
input shoesz height gender ;
label shoesz = 'shoe size' ;
cards ;
9.0 68 1
8.5 70 1
10.5 71 1
6.5 61 2
7.0 63 2
6.0 60 2
8.0 66 2
;
run ;
proc gplot data=reg ;
plot shoesz*height=gender ;
title 'Regression plot with two groups' ;
symbol1 c=black i=join v=diamond ;
symbol2 c=black i=none v=star ;
run ;

9. Now what about the following code? Draw what it would produce.

proc gplot data=reg ;
plot shoesz*height ;
by gender ;
title 'Regression plot with two groups' ;
symbol1 c=black i=join v=diamond ;
run ;