/****************************************************************** CHAPTER 12, EXAMPLE 2 Fit a logistic regression model to the "wheezing" data. These are binary data, thus, we use the Bernoulli (bin) mean/variance assumptions. The model is fitted with different working correlation matrices. ******************************************************************/ options ls=80 ps=59 nodate; run; /****************************************************************** The data look like (first 4 records): 1 portage 9 0 1 10 0 1 11 0 1 12 0 0 2 kingston 9 1 1 10 2 1 11 2 0 12 2 0 3 kingston 9 0 1 10 0 0 11 1 0 12 1 0 4 portage 9 0 0 10 0 1 11 0 1 12 1 0 . . . column 1 child column 2 city columns 3-5 age=9, smoking indiciator, wheezing response columns 6-8 age=10, smoking indiciator, wheezing response columns 9-11 age=11, smoking indiciator, wheezing response columns 12-14 age=12, smoking indiciator, wheezing response Some of the children have missing values for smoking and wheezing, as shown in Chapter 1. There are 32 children all together. See the output for the full data printed out one observation per line. We read in the data using the "@@" symbol so that SAS will continue to read for data on the same line and the OUTPUT statement to write each block of three observations for each age in as a separate data record. The resulting data set is one with a separate line for each observation. City is a character variable, so the dollar sign is used to read it in as such. ******************************************************************/ data wheeze; infile 'wheeze.dat'; input child city$ @@; do i=1 to 4; input age smoke wheeze @@; output; end; run; proc print data=wheeze; run; /***************************************************************** Fit the logistic regression model using PROC GENMOD and three different working correlation matrix assumptions: - unstructured - compound symmetry (exchangeable) - AR(1) We fit a model with linear predictor allowing effects of city and maternal smoking status but no "interaction" terms among these. The DIST=BIN option in the MODEL statement specifies that the Bernoulli mean-variance relationship be assumed. The LINK=LOGIT option asks for the logistic mean model. The REPEATED statement specifies the "working" correlation structure to be assumed. The CORRW option in the REPEATED statement prints out the estimated working correlation matrix under the assumption given in the TYPE= option. The COVB option prints out the estimated covariance matrix of the estimate of beta -- both the usual estimate and the "robust" version are printed. The MODELSE option specifies that the standard error estimates printed for the elements of betahat are based on the usual theory. By default, the ones based on the "robust" version of the sampling covariance matrix are printed, too. The dispersion parameter phi is held fixed at 1 by default. The missing values are coded in the usual SAS way by periods (.). We delete these from the full data set, so that the data set input to PROC GENMOD contains only the observed data. We assume that the fact that these observations are missing has nothing to do with the thing under study (which may or may not be true). Thus, because these data are not balanced, we use the WITHIN option of the REPEATED statement to give SAS the time variable AGE as a classification variable so that it can figure out where the missing values are and use this information in estimating the correlation matrix. In versions 7 and higher of SAS, PROC GENMOD will model by default the probability that the response y=0 rather than the conventional y=1! To make PROC GENMOD model probability y=1, as is standard, one must include the DESCENDING option in the PROC GENMOD statement. In earlier versions of SAS, the probability y=1 is modeled by default, as would be expected. If the user is unsure which probability is being modeled, one can check the .log file. In later versions of SAS, an explicit statement about what is being modeled will appear. PROC GENMOD output should also contain a statement about what is being modeled. ******************************************************************/ data wheeze; set wheeze; if wheeze=. then delete; time=age; run; title "UNSTRUCTURED CORRELATION"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = city smoke / dist=bin link=logit; repeated subject=child / type=un corrw covb modelse within=time; run; title "COMPOUND SYMMETRY (EXCHANGEABLE) CORRELATION"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = city smoke / dist=bin link=logit; repeated subject=child / type=cs corrw covb modelse within=time; run; title "AR(1) CORRELATION"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = city smoke / dist=bin link=logit; repeated subject=child / type=ar(1) corrw covb modelse within=time; run;