Residuals have long been used for the graphical and numerical examination of the adequacy of regression models. Conventional residual analysis based on plots of raw residuals or their smoothed curves is highly subjective, while most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this work, we develop objective and informative model checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations and proportional hazards models with censored failure times, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate, the link function, and the proportional hazards assumption. Illustrations with several medical studies are provided.

NOTE: This talk is based on joint work with LJ Wei (Harvard) and Zhiliang Ying (Columbia).
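To make the idea concrete, the following is a minimal sketch for the simplest case only: an ordinary least-squares fit with one covariate. It is not the speakers' implementation. The function name `cumres_check`, its parameters, and the specific multiplier form used to simulate the null process (residuals perturbed by independent standard normal variables, with a correction for estimating the regression coefficients) are assumptions made for illustration; the abstract itself gives no formulas. The sketch also assumes the covariate has no tied values.

```python
import numpy as np

def cumres_check(x, y, n_sim=1000, seed=0):
    """Cumulative-residual check of the linear term in y ~ 1 + x (illustrative).

    Returns the observed cumulative residual process, simulated null
    realizations, and a sup-type p-value. Hypothetical helper, not a library API.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Sort by the covariate so that cumulative sums run over increasing x.
    order = np.argsort(x)
    X = np.column_stack([np.ones(n), x[order]])   # design: intercept + x
    ys = y[order]

    # Least-squares fit and raw residuals.
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    e = ys - X @ beta

    # Observed process: W(x_k) = n^{-1/2} * sum_{i: x_i <= x_k} e_i.
    W_obs = np.cumsum(e) / np.sqrt(n)

    # Assumed null approximation (multiplier form):
    #   W*(x_k) = n^{-1/2} * sum_i [1(x_i <= x_k) - eta_k' (X'X)^{-1} X_i] e_i Z_i
    # with eta_k = sum_{j: x_j <= x_k} X_j and Z_i i.i.d. N(0, 1).
    eta = np.cumsum(X, axis=0)                    # shape (n, p)
    A = eta @ np.linalg.inv(X.T @ X)              # shape (n, p)
    Z = rng.standard_normal((n_sim, n))
    eZ = Z * e                                    # perturbed residuals, (n_sim, n)
    W_sim = (np.cumsum(eZ, axis=1) - (eZ @ X) @ A.T) / np.sqrt(n)

    # Numerical comparison: share of simulated paths whose supremum
    # is at least as large as the observed supremum.
    sup_obs = np.max(np.abs(W_obs))
    sup_sim = np.max(np.abs(W_sim), axis=1)
    p_value = float(np.mean(sup_sim >= sup_obs))
    return W_obs, W_sim, p_value
```

For the graphical comparison described above, one would plot `W_obs` against the sorted covariate and overlay a handful of rows of `W_sim`; an observed path that wanders well outside the simulated ones suggests a misspecified functional form rather than natural variation.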