Missing data are ubiquitous in almost every area of scientific
inquiry, especially in health sciences research involving human
subjects, and have important implications for data analysis. At the
very least, there is a loss of information and reduction in precision
of inference on the population of interest relative to that
intended. Of greater concern is the potential for biased inference
that can result if the reasons for missingness are related to outcomes
of interest. Accordingly, principled methods that properly account for
missing data are required. This course will provide an
overview of modern statistical frameworks and methods for analysis in
the presence of missing data. Both methodological developments and
applications will be emphasized.
Course prerequisites
ST 522, Statistical Theory II, and ST 552, Linear Models and Variance
Components, or equivalents. Students should also have been exposed to
SAS and R and have reasonable programming skills.
Please see the instructors if you have questions about the
suitability of your background.
Course topics

Introduction and Motivation:
Challenges posed by missing data, Statistical framework, Missing data
mechanisms (Missing Completely at Random, Missing at Random,
Missing Not At Random), Review of estimating equations
Naive Methods:
Complete case and available case methods, Simple imputation methods,
Last Observation Carried Forward (LOCF)
Likelihood-based Methods Under Missing At Random (MAR):
Review of ML estimation for full data, Factorization of the density,
Observed data likelihood and ignorability, Expectation-Maximization (EM)
algorithm, Missing information principle, Bayesian inference
Multiple Imputation Methods Under MAR:
Fundamentals, Proper versus improper imputation, Rubin's variance formula,
Asymptotic results, Imputation from a multivariate normal distribution,
Multiple Imputation by Chained Equations
Inverse Probability Weighted Methods Under MAR:
Weighted generalized estimating equations for longitudinal data with
dropout, Inverse weighting at the occasion and individual levels,
Doubly robust estimation
Pattern Mixture Models:
Rationale, Modeling strategies
Sensitivity Analysis to Deviations from MAR:
Challenges under Missing Not At Random (MNAR), Estimation of a single mean,
Longitudinal data with dropout
See the class notes for more detailed information.