An Introduction to Imputation Methods for Missing Data

Marie Davidian

Naive approaches to handling missing data problems, such as complete-case or observed (available) data analyses, have acknowledged drawbacks that have been reviewed in previous meetings. Methods based on writing down the likelihood for the observed data may be difficult to implement. For example, although the EM algorithm and its variants provide a general approach in this setting, specific problems may still pose computational challenges. An intuitively appealing alternative is to "fill in" the missing values in some fashion based on the observed data. One then analyzes the resulting "full" data set, consisting of the observed and "filled in" values, using methods for fitting the full data model, which may be available in standard software. Such imputation methods may be conceived and implemented in a variety of ways, and may or may not lead to reliable inferences. This talk will provide an introduction to simple imputation techniques, note their connection with other analyses in certain problems, and discuss their potential drawbacks.
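As a minimal illustration of the "fill in, then analyze as if complete" idea, and of one way it can mislead, the Python sketch below uses unconditional mean imputation. This is one of the simplest schemes, chosen here only for illustration; the abstract does not say which techniques the talk covers. The helper `mean_impute`, the simulated normal data, and the rule deleting every third value (independently of the values themselves) are all hypothetical. Every filled-in value sits exactly at the observed mean, so the estimated mean is unchanged. However, the standard deviation computed from the "full" data set is smaller than the one computed from the observed values alone. Applying complete-data formulas therefore treats imputed values as genuine observations and overstates precision.

```python
import random
import statistics

def mean_impute(values):
    """Hypothetical helper: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

random.seed(1)
# Simulated "full" data (illustrative assumption): 200 draws from N(10, 2^2)
full = [random.gauss(10.0, 2.0) for _ in range(200)]

# Assumed missingness: every third value is deleted, regardless of its value
data = [None if i % 3 == 0 else y for i, y in enumerate(full)]
observed = [v for v in data if v is not None]
imputed = mean_impute(data)

# Complete-data formulas applied to the observed values and to the filled-in set
se_observed = statistics.stdev(observed) / len(observed) ** 0.5
se_imputed = statistics.stdev(imputed) / len(imputed) ** 0.5  # treats all 200 as real

print("observed-data mean:", round(statistics.mean(observed), 3))
print("imputed-data mean: ", round(statistics.mean(imputed), 3))
print("SE from observed data:", round(se_observed, 4))
print("naive SE after imputation:", round(se_imputed, 4))
```

The two means agree, while the naive standard error after imputation is smaller. This happens because the sum of squared deviations is unchanged but is divided by a larger count. More refined schemes (for example, regression-based or multiple imputation) are aimed at exactly this understatement of uncertainty.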