Multiple imputation is a technique in which each missing value is replaced by m > 1 simulated values. Each of the m completed data sets is analyzed by standard complete-data methods. The variability among the results of the m analyses provides a measure of the uncertainty due to missing data, which, when combined with measures of ordinary sample variation, leads to a single inferential statement about the parameters of interest. Generating multiple imputations has been problematic except in some simple cases. Using techniques of Markov chain Monte Carlo (MCMC), however, it is now possible to do this quite easily. MCMC is a body of methods for generating pseudo-random draws from probability distributions via Markov chains.
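As a concrete illustration of the combining step described above, the following is a minimal sketch of Rubin's combining rules, assuming each of the m complete-data analyses returns a scalar point estimate and its sampling variance (the function name and inputs are illustrative, not part of the text).

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Combine m complete-data results via Rubin's rules (illustrative sketch)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    w = variances.mean()                # average within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b         # total variance of the pooled estimate
    # Degrees of freedom for the t reference distribution of the pooled estimate
    df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, t, df
```

The total variance adds the between-imputation component, inflated by the factor 1 + 1/m, to the average within-imputation variance; the between-imputation term is what captures the extra uncertainty due to the missing data.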
Typically the observed-data likelihood functions for missing-data problems are complicated, and hence, except in special cases, meaningful finite-sample summaries of these functions (e.g. modes) are not available in closed form, nor are they readily computable by classical numerical methods. A point estimate (say, obtained via EM), even if it is efficient, is not especially useful unless there is also some measure of uncertainty associated with it. Traditional methods often cannot provide finite-sample measures of uncertainty for the point estimates. Bayesian methods, which enjoy many advantages over more traditional methods, may be the only currently feasible approach when the unknown parameter is high-dimensional. In the Bayesian framework all inferences are based on a posterior probability distribution for the unobserved quantities (parameters and/or missing data) that conditions on the observed quantities. Thus, using MCMC, two main tasks of data analysis with missing data are considered, as sketched after the list below:
- Multiple imputation - simulating draws from the posterior predictive distribution of the unobserved values conditional on the observed values, and
- Posterior simulation - drawing from the posterior distribution of the model parameters conditional on the completed data.
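To make the two tasks concrete, here is a minimal sketch (under assumptions not given in the text) of the classical data-augmentation scheme for a univariate normal sample with ignorable missing values: the I-step draws the missing values from their predictive distribution given the current parameters, and the P-step draws the parameters from their posterior given the completed data. The model, the noninformative prior, and the starting imputation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def data_augmentation(y, n_iter=1000):
    """Two-step MCMC (data augmentation) for a univariate normal model with
    ignorable missing values and a noninformative prior; `y` is a 1-D array
    with np.nan marking the missing entries (an assumed setup)."""
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    n = y.size
    y_comp = np.where(miss, np.nanmean(y), y)   # crude starting imputation
    draws = []
    for _ in range(n_iter):
        # P-step: draw (mu, sigma^2) from their posterior given the completed data
        ybar, s2 = y_comp.mean(), y_comp.var(ddof=1)
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # I-step: draw the missing values from their predictive distribution
        y_comp[miss] = rng.normal(mu, np.sqrt(sigma2), size=miss.sum())
        draws.append((mu, sigma2, y_comp[miss].copy()))
    return draws
```

Retaining the imputed values from a few well-separated iterations yields the m multiple imputations, while the retained parameter draws serve as the posterior simulation for the model parameters.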