Multiple imputation was first introduced by Rubin (1987) in the context of nonresponse in sample surveys. The method solves an incomplete-data problem by repeatedly solving the corresponding complete-data problem. In multiple imputation, the unknown missing data Y_mis are replaced by m simulated values Y_mis,(1), Y_mis,(2), ..., Y_mis,(m), and each of the m completed datasets is analyzed by standard complete-data methods. The variability among the results of the m analyses provides a measure of the uncertainty due to missing data. Combined with measures of ordinary sampling variability, this measure leads to a single inferential statement about the parameters of interest.

In this presentation, we will consider two imputation strategies. In the first, the imputed values are drawn at random from the conditional distribution P(Y_mis | Y_obs, β), where β is an initial estimator held fixed while the draws are made. Rubin refers to this as improper imputation. In the second strategy, the so-called Bayesian proper imputation, the imputed values are drawn from P(Y_mis | Y_obs), the posterior predictive distribution of the missing data under some complete-data model and prior. We will study the large-sample properties of both strategies, in particular asymptotic normality, consistent estimation of the asymptotic variance, and the efficiency of the resulting estimators.
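The two imputation strategies and Rubin's combining rules can be illustrated with a minimal sketch. It assumes a univariate normal model Y ~ N(mu, sigma^2), values missing completely at random, and the sample mean as the complete-data estimator. The model, the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2 used for the proper draws, and the function names rubin_combine, impute_normal and multiply_impute_mean are illustrative assumptions, not taken from the presentation. Each completed dataset l yields an estimate Q_l with complete-data variance U_l. The combined estimate is their average Q_bar, and the total variance is T = U_bar + (1 + 1/m) B, where U_bar is the average within-imputation variance and B is the between-imputation variance of the Q_l.

```python
import numpy as np

# Illustrative sketch only: the normal model, the flat prior and all
# function names below are assumptions made for this example.


def rubin_combine(estimates, variances):
    """Combine m >= 2 complete-data analyses with Rubin's rules.

    Returns (Q_bar, T), where T = U_bar + (1 + 1/m) * B.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()
    u_bar = u.mean()          # within-imputation variance
    b = q.var(ddof=1)         # between-imputation variance
    return q_bar, u_bar + (1.0 + 1.0 / m) * b


def impute_normal(y_obs, n_mis, rng, proper):
    """Draw n_mis values of Y_mis under an assumed N(mu, sigma^2) model.

    proper=False: improper imputation, with (mu, sigma^2) held fixed at
    the observed-data estimates (the role of the fixed beta).
    proper=True: draw (mu, sigma^2) from their posterior under the prior
    p(mu, sigma^2) ~ 1/sigma^2, then draw Y_mis, i.e. a draw from
    the posterior predictive P(Y_mis | Y_obs).
    """
    n_obs = y_obs.size
    ybar = y_obs.mean()
    s2 = y_obs.var(ddof=1)
    if proper:
        # sigma^2 | Y_obs ~ (n_obs - 1) s^2 / chi^2_{n_obs - 1}
        sigma2 = (n_obs - 1) * s2 / rng.chisquare(n_obs - 1)
        # mu | sigma^2, Y_obs ~ N(ybar, sigma^2 / n_obs)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n_obs))
    else:
        sigma2, mu = s2, ybar
    return rng.normal(mu, np.sqrt(sigma2), size=n_mis)


def multiply_impute_mean(y_obs, n_mis, m, rng, proper):
    """Estimate the mean from m completed datasets and combine them."""
    estimates, variances = [], []
    for _ in range(m):
        imputed = impute_normal(y_obs, n_mis, rng, proper)
        y_full = np.concatenate([y_obs, imputed])
        n = y_full.size
        estimates.append(y_full.mean())            # complete-data Q_l
        variances.append(y_full.var(ddof=1) / n)   # complete-data U_l
    return rubin_combine(estimates, variances)


rng = np.random.default_rng(0)
y = rng.normal(10.0, 2.0, size=200)
observed = rng.random(200) > 0.3       # MCAR: roughly 30% missing
y_obs, n_mis = y[observed], int((~observed).sum())

for proper in (False, True):
    q_bar, t = multiply_impute_mean(y_obs, n_mis, m=20, rng=rng, proper=proper)
    label = "proper" if proper else "improper"
    print(f"{label:>8}: estimate={q_bar:.3f}, total variance={t:.5f}")
```

The improper branch holds (mu, sigma^2) at the observed-data estimates, so its draws ignore the uncertainty in those estimates. Rubin's rules were designed with proper imputation in mind, and how the variance estimate behaves under improper imputation is the kind of question the presentation takes up. The sketch needs m >= 2 so that B is defined.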
Familiarity with expansion techniques for establishing the large-sample properties of M-estimators will be assumed.
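As a reminder of the kind of expansion meant here, the standard first-order (sandwich) expansion of an M-estimator is sketched below. The notation (the estimating function psi, the parameter theta, and the matrices A and V) is introduced only for this illustration and is not taken from the presentation. The usual regularity conditions are assumed: E psi(Y, theta_0) = 0, psi is smooth in theta, and A is invertible. The expansion follows from a Taylor expansion of the estimating equation around theta_0.

```latex
% Sketch under standard regularity conditions; \hat\theta solves
% \sum_{i=1}^{n} \psi(Y_i, \hat\theta) = 0 and E[\psi(Y, \theta_0)] = 0.
\sqrt{n}\,(\hat\theta - \theta_0)
  = -A^{-1}\,\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi(Y_i, \theta_0) + o_p(1),
\qquad
A = E\!\left[\frac{\partial \psi(Y, \theta_0)}{\partial \theta^{\top}}\right],
\\[4pt]
\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\; A^{-1} V A^{-\top}\right),
\qquad
V = E\!\left[\psi(Y, \theta_0)\,\psi(Y, \theta_0)^{\top}\right].
```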