In practical applications, although the intention may be to collect data according to some carefully planned study design, things do not always work out as hoped. For example,
- a survey questionnaire may be sent to a sample of randomly chosen subjects; however, some subjects may provide only partial answers to some questions or, worse, may fail to return the questionnaire at all.
- the design of a longitudinal clinical trial to compare two treatments may require subjects who have been randomized to the treatments to return to the clinic monthly to provide measurements on some variables of interest. However, some subjects may drop out of the study at some point and never return to the clinic. Others may miss clinic visits sporadically.
In other settings, resources may dictate that not all information of interest can be gathered on all subjects in a study. To reduce costs, the full set of necessary information may be gathered on only a subset of the subjects by design, leaving it missing for the remaining subjects.
In all of these cases, some of the data that ideally would have been collected are missing, either by misfortune or by design. In general, inference is focused on some aspect of the distribution of the "full" data, i.e., the data that would have been available had there been no missingness. For example, it may be desired to estimate the mean response in the entire population of subjects if exposed to a particular treatment; however, the concern is that, under missingness, the subjects whose data are actually available may not represent a true random sample from this population. Thus, more generally, there is concern that the ability to make accurate inferences may be compromised in the presence of missing data.
An extensive literature exists on approaches to taking account of missing data in these situations, and the area remains one of active research. This lecture will provide a basic introduction to the issues, terminology, and notational conventions critical for appreciating the problems associated with missing data.
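The concern that subjects with available data may not be a random sample from the population can be illustrated with a small simulation. The following is only a sketch under assumed conditions, not part of the lecture's own development: the normal response distribution, the true mean of 50, the missingness probabilities, and the function name `simulate_observed_means` are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_observed_means(n=100_000, seed=0):
    """Return (full-data mean, mean under response-dependent missingness,
    mean under response-independent missingness) for simulated responses."""
    rng = random.Random(seed)

    # "Full" data: the responses that would be available with no missingness.
    # Assumed distribution: normal with mean 50 and standard deviation 10.
    full = [rng.gauss(50.0, 10.0) for _ in range(n)]

    # Assumed mechanism 1: missingness depends on the response itself.
    # Responses above 50 are missing with probability 0.8, others with 0.2.
    observed_dependent = [
        y for y in full if rng.random() > (0.8 if y > 50.0 else 0.2)
    ]

    # Assumed mechanism 2: missingness is unrelated to the response.
    # Every response is missing with probability 0.5.
    observed_independent = [y for y in full if rng.random() > 0.5]

    return (
        statistics.fmean(full),
        statistics.fmean(observed_dependent),
        statistics.fmean(observed_independent),
    )


if __name__ == "__main__":
    full_mean, dependent_mean, independent_mean = simulate_observed_means()
    print(f"full-data mean:                        {full_mean:.2f}")
    print(f"observed mean, dependent missingness:  {dependent_mean:.2f}")
    print(f"observed mean, independent missingness: {independent_mean:.2f}")
```

Under these assumptions, the mean of the observed responses falls noticeably below the full-data mean when larger responses are more likely to be missing, whereas it stays close to the full-data mean when missingness is unrelated to the response. In the first case the observed subjects are not a random sample from the population, which is the situation the text warns about.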