A haplotype is a specific sequence of genetic variants that lie on the same chromosome. Standard genotyping techniques, however, do not distinguish which chromosome a particular variant originated from; instead, they measure the combination of variants across the pair of chromosomes at a number of positions. Such data are referred to as unphased multilocus genotype data. When analyzing unphased multilocus genotype data from a case-parent study, a standard likelihood approach to inferring the effect of haplotypes on disease risk requires the correct specification of a restrictive model for the distribution of parental haplotypes. If the model for parental haplotypes is misspecified, such an approach can lead to substantial bias in parameter estimates even if complete genotype data are available. In this talk, we present a new method that addresses this problem. Using a geometric approach to estimation in the presence of nuisance parameters, we develop locally efficient estimators of the effect of haplotypes on disease that are robust to incorrect estimates of haplotype frequencies. We show how to extend these results to inference on the effect of interactions between haplotypes and environmental covariates, and to handle missing genotype data. We compare our results with the popular family-based association tests (FBAT) approach, and discuss when marginal tests, such as those available in FBAT, can be misleading. Computational issues and simplifications will also be discussed.
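To make the notion of unphased data concrete, the following is a minimal sketch, not part of the method presented in the talk, that lists the haplotype pairs consistent with one unphased multilocus genotype. It assumes biallelic loci coded as minor-allele counts (0, 1, or 2); the function name `compatible_haplotype_pairs` and this coding are illustrative assumptions.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of minor-allele counts (0, 1, or 2), one per locus.
    Each haplotype is a tuple of 0/1 alleles. The coding is an assumption
    made for illustration only.
    """
    per_locus = []
    for g in genotype:
        if g == 0:
            per_locus.append([(0, 0)])          # homozygous major allele
        elif g == 2:
            per_locus.append([(1, 1)])          # homozygous minor allele
        elif g == 1:
            per_locus.append([(0, 1), (1, 0)])  # heterozygous: phase unknown
        else:
            raise ValueError("allele counts must be 0, 1, or 2")

    pairs = set()
    for assignment in product(*per_locus):
        h1 = tuple(a for a, _ in assignment)
        h2 = tuple(b for _, b in assignment)
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


if __name__ == "__main__":
    # Heterozygous at two loci: the genotype alone cannot say which
    # alleles travel together on the same chromosome.
    for pair in sorted(compatible_haplotype_pairs((1, 1))):
        print(pair)
```

Under this coding, a genotype that is heterozygous at two or more loci is compatible with more than one haplotype pair, which is why a likelihood approach has to bring in a model for haplotype frequencies to weight the alternatives.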