Analysis of Genetic Sequencing Association Studies using the Sequence Kernel Association Test

Michael Wu
Department of Biostatistics
University of North Carolina

4:00-5:00 pm
Thursday, November 18, 2010
321 Riddick Hall, NCSU Campus

Although genome-wide association studies have identified more than 1000 genetic loci associated with specific diseases and complex traits, common variants typically explain only a fraction of trait heritability. The recent advent of massively parallel sequencing has transformed human genetics and has the potential to explain some of the missing heritability. Although considerable resources have been devoted to sequence mapping and genotype calling, successful application of sequencing to the study of complex traits requires novel statistical methods that efficiently analyze data on rare variants. Here we introduce the sequence kernel association test (SKAT), a direct, flexible, and powerful approach for testing the association between a group of rare variants and a dichotomous (case-control) or quantitative phenotype, while naturally adjusting for additional variables. Our test uses the kernel machine regression framework, previously considered for common variants, which directly regresses the phenotype on the rare variant genotypes and compares the pairwise genetic similarity between individuals, as determined by similarity in rare alleles, to pairwise similarity in phenotype. Key features of our test include allowance for complex and epistatic variant effects by varying the similarity metric, harnessing local correlation structure, using flexible weights to boost power, and explicit covariate adjustment to account for confounders such as population stratification. P-values are easily computed using simple formulae, so the test is computationally efficient. We demonstrate through simulation and analysis of re-sequencing studies of triglyceride levels that SKAT often offers advantages over existing rare variant tests in terms of power and practical considerations. In addition, we describe approximate analytic tools to estimate statistical power and sample size for association testing using SKAT for both dichotomous and quantitative traits.
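
As a rough illustration of the kernel idea described above, the following Python sketch computes a variance-component score statistic for a quantitative trait: residuals from a covariate-only null model are compared against a weighted linear genetic similarity kernel. It is a minimal sketch under stated assumptions, not the speaker's implementation. The function names, the weighted linear kernel, and the Beta(1, 25)-density weights on minor allele frequency are assumptions made for illustration. The abstract says p-values come from simple formulae; this sketch substitutes a plain permutation of residuals instead, which is only approximate when covariates are present.

```python
import numpy as np


def skat_q(resid, G, sqrt_w):
    """Score statistic Q = resid' K resid with K = (G * sqrt_w)(G * sqrt_w)'."""
    s = (G.T @ resid) * sqrt_w  # weighted per-variant score contributions
    return float(s @ s)


def skat_permutation_test(y, G, X=None, n_perm=999, seed=0):
    """Return (Q, permutation p-value) for a quantitative phenotype y.

    y : (n,) phenotype; G : (n, p) minor-allele counts (0/1/2);
    X : optional (n,) or (n, q) covariates. All names are hypothetical.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = y.shape[0]

    # Null model: intercept plus covariates, fit by ordinary least squares.
    X0 = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X0, y, rcond=None)
    resid = y - X0 @ beta

    # Assumed weighting: Beta(1, 25) density at the sample minor allele
    # frequency, which up-weights rarer variants. Density is 25 * (1 - maf)^24.
    maf = G.mean(axis=0) / 2.0
    sqrt_w = 25.0 * (1.0 - maf) ** 24

    q_obs = skat_q(resid, G, sqrt_w)

    # Swapped-in technique: permutation null instead of an analytic formula.
    rng = np.random.default_rng(seed)
    q_perm = np.array(
        [skat_q(rng.permutation(resid), G, sqrt_w) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(q_perm >= q_obs)) / (n_perm + 1)
    return q_obs, float(p_value)


# Usage on simulated data (illustrative only, not results from the talk).
rng = np.random.default_rng(42)
G_demo = rng.binomial(2, 0.02, size=(300, 20))  # 20 rare variants
age_demo = rng.normal(50, 10, size=300)         # one covariate
y_demo = 0.02 * age_demo + rng.normal(size=300)
q_demo, p_demo = skat_permutation_test(y_demo, G_demo, X=age_demo, n_perm=199)
print(f"Q = {q_demo:.3f}, permutation p = {p_demo:.3f}")
```

Changing the similarity metric, as the abstract describes, would amount to replacing the weighted linear kernel inside `skat_q` with another kernel on the genotype matrix.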
