Department of
presents
Dipak K. Dey
Department of Statistics,
Model Selection and Diagnostics to Identify Genetic Markers for Single-nucleotide Polymorphisms
Abstract
The distribution of genetic variation among populations is conveniently
measured by Wright's F_ST, which is a scaled variance taking on values in [0,1].
For certain types of genetic markers, and for single-nucleotide polymorphisms
(SNPs) in particular, it is reasonable to presume that the genotypes at most
loci detected by those markers are selectively neutral. For such loci, the
distribution of genetic variation among populations is determined by the size
of local populations, the pattern and rate of migration among those populations,
and the rate of mutation. Because the demographic parameters (population size
and migration rates) are common across all loci, locus-specific estimates of
F_ST will depart from a common mean only for loci with unusually high or low
rates of mutation and for loci that are closely associated with genomic regions
having a substantial effect on fitness. Thus, loci showing significantly more
variation than background are likely to mark genomic regions subject to
diversifying selection among the sample populations, while those showing
significantly less variation than background are likely to mark genomic regions
subject to stabilizing selection across the sample populations. We propose
several Bayesian hierarchical models to estimate locus-specific effects on
F_ST, and we apply these models to SNP data from the HapMap project. Because
loci that are physically associated with one another are likely to show similar
patterns of variation, we introduce conditional autoregressive models to
incorporate the local correlation among loci. We estimate the posterior
distributions of the model parameters using Markov chain Monte Carlo (MCMC)
simulations. Model comparison using several criteria, including DIC and LPML,
reveals that a model with locus- and population-specific effects is superior to
the other models for the data used in the analysis. To detect loci for which
locus-specific effects are not well explained by the common F_ST, we propose an
approach that uses the Kullback-Leibler divergence (KLD) to measure the
divergence between the posterior distribution of each locus-specific effect and
that of the common F_ST. With this method, we identify 15 SNP loci that have
unusually large values of F_ST. By comparing the map positions of these SNP
loci with known gene locations, we find that 10 of the 15 are located within or
near identified genes.
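
As a rough illustration of two quantities named in the abstract, the Python sketch below computes a naive scaled-variance F_ST from per-population allele frequencies and a Kullback-Leibler divergence between two sets of posterior draws. It is not the speaker's method: the function names are hypothetical, the F_ST estimator ignores sampling error, and the KLD step assumes each set of MCMC draws can be summarized by a fitted normal distribution. The draws in the usage lines are simulated, not HapMap results.

```python
import math
import random
from statistics import fmean, pvariance, variance


def fst_scaled_variance(freqs):
    """Naive F_ST: variance of allele frequency across populations,
    scaled by p_bar * (1 - p_bar). Hypothetical helper, no sampling correction."""
    p_bar = fmean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("allele is fixed or absent in all populations")
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def gaussian_kld(draws_p, draws_q):
    """KL(P || Q), assuming (for illustration) that each set of posterior
    draws is well approximated by a normal distribution."""
    mu_p, var_p = fmean(draws_p), variance(draws_p)
    mu_q, var_q = fmean(draws_q), variance(draws_q)
    return 0.5 * (
        math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


# Simulated stand-ins for MCMC output (assumed values, for illustration only).
rng = random.Random(0)
common_draws = [rng.gauss(0.10, 0.02) for _ in range(5000)]
locus_draws = [rng.gauss(0.30, 0.05) for _ in range(5000)]

print(fst_scaled_variance([0.2, 0.5, 0.8]))
print(gaussian_kld(locus_draws, common_draws))
```

Under these assumptions, a locus whose posterior sits far from the common F_ST posterior yields a large divergence, which is the kind of signal the abstract describes using to flag outlier loci.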
Friday, November 2, 2007
3:35 - 4:35 pm
301 Riddick Hall
Refreshments will be served in the common area of 301 Riddick at 3:00 pm.