Department of Statistics Seminar
North Carolina State University

presents

David Hitchcock

Department of Statistics

University of South Carolina

Protein Identification Using Bayesian Stochastic Search

Abstract

Current methods for protein identification in tandem mass spectrometry (MS/MS) involve database searches or de novo peptide sequencing, with de novo sequencing being the standard method. Database searches are limited by the relatively small number of known proteins. Shortcomings of de novo peptide sequencing include chemical noise, overly complex fragments, and incomplete b- and y-ion sequences. Here we present a Bayesian approach to identifying peptides. Our model uses prior information about the average relative abundances of bond cleavages and the prior probability of any particular amino acid sequence. The proposed likelihood function is driven by two overall distance measures that quantify how closely an observed spectrum matches the theoretical scan for a peptide. A Markov chain Monte Carlo (MCMC) algorithm is employed to simulate candidate peptides from the posterior distribution of the peptide sequence. The true peptide is estimated as the candidate with the largest posterior density. In addition, our method ranks the top candidate peptides according to their approximate posterior probabilities, which shows the relative uncertainty in the best choice. Unlike database searches, our method does not depend on known peptides, and it aims to alleviate some of the drawbacks of de novo sequencing.
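The sampling and ranking steps described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the speakers' model. It assumes a known peptide length, a uniform prior over sequences (instead of the informative priors on cleavage abundances and sequences mentioned above), a noise-free spectrum of singly charged b- and y-ions, and a single hypothetical distance, `spectrum_distance` (the gap from each theoretical ion to the nearest observed peak, capped at a tolerance), in place of the two distance measures. The helper names, the constants `lam` and `tol`, and the reduced amino-acid alphabet are all illustrative choices. A Metropolis sampler with a single-residue substitution proposal explores sequences. The best-scoring sequence visited stands in for the posterior-mode estimate, and visit frequencies serve as approximate posterior probabilities for ranking candidates.

```python
import math
import random
from collections import Counter

# Monoisotopic residue masses (Da) for a subset of amino acids. I, K and Q are
# left out of this toy alphabet to avoid (near-)isobaric residue pairs.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "E": 129.04259, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056
ALPHABET = sorted(RESIDUE_MASS)


def theoretical_ions(peptide):
    """Singly charged b- and y-ion m/z values for a peptide string."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions + y_ions


def spectrum_distance(peptide, observed, tol=0.5):
    """Hypothetical distance: for each theoretical ion, the gap to the nearest
    observed peak, capped at `tol` so an unmatched ion adds a fixed penalty."""
    return sum(min(min(abs(ion - p) for p in observed), tol)
               for ion in theoretical_ions(peptide))


def log_posterior(peptide, observed, lam=10.0):
    """Unnormalized log posterior. With a uniform prior over sequences of a
    fixed length, only the assumed likelihood exp(-lam * distance) matters."""
    return -lam * spectrum_distance(peptide, observed)


def mcmc_rank(observed, length, n_iter=5000, seed=0, top=5):
    """Metropolis sampler over peptide sequences. Returns the best sequence
    visited and the top candidates ranked by visit frequency."""
    rng = random.Random(seed)
    current = "".join(rng.choice(ALPHABET) for _ in range(length))
    current_lp = log_posterior(current, observed)
    best, best_lp = current, current_lp
    visits = Counter()
    for _ in range(n_iter):
        # Symmetric proposal: substitute one uniformly chosen residue.
        pos = rng.randrange(length)
        new_aa = rng.choice([a for a in ALPHABET if a != current[pos]])
        proposal = current[:pos] + new_aa + current[pos + 1:]
        prop_lp = log_posterior(proposal, observed)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, prop_lp - current_lp)):
            current, current_lp = proposal, prop_lp
        if current_lp > best_lp:
            best, best_lp = current, current_lp
        visits[current] += 1
    ranked = [(seq, count / n_iter) for seq, count in visits.most_common(top)]
    return best, ranked


# Usage: build a noise-free toy spectrum from a known peptide, then search.
true_peptide = "PEPTLDE"
observed = theoretical_ions(true_peptide)
best, ranked = mcmc_rank(observed, len(true_peptide))
for seq, prob in ranked:
    print(seq, round(prob, 3))
```

Because the proposal changes one residue uniformly at random, it is symmetric. The acceptance probability therefore depends only on the ratio of unnormalized posteriors, and no normalizing constant over all possible sequences is needed.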

This work in progress is joint with Nicole Lewis, Ian Dryden, and John Rose.

Friday, March 1, 2013
3:00pm - 4:00pm
2203 SAS Hall

Refreshments will be served in the 5th floor commons at 2:30pm.
NOTE: No food or drink is allowed in any of the classrooms in SAS Hall.