Statistical issues on mass spectrometry-based protein identification and quantitation

Liu, Shangbin

2008

The main goal of analytical proteomics is the complete and quantitative analysis of the proteomes of species, cells, and/or tissues. Although incremental improvements in mass spectrometry (MS)-based proteomics have brought great success, some fundamental limitations mean that rapid, complete, and quantitative proteome analysis has not yet been achieved. Besides further improvements in MS instrumentation and technique, statistical considerations can help narrow this gap, by guiding the choice of the number of replicates and by analyzing the variation of factors in the LC-MS/MS process.
First, I propose a probability-based model that gives the probability of achieving a fixed coverage of sample proteins as a function of the number of replicates. Conversely, for a fixed confidence level, the model gives the coverage of sample proteins as a function of the number of replicates. Typically, four to forty replicates are required for high confidence of identifying intermediate- and high-abundance proteins, and more than 50 replicates will often be required to reliably identify low-abundance proteins.
Second, to analyze the effects of various factors on the detection probability in the LC-MS/MS process, a mathematical model was derived based on order statistics of independent, non-identically distributed normal random variables. As an approximation to this model, a simulation approach was applied to analyze the impact of the following factors on peptide/protein identification: protein abundance, sample complexity, proteolytic digestion efficiency, peptide separation and co-eluting peptides, scanning speed of the mass spectrometer, and dynamic exclusion efficiency. The proposed simulation approach could be used as a framework for analyzing the impact of various factors on peptide/protein detection. The simulation results provide valuable information for optimizing LC-MS/MS techniques and practical guidelines for conducting MS-based experiments.
Third, a methodology was developed for statistical testing of differential expression of proteins detected in two different samples. By combining the results of tests based on spectral counts and on protein occurrence across multiple MS runs, proteins that are significantly differentially expressed between two treatments can be identified with high confidence.
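
The link between replicate number and identification confidence described in the abstract can be illustrated with a deliberately simplified sketch. It is not the dissertation's model: it assumes that each replicate run identifies a given protein independently with the same per-run probability, and the function names and the per-run probabilities used below are hypothetical, chosen only for illustration.

```python
import math


def detection_probability(p_single: float, n_replicates: int) -> float:
    """Probability that a protein is identified at least once in n runs.

    Assumption (not from the dissertation): runs are independent and each
    identifies the protein with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_replicates < 0:
        raise ValueError("n_replicates must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_replicates


def replicates_needed(p_single: float, confidence: float) -> int:
    """Smallest n such that detection_probability(p_single, n) >= confidence."""
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must lie in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    if p_single == 1.0:
        return 1
    # Closed-form starting point from solving 1 - (1 - p)^n >= confidence.
    n = max(1, math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single)))
    # Guard against floating-point rounding at the boundary.
    while n > 1 and detection_probability(p_single, n - 1) >= confidence:
        n -= 1
    while detection_probability(p_single, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical per-run identification probabilities, for illustration only.
    for p in (0.5, 0.2, 0.05):
        print(p, replicates_needed(p, confidence=0.95))
```

Under this simplification the required number of replicates grows quickly as the per-run identification probability falls, which matches the qualitative trend reported in the abstract. The model in the dissertation itself may differ substantially.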

Record Created

2024-12-05

Title

Statistical issues on mass spectrometry-based protein identification and quantitation

Author

Liu, Shangbin

Contributor

Schliekelman, Paul (Advisor)
Cieszewski, Chris (Committee Member)
Li, Yehua (Committee Member)
Reeves, Jaxk (Committee Member)
Seymour, Lynne (Committee Member)

College or School

Franklin College of Arts and Sciences

Department

Statistics

Date

2008

Publisher

University of Georgia

Content Type

Dissertation

Language

English

Dissertation/Thesis Note

Doctoral

Degree Type

Doctor of Philosophy (PHD)

Name of Granting Institution

University of Georgia, Winter 2008

Year Degree Granted

2008

Keywords

statistical application; probability model; mass spectrometry (MS); MS-based proteomics; replicate; simulation; protein identification; protein quantitation; protein abundance; protein differential expression; proteotypic peptide; retention time

System Control Number

9949333454802959
