The main goal of analytical proteomics is the complete and quantitative analysis of the proteome of species, cells, and/or tissues. Although great success has been achieved through incremental improvements in mass spectrometry (MS)-based proteomics, some fundamental limitations mean that rapid, complete, and quantitative proteome analysis has not yet been achieved. Besides further improvements in MS instrumentation and technique, statistical considerations can help narrow this gap, specifically by guiding the choice of the number of replicates and by analyzing how variation in the factors of the LC-MS/MS process affects detection.
First, I propose a probability-based model that gives the probability of achieving a fixed coverage of sample proteins as a function of the number of replicates. Conversely, at a fixed confidence level, the model gives the coverage of sample proteins as a function of the number of replicates. Typically, four to forty replicates are required to identify intermediate- and high-abundance proteins with high confidence. More than 50 replicates will often be required to reliably identify low-abundance proteins.
Secondly, to analyze the effects of various factors on the detection probability in the LC-MS/MS process, a mathematical model was derived based on order statistics of independent, non-identically distributed normal random variables. As an approximation to this mathematical model, a simulation approach was applied to analyze the impact of the following factors on peptide/protein identification: protein abundance, sample complexity, proteolytic digestion efficiency, peptide separation and co-eluting peptides, scanning speed of the mass spectrometer, and dynamic exclusion efficiency. The proposed simulation approach can serve as a framework for analyzing the impact of various factors on peptide/protein detection. The simulation results provide valuable information for optimizing LC-MS/MS techniques and practical guidelines for conducting MS-based experiments.
Thirdly, a methodology was developed to test statistically for differential expression of proteins detected in two different samples. By combining the results of the spectral-count-based and protein-occurrence-based tests across multiple MS runs, proteins that are significantly differentially expressed between two treatments can be identified with high confidence.
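
The dependence of identification confidence on the number of replicates, described in the first part of the abstract, can be illustrated with a deliberately simplified sketch. It is not the probability-based model developed in this work. It assumes that each replicate identifies a given protein independently and with the same per-run probability `p`, so that the probability of at least one identification in `n` replicates is `1 - (1 - p)^n`. The function names and the example values of `p` are hypothetical and chosen only for illustration.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Probability of identifying a protein at least once in n replicates.

    Assumption (not from the source): replicates are independent and share
    the same per-run identification probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def replicates_needed(p: float, confidence: float) -> int:
    """Smallest n such that detection_probability(p, n) >= confidence.

    Solves 1 - (1 - p)^n >= confidence for n under the same assumption.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-run identification probabilities, for illustration only.
    for label, p in [("higher abundance", 0.5), ("lower abundance", 0.05)]:
        n = replicates_needed(p, confidence=0.95)
        print(f"{label}: p={p}, replicates for 95% confidence: {n}")
```

Under this simplification the required number of replicates grows quickly as the per-run identification probability falls, which matches the qualitative trend stated above. The actual model may treat coverage across many proteins jointly rather than one protein at a time.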
