The accuracy of statistical confidence estimates in shotgun proteomics

This is a dissertation from Stockholm: Department of Biochemistry and Biophysics, Stockholm University

Abstract: High-throughput techniques are currently some of the most promising methods to study molecular biology, with the potential to improve medicine and enable new biological applications. In proteomics, the large-scale study of proteins, the leading method is mass spectrometry. At present, researchers can routinely identify and quantify thousands of proteins in a single experiment with the technique called shotgun proteomics.

A challenge of these experiments is the computational analysis and interpretation of the mass spectra. A shotgun proteomics experiment easily generates tens of thousands of spectra, each thought to represent a peptide from a protein. Due to the immense biological and technical complexity, however, our computational tools often misinterpret these spectra and derive incorrect peptides. As a consequence, the biological interpretation of the experiment relies heavily on the statistical confidence that we estimate for the identifications.

In this thesis, I have included four articles from my research on the accuracy of the statistical confidence estimates in shotgun proteomics, and on how to achieve and evaluate that accuracy. The first two papers present a new method that uses pre-characterized protein samples to evaluate this accuracy. The third paper deals with how to avoid statistical inaccuracies when using machine learning techniques to analyze the data. In the fourth paper, we present a new tool for analyzing shotgun proteomics results, and evaluate the accuracy of its statistical estimates using the method from the first two papers.

The work I have included here can facilitate the development of new and accurate computational tools in mass spectrometry-based proteomics. Such tools will help make the interpretation of the spectra and the downstream biological conclusions more reliable.