How to talk about protein-level false discovery rates in shotgun proteomics
(English) Manuscript (preprint) (Other academic)
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate. Many researchers consider protein-level false discovery rates a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework for thinking about protein-level false discovery rates, starting from their basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, a fact that has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the false discovery rate. Furthermore, we demonstrate how the same simulations can be used to verify the false discovery rate estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level false discovery rates for both competing null hypotheses.
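As an illustration of the decoy-based estimation mentioned above, the following is a minimal sketch of a generic target-decoy estimate at the protein level. It is not taken from the manuscript: the function names `decoy_fdr` and `q_values`, the use of a single score per protein, and the assumption of equally sized target and decoy databases are all hypothetical choices made for this example. Which null hypothesis such an estimate addresses depends on how the decoy model is constructed, which this abstract does not specify.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target proteins scoring >= threshold.

    Assumption (not from the manuscript): decoy proteins scoring above the
    threshold approximate the number of false target proteins above it,
    and the target and decoy databases are of equal size.
    """
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    if n_targets == 0:
        return 0.0
    return min(1.0, n_decoys / n_targets)


def q_values(target_scores, decoy_scores):
    """Return (score, q-value) pairs for targets, best score first.

    The q-value of a protein is the smallest estimated FDR at which
    that protein would still be included in the reported list.
    """
    ranked = sorted(target_scores, reverse=True)
    fdrs = [decoy_fdr(target_scores, decoy_scores, s) for s in ranked]
    # Walk from the worst score upward, keeping a running minimum so that
    # q-values never increase as the score improves.
    running_min = 1.0
    qs = []
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return list(zip(ranked, qs))


if __name__ == "__main__":
    # Made-up scores for illustration only.
    targets = [9.0, 8.0, 7.0, 3.0, 2.0]
    decoys = [7.5, 2.5, 1.0]
    for score, q in q_values(targets, decoys):
        print(f"score={score:.1f}  q-value={q:.2f}")
```

In a simulation where the truly present proteins are known, the same list could be compared against the ground truth to check whether the estimated rate matches the observed fraction of false entries.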
mass spectrometry - LC-MS/MS, statistical analysis, data processing and analysis, protein inference, simulation
Bioinformatics and Systems Biology
Research subject Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-185116
OAI: oai:DiVA.org:kth-185116
DiVA: diva2:918719
QC 20160412
2016-04-11 2016-04-11 2016-04-12 Bibliographically approved