Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0
KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-5401-5553
KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0001-5689-9797
2016 (English). In: Journal of the American Society for Mass Spectrometry, ISSN 1044-0305, E-ISSN 1879-1123, Vol. 27, no 11, p. 1719-1727. Article in journal (Refereed), Published
Abstract [en]

Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method in the Percolator package: grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
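
The protein inference method named in the abstract (group proteins by their sets of theoretical peptides, then keep only the best-scoring peptide per protein) can be illustrated with a minimal Python sketch. This is not Percolator's implementation; the function name, the input dictionaries, and the convention that a higher score is better are all assumptions made for illustration.

```python
from collections import defaultdict


def best_peptide_protein_scores(protein_to_peptides, peptide_scores):
    """Score protein groups by their best-scoring observed peptide.

    protein_to_peptides: hypothetical map of protein id -> theoretical peptides.
    peptide_scores: hypothetical map of observed peptide -> score (higher is better).
    """
    # Proteins with identical theoretical peptide sets cannot be told apart,
    # so they are merged into one group.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    results = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if not observed:
            continue  # no peptide evidence for this group
        # Only the single best peptide contributes to the group score.
        results.append((tuple(sorted(proteins)), max(observed)))

    # Best-scoring groups first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling it with a small map in which two proteins share the same peptide set returns one merged group for that pair, scored by the higher of its two peptide scores.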

Place, publisher, year, edition, pages
Springer, 2016. Vol. 27, no 11, p. 1719-1727
Keywords [en]
Data processing and analysis, Large scale studies, Mass spectrometry - LC-MS/MS, Protein inference, Statistical analysis, Bioinformatics, Data handling, Mass spectrometry, Molecular biology, Peptides, Probability, Statistical methods, Error probabilities, False discovery rate, Large-scale studies, LC-MS/MS, Scalable approach, Shotgun proteomics, Statistical confidence, Proteins
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:kth:diva-195221
DOI: 10.1007/s13361-016-1460-7
ISI: 000385158400002
Scopus ID: 2-s2.0-84991105210
OAI: oai:DiVA.org:kth-195221
DiVA, id: diva2:1047404
Note

QC 20161117

Available from: 2016-11-17 Created: 2016-11-02 Last updated: 2018-10-01 Bibliographically approved
In thesis
1. Statistical and machine learning methods to analyze large-scale mass spectrometry data
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern biology is faced with vast amounts of data that contain valuable information yet to be extracted. Proteomics, the study of proteins, has repositories with thousands of mass spectrometry experiments. These data gold mines could further our knowledge of proteins as the main actors in cell processes and signaling. Here, we explore statistical and machine learning methods to extract more information from these data.

First, we present advances for studies that aggregate hundreds of runs. We introduce MaRaCluster, which clusters mass spectra for large-scale datasets using statistical methods to assess similarity of spectra. It identified up to 40% more peptides than the state-of-the-art method, MS-Cluster. Further, we accommodated large-scale data analysis in Percolator, a popular post-processing tool for mass spectrometry data. This reduced the runtime for a draft human proteome study from a full day to 10 minutes.

Second, we clarify the contentious topic of protein false discovery rates (FDRs) and promote their use. Often, studies report lists of proteins but fail to report protein FDRs. We provide a framework to discuss protein FDRs systematically and to reduce hesitance about reporting them. We also added protein FDRs to Percolator, opting for the best-peptide approach, which proved superior in a benchmark of scalable protein inference methods.
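
As a rough illustration of what reporting an FDR involves, the sketch below estimates q values from a ranked list of target and decoy scores. It assumes a target-decoy setup and a simple decoys/targets FDR estimate; it is a simplified example and not the procedure implemented in Percolator. The function name and input format are hypothetical.

```python
def target_decoy_q_values(scored_items):
    """Estimate q values for targets from (score, is_decoy) pairs.

    Assumes higher scores are better and that decoys model incorrect targets.
    Returns (score, q_value) pairs for targets only, best score first.
    """
    ranked = sorted(scored_items, key=lambda item: item[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q value is the minimum FDR at this threshold or any looser one,
    # which makes q values non-decreasing down the ranked list.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [
        (score, q)
        for (score, is_decoy), q in zip(ranked, q_values)
        if not is_decoy
    ]
```

A list of proteins can then be reported at a chosen threshold, for example by keeping only entries whose q value is at most 0.01.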

Third, we tackle the low sensitivity of protein quantification methods. Current methods lack proper control of error sources and propagation. To remedy this, we developed Triqler, which controls the protein quantification FDR through a Bayesian framework. We also introduce MaRaQuant, which proposes a quantification-first approach that applies clustering prior to identification. This reduced the number of spectra to be searched and allowed us to spot unidentified analytes of interest. Combining these tools outperformed the state-of-the-art method, MaxQuant/Perseus, and found enriched functional terms for datasets that had none before.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2018. p. 64
Series
TRITA-CBH-FOU ; 2018:45
Keywords
mass spectrometry - LC-MS/MS, statistical analysis, data processing and analysis, protein inference, large-scale studies, simulation, protein quantification, clustering, machine learning, Bayesian statistics
National Category
Bioinformatics (Computational Biology)
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-235629 (URN)
978-91-7729-967-7 (ISBN)
Public defence
2018-10-24, Atrium, Nobels väg 12B, Solna, 13:00 (English)
Note

QC 20181001

Available from: 2018-10-01 Created: 2018-10-01 Last updated: 2018-10-01 Bibliographically approved

Authority records

The, Matthew; Käll, Lukas
