MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics
2016 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 15, no 3, 713-720 p. Article in journal (Refereed). Published. Text.
Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).
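The intuition in the abstract, that rarely shared peaks carry more evidence than common ones and that spectra are then grouped by complete linkage, can be illustrated with a small sketch. This is not the MaRaCluster metric, which the abstract does not specify. The scoring formula (sum of negative log peak frequencies over shared bins), the 1.0 m/z bin width, the 0.5 distance threshold, the toy spectra, and all function names below are assumptions made for illustration only.

```python
import math
from itertools import combinations


def bin_peaks(mz_values, bin_width=1.0):
    """Map fragment m/z values to a set of integer bin indices (assumed binning)."""
    return {int(mz / bin_width) for mz in mz_values}


def peak_frequencies(binned_spectra):
    """Fraction of spectra in which each peak bin occurs."""
    n = len(binned_spectra)
    counts = {}
    for peaks in binned_spectra:
        for b in peaks:
            counts[b] = counts.get(b, 0) + 1
    return {b: c / n for b, c in counts.items()}


def rarity_distance(a, b, freq):
    """Illustrative distance: shared rare peaks lower it, ubiquitous peaks do not.

    Each shared bin contributes -log(frequency), so a peak present in every
    spectrum contributes zero evidence. This is a stand-in, not the published metric.
    """
    evidence = sum(-math.log(freq[p]) for p in a & b)
    return 1.0 / (1.0 + evidence)


def complete_linkage(binned, freq, threshold):
    """Naive agglomerative clustering; the cluster distance is the maximum pairwise distance."""
    clusters = [[i] for i in range(len(binned))]

    def linkage(c1, c2):
        return max(rarity_distance(binned[i], binned[j], freq) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break  # no remaining pair is close enough to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy spectra (made-up m/z lists): all four share a peak near 100 m/z,
# while spectra 0/1 and spectra 2/3 share rarer peaks pairwise.
spectra = [
    [100.1, 200.2, 350.5, 420.7],
    [100.3, 200.4, 350.6, 421.0],
    [100.2, 510.3, 620.9, 730.1],
    [100.4, 510.6, 620.2, 731.5],
]
binned = [bin_peaks(s) for s in spectra]
freq = peak_frequencies(binned)
clusters = complete_linkage(binned, freq, threshold=0.5)
print(clusters)
```

In this toy input the peak near 100 m/z occurs in every spectrum and so contributes nothing to the evidence, while the peaks shared by only two spectra pull spectra 0 and 1 together and spectra 2 and 3 together. Complete linkage then refuses to merge the two groups because their most distant members share only the ubiquitous peak.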
Place, publisher, year, edition, pages
American Chemical Society (ACS), 2016. Vol. 15, no 3, 713-720 p.
Mass spectrometry, proteomics, hierarchical clustering, bioinformatics, database search, spectral archives, spectral libraries
Bioinformatics (Computational Biology)
Identifiers: URN: urn:nbn:se:kth:diva-184544; DOI: 10.1021/acs.jproteome.5b00749; ISI: 000371754100005; PubMedID: 26653874; ScopusID: 2-s2.0-84960456163; OAI: oai:DiVA.org:kth-184544; DiVA: diva2:917308
Funder: Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
QC 20160406. 2016-04-06, 2016-04-01, 2016-04-12. Bibliographically approved.