Publications (10 of 51)
Zhang, B., Pirmoradian, M., Zubarev, R. & Käll, L. (2017). Covariation of Peptide Abundances Accurately Reflects Protein Concentration Differences. Molecular & Cellular Proteomics, 16(5), 936-948
2017 (English). In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 16, no 5, p. 936-948. Article in journal (Refereed), Published
Abstract [en]

Most implementations of mass spectrometry-based proteomics involve enzymatic digestion of proteins, expanding the analysis to multiple proteolytic peptides for each protein. Currently, there is no consensus on how to summarize peptides' abundances to protein concentrations, and such efforts are complicated by the fact that error control is normally applied to the identification process and does not directly control errors linking peptide abundance measures to protein concentration. Peptides resulting from suboptimal digestion or being partially modified are not representative of the protein concentration. Without a mechanism to remove such unrepresentative peptides, their abundance adversely impacts the estimation of their protein's concentration. Here, we present a relative quantification approach, Diffacto, that applies factor analysis to extract the covariation of peptides' abundances. The method enables a weighted geometrical average summarization and automatic elimination of incoherent peptides. We demonstrate, based on a set of controlled label-free experiments using standard mixtures of proteins, that the covariation structure extracted by the factor analysis accurately reflects protein concentrations. In the 1% peptide-spectrum match-level FDR data set, as many as 11% of the peptides have abundance differences incoherent with the other peptides attributed to the same protein. If not controlled, such contradicting peptide abundances have a severe impact on protein quantifications. When adding the quantities of each protein's three most abundant peptides, we note as many as 14% of the proteins being estimated as having a negative correlation with their actual concentration differences between samples. Diffacto reduced the amount of such obviously incorrectly quantified proteins to 1.6%. Furthermore, by analyzing clinical data sets from two breast cancer studies, our method revealed the persistent proteomic signatures linked to three subtypes of breast cancer.
We conclude that Diffacto can facilitate the interpretation and enhance the utility of most types of proteomics data.
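
As a rough illustration of the weighted geometric average summarization mentioned in the abstract, the sketch below combines the peptide abundances of one protein in log space and drops low-weight peptides. The function name, the example weights, and the cutoff of 0.5 are assumptions made for illustration only; in Diffacto the weights come from factor analysis, which is not reproduced here.

```python
import math


def weighted_geometric_mean(abundances, weights, min_weight=0.5):
    """Summarize peptide abundances into one protein-level value.

    Peptides whose weight falls below `min_weight` (a hypothetical cutoff)
    are treated as incoherent and dropped before averaging.
    """
    kept = [(a, w) for a, w in zip(abundances, weights)
            if w >= min_weight and a > 0]
    if not kept:
        return float("nan")
    total_weight = sum(w for _, w in kept)
    # The weighted mean of log-abundances is the log of the weighted geometric mean.
    log_mean = sum(w * math.log(a) for a, w in kept) / total_weight
    return math.exp(log_mean)


# Three coherent peptides and one incoherent peptide (low weight) for one protein.
peptide_abundances = [100.0, 400.0, 200.0, 5000.0]
peptide_weights = [1.0, 1.0, 1.0, 0.1]  # illustrative values, not fitted loadings
protein_abundance = weighted_geometric_mean(peptide_abundances, peptide_weights)
```

With these made-up numbers the outlying fourth peptide is excluded, so the summary is the geometric mean of the remaining three values rather than being pulled upward by the outlier.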

Place, publisher, year, edition, pages
American Society for Biochemistry and Molecular Biology, 2017
National Category
Biochemistry and Molecular Biology Bioinformatics (Computational Biology) Biophysics
Identifiers
urn:nbn:se:kth:diva-207901 (URN); 10.1074/mcp.O117.067728 (DOI); 000400759600017 (); 28302922 (PubMedID); 2-s2.0-85018359335 (Scopus ID)
Note

QC 20170530

Available from: 2017-05-30. Created: 2017-05-30. Last updated: 2018-01-13. Bibliographically approved
Afkham, H. M., Qiu, X., The, M. & Käll, L. (2017). Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics. Bioinformatics, 33(4), 508-513
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 4, p. 508-513. Article in journal (Refereed), Published
Abstract [en]

Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction.

Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
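
The two uses of predicted standard deviations named in the abstract (selecting a low-uncertainty subset and adaptive windowing) can be sketched as follows. This is a minimal illustration, not the paper's procedure: the function names, the 50% selection fraction, the window half-width of two standard deviations, and all numbers are hypothetical, and the Gaussian Process Regression step that would produce the means and standard deviations is not shown.

```python
def most_certain(predictions, fraction=0.5):
    """Keep the given fraction of peptides with the smallest predicted standard deviation."""
    n_keep = max(1, int(len(predictions) * fraction))
    return sorted(predictions, key=lambda p: p[2])[:n_keep]


def adaptive_windows(predictions, k=2.0):
    """Return (peptide, start, end) windows of +/- k predicted standard deviations."""
    return [(pep, mean - k * std, mean + k * std) for pep, mean, std in predictions]


# (peptide, predicted retention time in minutes, predicted standard deviation);
# all values are made up for illustration.
predictions = [
    ("PEPTIDEA", 20.0, 0.5),
    ("PEPTIDEB", 35.0, 2.0),
    ("PEPTIDEC", 50.0, 1.0),
    ("PEPTIDED", 65.0, 4.0),
]
selected = most_certain(predictions, fraction=0.5)
windows = adaptive_windows(selected, k=2.0)
```

Peptides with a small predicted standard deviation get narrow windows, while uncertain predictions would get wider ones if they were retained.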

Place, publisher, year, edition, pages
OXFORD UNIV PRESS, 2017
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-205074 (URN); 10.1093/bioinformatics/btw619 (DOI); 000397264100006 ()
Note

QC 20170626

Available from: 2017-06-26. Created: 2017-06-26. Last updated: 2018-01-13. Bibliographically approved
Zhang, B., Käll, L. & Zubarev, R. A. (2016). DeMix-Q: Quantification-Centered Data Processing Workflow. Molecular & cellular proteomics (online), 15(4), 1467-1478
2016 (English). In: Molecular & cellular proteomics (online), ISSN 1535-9476, E-ISSN 1535-9484, Vol. 15, no 4, p. 1467-1478. Article in journal (Refereed), Published
Abstract [en]

For historical reasons, most proteomics workflows focus on MS/MS identification but consider quantification as the end point of a comparative study. The stochastic data-dependent MS/MS acquisition (DDA) gives low reproducibility of peptide identifications from one run to another, which inevitably results in problems with missing values when quantifying the same peptide across a series of label-free experiments. However, the signal from the molecular ion is almost always present among the MS1 spectra. Contrary to what is frequently claimed, missing values do not have to be an intrinsic problem of DDA approaches that perform quantification at the MS1 level. The challenge is to perform sound peptide identity propagation across multiple high-resolution LC-MS/MS experiments, from runs with MS/MS-based identifications to runs where such information is absent. Here, we present a new analytical workflow, DeMix-Q (https://github.com/userbz/DeMix-Q), which performs such propagation and recovers missing values reliably by using a novel scoring scheme for quality control. Compared with traditional workflows for DDA as well as previous DIA studies, DeMix-Q achieves deeper proteome coverage, fewer missing values, and lower quantification variance on a benchmark dataset. This quantification-centered workflow also enables flexible and robust proteome characterization based on covariation of peptide abundances.

Place, publisher, year, edition, pages
American Society for Biochemistry and Molecular Biology, 2016
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:kth:diva-186642 (URN); 10.1074/mcp.O115.055475 (DOI); 000373992600021 (); 26729709 (PubMedID)
Note

QC 20160531

Available from: 2016-05-31. Created: 2016-05-13. Last updated: 2017-11-30. Bibliographically approved
The, M., MacCoss, M. J., Noble, W. S. & Käll, L. (2016). Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. Journal of the American Society for Mass Spectrometry, 27(11), 1719-1727
2016 (English). In: Journal of the American Society for Mass Spectrometry, ISSN 1044-0305, E-ISSN 1879-1123, Vol. 27, no 11, p. 1719-1727. Article in journal (Refereed), Published
Abstract [en]

Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator’s processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method (grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein) in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
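
The protein inference idea described in the abstract (group proteins by their sets of theoretical peptides, then score each group by its best-scoring peptide) can be sketched in a few lines. This is a toy illustration under stated assumptions and not Percolator's code: the function name, the data layout, the protein and peptide identifiers, and the convention that higher scores are better are all hypothetical.

```python
from collections import defaultdict


def infer_protein_groups(theoretical_peptides, peptide_scores):
    """Group proteins with identical theoretical peptide sets, then score
    each group by its best-scoring observed peptide (higher is better)."""
    groups = defaultdict(list)
    for protein, peptides in theoretical_peptides.items():
        groups[frozenset(peptides)].append(protein)

    results = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:  # groups without any observed peptide are not reported
            results.append((tuple(sorted(proteins)), max(observed)))
    # Best-scoring groups first.
    return sorted(results, key=lambda r: r[1], reverse=True)


theoretical_peptides = {
    "P1": {"AAK", "CCR"},
    "P2": {"AAK", "CCR"},  # indistinguishable from P1 by peptide content
    "P3": {"DDK", "EER"},
    "P4": {"FFK"},         # no observed peptides
}
peptide_scores = {"AAK": 1.2, "CCR": 3.4, "DDK": 0.7}
ranked = infer_protein_groups(theoretical_peptides, peptide_scores)
```

Using only the best peptide per group keeps the score independent of how many peptides a protein happens to produce, which is one reason such a scheme is easy to reason about.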

Place, publisher, year, edition, pages
Springer, 2016
Keyword
Data processing and analysis, Large scale studies, Mass spectrometry - LC-MS/MS, Protein inference, Statistical analysis, Bioinformatics, Data handling, Mass spectrometry, Molecular biology, Peptides, Probability, Statistical methods, Error probabilities, False discovery rate, Large-scale studies, LC-MS/MS, Scalable approach, Shotgun proteomics, Statistical confidence, Proteins
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-195221 (URN); 10.1007/s13361-016-1460-7 (DOI); 000385158400002 (); 2-s2.0-84991105210 (Scopus ID)
Note

QC 20161117

Available from: 2016-11-17. Created: 2016-11-02. Last updated: 2017-11-29. Bibliographically approved
Edfors, F., Danielsson, F., Hallström, B. M., Käll, L., Lundberg, E., Ponten, F., . . . Uhlén, M. (2016). Gene-specific correlation of RNA and protein levels in human cells and tissues. Molecular Systems Biology, 12(10), Article ID 883.
2016 (English). In: Molecular Systems Biology, ISSN 1744-4292, E-ISSN 1744-4292, Vol. 12, no 10, article id 883. Article in journal (Refereed), Published
Abstract [en]

An important issue for molecular biology is to establish whether transcript levels of a given gene can be used as proxies for the corresponding protein levels. Here, we have developed a targeted proteomics approach for a set of human non-secreted proteins based on parallel reaction monitoring to measure, at steady-state conditions, absolute protein copy numbers across human tissues and cell lines and compared these levels with the corresponding mRNA levels using transcriptomics. The study shows that the transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor independent of the tissue type is introduced, thus significantly enhancing the predictability of protein copy numbers from RNA levels. The results show that the RTP ratio varies significantly, from a few hundred protein copies per mRNA molecule for some genes to several hundred thousand protein copies per mRNA molecule for others. In conclusion, our data suggest that transcriptome analysis can be used as a tool to predict the protein copy numbers per cell, thus forming an attractive link between the fields of genomics and proteomics.
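
The role of a gene-specific RTP conversion factor can be illustrated with a small calculation: estimate one factor per gene from paired measurements, then multiply a new RNA level by it. The sketch below is an assumption-laden toy, not the study's method: taking the median protein-to-RNA ratio across samples is a choice made here for illustration, and the function names, units, and numbers are hypothetical.

```python
from statistics import median


def estimate_rtp(rna_levels, protein_copies):
    """Estimate one gene's RTP factor as the median protein/RNA ratio across samples."""
    ratios = [p / r for r, p in zip(rna_levels, protein_copies) if r > 0]
    return median(ratios)


def predict_protein_copies(rna_level, rtp):
    """Predict protein copies from an RNA level using a gene-specific RTP factor."""
    return rna_level * rtp


# Made-up values for one gene measured in three tissues.
rna = [10.0, 20.0, 40.0]                 # mRNA level per sample
protein = [10000.0, 22000.0, 36000.0]    # protein copies per sample
rtp = estimate_rtp(rna, protein)         # ratios are 1000, 1100 and 900
predicted = predict_protein_copies(15.0, rtp)
```

Because the factor is estimated per gene and reused across tissues, the prediction for a new tissue needs only that tissue's RNA level.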

Place, publisher, year, edition, pages
Blackwell Publishing, 2016
Keyword
gene expression, protein quantification, targeted proteomics, transcriptomics
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:kth:diva-196993 (URN); 10.15252/msb.20167144 (DOI); 000386948100001 (); 2-s2.0-84992562628 (Scopus ID)
Note

QC 20161213

Available from: 2016-12-13. Created: 2016-11-28. Last updated: 2017-11-29. Bibliographically approved
The, M., Tasnim, A. & Käll, L. (2016). How to talk about protein-level false discovery rates in shotgun proteomics. Proteomics, 16(18), 2461-2469
2016 (English). In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 16, no 18, p. 2461-2469. Article in journal (Refereed), Published
Abstract [en]

A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
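
For readers unfamiliar with decoy-based error estimation, the sketch below shows the common target-decoy calculation of q-values on a ranked list of protein scores. It is a generic illustration, not the paper's simulation framework, and it does not distinguish between the two null hypotheses the abstract discusses. The function name, the input layout, and the assumption of equally sized target and decoy databases are hypothetical choices made for this example.

```python
def decoy_q_values(scored_proteins):
    """Compute q-values for target proteins from (score, is_decoy) pairs.

    Assumes higher scores are better and that the decoy database is the
    same size as the target database (an assumption of this sketch).
    """
    ranked = sorted(scored_proteins, key=lambda x: x[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among targets accepted at this score threshold.
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [(score, q) for (score, is_decoy), q in zip(ranked, q_values)
            if not is_decoy]


# Made-up scores: four target proteins and two decoys.
scores = [(9.0, False), (8.0, False), (7.0, True),
          (6.0, False), (5.0, False), (4.0, True)]
target_q = decoy_q_values(scores)
```

Accepting every target with a q-value at or below a preset threshold then gives a protein list with the corresponding estimated FDR.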

Place, publisher, year, edition, pages
Wiley-Blackwell, 2016
Keyword
Bioinformatics, Data processing and analysis, Mass spectrometry-LC-MS/MS, Protein inference, Simulation, Statistical analysis
National Category
Biophysics Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:kth:diva-196441 (URN); 10.1002/pmic.201500431 (DOI); 000385813600005 (); 27503675 (PubMedID); 2-s2.0-84988369698 (Scopus ID)
Note

QC 20161129

Available from: 2016-11-29. Created: 2016-11-14. Last updated: 2017-11-29. Bibliographically approved
The, M. & Käll, L. (2016). MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics. Journal of Proteome Research, 15(3), 713-720
2016 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 15, no 3, p. 713-720. Article in journal (Refereed), Published
Abstract [en]

Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2016
Keyword
Mass spectrometry, proteomics, hierarchical clustering bioinformatics, database search, spectral archives, spectral libraries
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-184544 (URN); 10.1021/acs.jproteome.5b00749 (DOI); 000371754100005 (); 26653874 (PubMedID); 2-s2.0-84960456163 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20160406

Available from: 2016-04-06. Created: 2016-04-01. Last updated: 2018-01-10. Bibliographically approved
Moruz, L. & Käll, L. (2016). Peptide retention time prediction. Mass spectrometry reviews (Print)
2016 (English). In: Mass spectrometry reviews (Print), ISSN 0277-7037, E-ISSN 1098-2787. Article in journal (Refereed), Published
Abstract [en]

Most methods for interpreting data from shotgun proteomics experiments are to a large degree dependent on being able to predict properties of peptide ions. Often such predicted properties are limited to molecular mass and fragment spectra, but here we put focus on a perhaps underutilized property, a peptide's chromatographic retention time. We review a couple of different principles of retention time prediction, and their applications within computational proteomics.

Keyword
Bioinformatics, Chromatography, Mass spectrometry, Peptides, Regression analysis, Forecasting, Molecular biology, Proteins, Chromatographic retention time, Computational proteomics, Peptide ions, Retention time prediction, Shotgun proteomics
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-186776 (URN); 10.1002/mas.21488 (DOI); 000407931000004 (); 2-s2.0-84956866259 (Scopus ID)
Note

QC 20160601

Available from: 2016-06-01. Created: 2016-05-13. Last updated: 2017-09-06. Bibliographically approved
Wen, B., Du, C., Li, G., Ghali, F., Jones, A. R., Käll, L., . . . Wang, J. (2015). IPeak: An open source tool to combine results from multiple MS/MS search engines. Proteomics, 15(17), 2916-2920
2015 (English). In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 15, no 17, p. 2916-2920. Article in journal (Refereed), Published
Abstract [en]

Liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) is an important technique for detecting peptides in proteomics studies. Here, we present an open source software tool, termed IPeak, a peptide identification pipeline that is designed to combine the Percolator post-processing algorithm and a multi-search strategy to enhance the sensitivity of peptide identifications without compromising accuracy. IPeak provides a graphical user interface (GUI) as well as a command-line interface, which is implemented in Java and can work on all three major operating system platforms: Windows, Linux/Unix and OS X. IPeak has been designed to work with the mzIdentML standard from the Proteomics Standards Initiative (PSI) as input and output, and has also been fully integrated into the associated mzidLibrary project, providing access to the overall pipeline, as well as modules for calling Percolator on individual search engine result files. The integration thus enables IPeak (and Percolator) to be used in conjunction with any software packages implementing the mzIdentML data standard. IPeak is freely available and can be downloaded under an Apache 2.0 license at https://code.google.com/p/mzidentml-lib/.

Place, publisher, year, edition, pages
Wiley-VCH Verlagsgesellschaft, 2015
Keyword
Bioinformatics, Mass spectrometry, Peptide identification, Shotgun proteomics
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:kth:diva-174239 (URN); 10.1002/pmic.201400208 (DOI); 000360965900005 (); 25951428 (PubMedID)
Note

QC 20151016

Available from: 2015-10-16. Created: 2015-10-02. Last updated: 2017-12-01. Bibliographically approved
Boekel, J., Chilton, J. M., Cooke, I. R., Horvatovich, P. L., Jagtap, P. D., Käll, L., . . . Griffin, T. J. (2015). Multi-omic data analysis using Galaxy [Letter to the editor]. Nature Biotechnology, 33(2), 137-9
2015 (English). In: Nature Biotechnology, ISSN 1087-0156, E-ISSN 1546-1696, Vol. 33, no 2, p. 137-9. Article in journal, Letter (Refereed), Published
National Category
Bioinformatics and Systems Biology
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-160931 (URN); 10.1038/nbt.3134 (DOI); 000349198800014 (); 25658277 (PubMedID); 2-s2.0-84923373613 (Scopus ID)
Funder
Swedish e‐Science Research Center
Note

QC 20150305

Available from: 2015-03-04. Created: 2015-03-04. Last updated: 2017-12-04. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-5689-9797
