Publications (10 of 62)
Chalabi, M. H., Tsiamis, V., Käll, L., Vandin, F. & Schwammle, V. (2019). CoExpresso: assess the quantitative behavior of protein complexes in human cells. BMC Bioinformatics, 20, Article ID 17.
2019 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article id 17. Article in journal (Refereed) Published
Abstract [en]

Background: Translational and post-translational control mechanisms in the cell result in widely observable differences between measured gene transcription and protein abundances. Herein, protein complexes are among the most tightly controlled entities by selective degradation of their individual proteins. They furthermore act as control hubs that regulate highly important processes in the cell and exhibit a high functional diversity due to their ability to change their composition and their structure. Better understanding and prediction of these functional states demands methods for the characterization of complex composition, behavior, and abundance across multiple cell states. Mass spectrometry provides an unbiased approach to directly determine protein abundances across different cell populations and thus to profile a comprehensive abundance map of proteins.

Results: We provide a tool to investigate the behavior of protein subunits in known complexes by comparing their abundance profiles across up to 140 cell types available in ProteomicsDB. Thorough assessment of different randomization methods and statistical scoring algorithms allows determining the significance of concurrent profiles within a complex, therefore providing insights into the conservation of their composition across human cell types as well as the identification of intrinsic structures in complex behavior to determine which proteins orchestrate complex function. This analysis can be extended to investigate common profiles within arbitrary protein groups. CoExpresso can be accessed through http://computproteomics.bmb.sdu.dk/Apps/CoExpresso.

Conclusions: With the CoExpresso web service, we offer a potent scoring scheme to assess proteins for their co-regulation and thereby offer insight into their potential for forming functional groups like protein complexes.

Place, publisher, year, edition, pages
BMC, 2019
Keywords
Protein complex, Statistics, Co-regulation
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:kth:diva-242241 (URN); 10.1186/s12859-018-2573-8 (DOI); 000455335900002 (); 30626316 (PubMedID); 2-s2.0-85059641401 (Scopus ID)
Note

QC 20190129

Available from: 2019-01-29. Created: 2019-01-29. Last updated: 2019-01-29. Bibliographically approved.
The, M. & Käll, L. (2019). Integrated Identification and Quantification Error Probabilities for Shotgun Proteomics. Molecular & Cellular Proteomics, 18(3), 561-570
2019 (English). In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 18, no 3, p. 561-570. Article in journal (Refereed) Published
Abstract [en]

Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differential proteins use intermediate filters to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered data sets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical data set we discovered 35 proteins at 5% FDR, whereas the original study discovered 1 and MaxQuant/Perseus 4 proteins at this threshold. Compellingly, these 35 proteins showed enrichment for functional annotation terms, whereas the top ranked proteins reported by MaxQuant/Perseus showed no enrichment. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.

Place, publisher, year, edition, pages
AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC, 2019
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-252661 (URN); 10.1074/mcp.RA118.001018 (DOI); 000467885100013 (); 30482846 (PubMedID); 2-s2.0-85062999333 (Scopus ID)
Note

QC 20190610

Available from: 2019-06-10. Created: 2019-06-10. Last updated: 2019-06-10. Bibliographically approved.
Halloran, J. T., Zhang, H., Kara, K., Renggli, C., The, M., Zhang, C., . . . Noble, W. S. (2019). Speeding Up Percolator. Journal of Proteome Research, 18(9), 3353-3359
2019 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 18, no 9, p. 3353-3359. Article in journal (Refereed) Published
Abstract [en]

The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.

Place, publisher, year, edition, pages
AMER CHEMICAL SOC, 2019
Keywords
tandem mass spectrometry, machine learning, support vector machine, SVM, percolator
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-261034 (URN); 10.1021/acs.jproteome.9b00288 (DOI); 000485089100012 (); 31407580 (PubMedID); 2-s2.0-85071999233 (Scopus ID)
Note

QC 20191002

Available from: 2019-10-02. Created: 2019-10-02. Last updated: 2019-10-02. Bibliographically approved.
The, M., Edfors, F., Perez-Riverol, Y., Payne, S. H., Hoopmann, M. R., Palmblad, M., . . . Käll, L. (2018). A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms. Journal of Proteome Research, 17(5), 1879-1886
2018 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no 5, p. 1879-1886. Article in journal (Refereed) Published
Abstract [en]

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2018
Keywords
mass spectrometry, proteomics, protein inference, sample of known content, protein standard, proteoform, peptide, homology, benchmark
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:kth:diva-228270 (URN); 10.1021/acs.jproteome.7b00899 (DOI); 000431726700013 (); 29631402 (PubMedID); 2-s2.0-85046675818 (Scopus ID)
Note

QC 20180522

Available from: 2018-05-22. Created: 2018-05-22. Last updated: 2018-12-05. Bibliographically approved.
Jeuken, G. S. & Käll, L. (2018). A simple null model for inferences from network enrichment analysis. PLoS ONE, 13(11), Article ID e0206864.
2018 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 13, no 11, article id e0206864. Article in journal (Refereed) Published
Abstract [en]

A prevailing technique to infer function from lists of identifications, from molecular biological high-throughput experiments, is over-representation analysis, where the identifications are compared to predefined sets of related genes often referred to as pathways. As at least some pathways are known to be incomplete in their annotation, algorithmic efforts have been made to complement them with information from functional association networks. While the terminology varies in the literature, we will here refer to such methods as Network Enrichment Analysis (NEA). Traditionally, the significance of inferences from NEA has been assigned using a null model constructed from randomizations of the network. Here we instead argue for a null model that more directly relates to the set of genes being studied, and have designed one dynamic programming algorithm that calculates the score distribution of NEA scores that makes it possible to assign unbiased mid p values to inferences. We also implemented a random sampling method, carrying out the same task. We demonstrate that our method obtains a superior statistical calibration as compared to the popular NEA inference engine, BinoX, while also providing statistics that are easier to interpret.
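The mid p value mentioned in the abstract can be illustrated with a minimal sketch. It assumes the null score distribution is already available as a discrete probability mass function; the function name `mid_p_value` and the example distribution are hypothetical and are not taken from the paper, which computes the distribution with its own dynamic programming algorithm.

```python
from fractions import Fraction


def mid_p_value(null_pmf, observed):
    """Mid p value for a discrete score: P(S > s) + 0.5 * P(S = s).

    null_pmf maps each possible score to its probability under the null.
    """
    greater = sum(p for score, p in null_pmf.items() if score > observed)
    equal = null_pmf.get(observed, Fraction(0))
    return greater + equal / 2


# Hypothetical null distribution over integer scores (illustration only).
null_pmf = {
    0: Fraction(1, 2),
    1: Fraction(1, 4),
    2: Fraction(1, 8),
    3: Fraction(1, 8),
}

print(mid_p_value(null_pmf, 2))  # 3/16
```

Counting only half of the probability mass at the observed score is what distinguishes the mid p value from the conventional p value, which would count all of it.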

Place, publisher, year, edition, pages
PUBLIC LIBRARY SCIENCE, 2018
National Category
Genetics
Identifiers
urn:nbn:se:kth:diva-239780 (URN); 10.1371/journal.pone.0206864 (DOI); 000449772600027 (); 30412619 (PubMedID); 2-s2.0-85056317407 (Scopus ID)
Note

QC 20190108

Available from: 2019-01-08. Created: 2019-01-08. Last updated: 2019-08-20. Bibliographically approved.
Lee, J.-Y., Choi, H., Colangelo, C. M., Davis, D., Hoopmann, M. R., Käll, L., . . . Palmblad, M. (2018). ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data. Journal of biomolecular techniques : JBT, 29(2), 39-45
2018 (English). In: Journal of biomolecular techniques : JBT, ISSN 1943-4731, Vol. 29, no 2, p. 39-45. Article in journal (Refereed) Published
Abstract [en]

This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 Escherichia coli samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates on the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received, although those who participated did well with no one method performing best on all samples. However, none of the submissions included a complete Markdown or Notebook, even though examples were provided. Future iPRG studies need to be more successful in promoting and encouraging participation. The VPS and submission validator easily scale to much larger numbers of participants in these types of studies. The unique "ground-truth" dataset for proteoform identification generated for this study is now available to the research community, as are the server-side scripts for validating and managing submissions.

Place, publisher, year, edition, pages
NLM (Medline), 2018
Keywords
best practice, community study, false discovery rate, inference
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-247210 (URN); 10.7171/jbt.18-2902-003 (DOI); 2-s2.0-85059915162 (Scopus ID)
Note

QC 20190415

Available from: 2019-04-15. Created: 2019-04-15. Last updated: 2019-04-15. Bibliographically approved.
Deutsch, E. W., Perez-Riverol, Y., Chalkley, R. J., Wilhelm, M., Tate, S., Sachsenberg, T., . . . Rost, H. (2018). Expanding the Use of Spectral Libraries in Proteomics. Journal of Proteome Research, 17(12), 4051-4060
2018 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no 12, p. 4051-4060. Article in journal (Refereed) Published
Abstract [en]

The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2018
Keywords
mass spectrometry, spectral libraries, standards, formats, Dagstuhl Seminar, meeting report, Proteomics Standards Initiative
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:kth:diva-240749 (URN); 10.1021/acs.jproteome.8b00485 (DOI); 000452930000004 (); 30270626 (PubMedID); 2-s2.0-85054995934 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20190108

Available from: 2019-01-08. Created: 2019-01-08. Last updated: 2019-01-08. Bibliographically approved.
Jahn, M., Vialas, V., Karlsen, J., Maddalo, G., Edfors, F., Forsström, B., . . . Hudson, E. P. (2018). Growth of Cyanobacteria Is Constrained by the Abundance of Light and Carbon Assimilation Proteins. Cell reports, 25(2), 478-+
2018 (English). In: Cell reports, ISSN 2211-1247, E-ISSN 2211-1247, Vol. 25, no 2, p. 478-+. Article in journal (Refereed) Published
Abstract [en]

Cyanobacteria must balance separate demands for energy generation, carbon assimilation, and biomass synthesis. We used shotgun proteomics to investigate proteome allocation strategies in the model cyanobacterium Synechocystis sp. PCC 6803 as it adapted to light and inorganic carbon (C-i) limitation. When partitioning the proteome into seven functional sectors, we find that sector sizes change linearly with growth rate. The sector encompassing ribosomes is significantly smaller than in E. coli, which may explain the lower maximum growth rate in Synechocystis. Limitation of light dramatically affects multiple proteome sectors, whereas the effect of C-i limitation is weak. Carbon assimilation proteins respond more strongly to changes in light intensity than to C-i. A coarse-grained cell economy model generally explains proteome trends. However, deviations from model predictions suggest that the large proteome sectors for carbon and light assimilation are not optimally utilized under some growth conditions and may constrain the proteome space available to ribosomes.

Place, publisher, year, edition, pages
et al., 2018
National Category
Physical Sciences
Identifiers
urn:nbn:se:kth:diva-237095 (URN); 10.1016/j.celrep.2018.09.040 (DOI); 000446691400020 (); 30304686 (PubMedID); 2-s2.0-85054193580 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish Research Council Formas, 2015-939; Swedish Research Council; Swedish Foundation for Strategic Research, RBP14-0013
Note

QC 20181029

Available from: 2018-10-29. Created: 2018-10-29. Last updated: 2019-10-07. Bibliographically approved.
Griss, J., Perez-Riverol, Y., The, M., Käll, L. & Vizcaino, J. A. (2018). Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra". Journal of Proteome Research, 17(5), 1993-1996
2018 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no 5, p. 1993-1996. Article in journal (Refereed) Published
Abstract [en]

In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the average proteomics data sets produced nowadays. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.

Place, publisher, year, edition, pages
AMER CHEMICAL SOC, 2018
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:kth:diva-228271 (URN); 10.1021/acs.jproteome.7b00824 (DOI); 000431726700024 (); 29682973 (PubMedID); 2-s2.0-85046629294 (Scopus ID)
Note

QC 20180522

Available from: 2018-05-22. Created: 2018-05-22. Last updated: 2018-05-22. Bibliographically approved.
Zhang, B., Pirmoradian, M., Zubarev, R. & Käll, L. (2017). Covariation of Peptide Abundances Accurately Reflects Protein Concentration Differences. Molecular & Cellular Proteomics, 16(5), 936-948
2017 (English). In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 16, no 5, p. 936-948. Article in journal (Refereed) Published
Abstract [en]

Most implementations of mass spectrometry-based proteomics involve enzymatic digestion of proteins, expanding the analysis to multiple proteolytic peptides for each protein. Currently, there is no consensus on how to summarize peptides' abundances to protein concentrations, and such efforts are complicated by the fact that error control normally is applied to the identification process, and does not directly control errors linking peptide abundance measures to protein concentration. Peptides resulting from suboptimal digestion or being partially modified are not representative of the protein concentration. Without a mechanism to remove such unrepresentative peptides, their abundance adversely impacts the estimation of their protein's concentration. Here, we present a relative quantification approach, Diffacto, that applies factor analysis to extract the covariation of peptides' abundances. The method enables a weighted geometrical average summarization and automatic elimination of incoherent peptides. We demonstrate, based on a set of controlled label-free experiments using standard mixtures of proteins, that the covariation structure extracted by the factor analysis accurately reflects protein concentrations. In the 1% peptide-spectrum match-level FDR data set, as many as 11% of the peptides have abundance differences incoherent with the other peptides attributed to the same protein. If not controlled, such contradicting peptide abundances have a severe impact on protein quantifications. When adding the quantities of each protein's three most abundant peptides, we note as many as 14% of the proteins being estimated as having a negative correlation with their actual concentration differences between samples. Diffacto reduced the amount of such obviously incorrectly quantified proteins to 1.6%. Furthermore, by analyzing clinical data sets from two breast cancer studies, our method revealed the persistent proteomic signatures linked to three subtypes of breast cancer. We conclude that Diffacto can facilitate the interpretation and enhance the utility of most types of proteomics data.
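The idea of a weighted geometric average with elimination of low-weight peptides can be sketched as follows. This is only an illustration under stated assumptions and not Diffacto's implementation: the weights here are hand-picked, whereas the paper derives them by factor analysis, and the function name and the `min_weight` cutoff are hypothetical.

```python
import math


def weighted_geometric_mean(abundances, weights, min_weight=0.5):
    """Summarize peptide abundances as a weighted geometric mean.

    Peptides whose weight falls below min_weight (a hypothetical cutoff)
    are dropped before averaging, mimicking removal of incoherent peptides.
    """
    kept = [(a, w) for a, w in zip(abundances, weights) if w >= min_weight]
    if not kept:
        raise ValueError("no peptide passes the weight cutoff")
    total_weight = sum(w for _, w in kept)
    # Average in log space, then transform back to the abundance scale.
    log_mean = sum(w * math.log(a) for a, w in kept) / total_weight
    return math.exp(log_mean)


# Three peptides of one protein; the third has a low (hypothetical) weight
# and is excluded, so the result is the geometric mean of 2.0 and 8.0.
protein_abundance = weighted_geometric_mean([2.0, 8.0, 100.0], [1.0, 1.0, 0.1])
print(round(protein_abundance, 6))  # 4.0
```

Averaging in log space keeps a single outlying peptide from dominating the summary the way it would in a plain sum of abundances.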

Place, publisher, year, edition, pages
American Society for Biochemistry and Molecular Biology, 2017
National Category
Biochemistry and Molecular Biology; Bioinformatics (Computational Biology); Biophysics
Identifiers
urn:nbn:se:kth:diva-207901 (URN); 10.1074/mcp.O117.067728 (DOI); 000400759600017 (); 28302922 (PubMedID); 2-s2.0-85018359335 (Scopus ID)
Note

QC 20170530

Available from: 2017-05-30. Created: 2017-05-30. Last updated: 2018-01-13. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-5689-9797
