  • 1.
    Afkham, Heydar Maboudi
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Qiu, Xuanbin
    KTH, School of Computer Science and Communication (CSC).
    The, Matthew
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 4, 508-513 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.

  • 2. Bendz, Maria
    et al.
    Skwark, Marcin
    Nilsson, Daniel
    Granholm, Viktor
    Cristobal, Susana
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Elofsson, Arne
    Membrane protein shaving with thermolysin can be used to evaluate topology predictors. 2013. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 13, no 9, 1467-1480 p. Article in journal (Refereed)
    Abstract [en]

    Topology analysis of membrane proteins can be obtained by enzymatic shaving in combination with MS identification of peptides. Ideally, such analysis could provide quite detailed information about the membrane spanning regions. Here, we examine the ability of some shaving enzymes to provide large-scale analysis of membrane proteome topologies. To compare different shaving enzymes, we first analyzed the detected peptides from two over-expressed proteins. Second, we analyzed the peptides from non-over-expressed Escherichia coli membrane proteins with known structure to evaluate the shaving methods. Finally, the identified peptides were used to test the accuracy of a number of topology predictors. In conclusion, we suggest that the usage of thermolysin, an enzyme working at the natural pH of the cell for membrane shaving, is superior because: (i) we detect a similar number of peptides and proteins using thermolysin and trypsin; (ii) thermolysin shaving can be run at a natural pH; (iii) the incubation time is quite short; and (iv) fewer detected peptides from thermolysin shaving originate from the transmembrane regions. Using thermolysin shaving we can also provide a clear separation between the best and the less accurate topology predictors, indicating that using data from shaving can provide valuable information when developing new topology predictors.

  • 3. Boekel, Jorrit
    et al.
    Chilton, John M
    Cooke, Ira R
    Horvatovich, Peter L
    Jagtap, Pratik D
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    Lehtiö, Janne
    Lukasse, Pieter
    Moerland, Perry D
    Griffin, Timothy J
    Multi-omic data analysis using Galaxy. 2015. In: Nature Biotechnology, ISSN 1087-0156, E-ISSN 1546-1696, Vol. 33, no 2, 137-9 p. Article in journal (Refereed)
  • 4. Branca, Rui M. M.
    et al.
    Orre, Lukas M.
    Johansson, Henrik J.
    Granholm, Viktor
    Huss, Mikael
    Perez-Bercoff, Åsa
    Forshed, Jenny
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Lehtiö, Janne
    HiRIEF LC-MSMS enables deep proteome coverage and unbiased proteogenomics. 2014. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 11, no 1, 59- p. Article in journal (Refereed)
    Abstract [en]

    We present a liquid chromatography-mass spectrometry (LC-MSMS)-based method permitting unbiased (gene prediction-independent) genome-wide discovery of protein-coding loci in higher eukaryotes. Using high-resolution isoelectric focusing (HiRIEF) at the peptide level in the 3.7-5.0 pH range and accurate peptide isoelectric point (pI) prediction, we probed the six-reading-frame translation of the human and mouse genomes and identified 98 and 52 previously undiscovered protein-coding loci, respectively. The method also enabled deep proteome coverage, identifying 13,078 human and 10,637 mouse proteins.

  • 5.
    Edfors, Fredrik
    et al.
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Danielsson, Frida
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Hallström, Björn
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lundberg, Emma
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Ponten, Fredrik
    Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden.
    Forsström, Björn
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Uhlén, Mathias
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab. Technical University of Denmark, Denmark.
    Gene specific correlation of RNA and protein levels in human cells and tissues. 2016. In: Molecular Systems Biology, ISSN 1744-4292, E-ISSN 1744-4292. Article in journal (Refereed)
    Abstract [en]

    An important issue for molecular biology is to establish whether transcript levels of a given gene can be used as proxies for the corresponding protein levels. Here, we have developed a targeted proteomics approach for a set of human non-secreted proteins based on Parallel Reaction Monitoring to measure, at steady-state conditions, absolute protein copy numbers across human tissues and cell lines and compared these levels with the corresponding mRNA levels using transcriptomics. The study shows that the transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor independent of the tissue-type is introduced, thus significantly enhancing the predictability of protein copy numbers from RNA levels. The results show that the RTP-ratio varies significantly, from a few hundred copies per mRNA molecule for some genes to several hundred thousand protein copies per mRNA molecule for others. In conclusion, our data suggest that transcriptome analysis can be used as a tool to predict the protein copy numbers per cell, thus forming an attractive link between the field of genomics and proteomics.

  • 6.
    Edfors, Fredrik
    et al.
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Danielsson, Frida
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Hallström, Björn M.
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Lundberg, Emma
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Ponten, Fredrik
    Forsström, Björn
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Uhlén, Mathias
    KTH, School of Biotechnology (BIO), Proteomics and Nanobiotechnology. KTH, Centres, Science for Life Laboratory, SciLifeLab. Technical University of Denmark, Denmark.
    Gene-specific correlation of RNA and protein levels in human cells and tissues. 2016. In: Molecular Systems Biology, ISSN 1744-4292, E-ISSN 1744-4292, Vol. 12, no 10, 883. Article in journal (Refereed)
    Abstract [en]

    An important issue for molecular biology is to establish whether transcript levels of a given gene can be used as proxies for the corresponding protein levels. Here, we have developed a targeted proteomics approach for a set of human non-secreted proteins based on parallel reaction monitoring to measure, at steady-state conditions, absolute protein copy numbers across human tissues and cell lines and compared these levels with the corresponding mRNA levels using transcriptomics. The study shows that the transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor independent of the tissue type is introduced, thus significantly enhancing the predictability of protein copy numbers from RNA levels. The results show that the RTP ratio varies significantly with a few hundred copies per mRNA molecule for some genes to several hundred thousands of protein copies per mRNA molecule for others. In conclusion, our data suggest that transcriptome analysis can be used as a tool to predict the protein copy numbers per cell, thus forming an attractive link between the field of genomics and proteomics.

  • 7.
    Emanuelsson, Olof
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, Centres, Science for Life Laboratory, SciLifeLab. Stockholm University.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Engagera och aktivera studenter med inspiration från konferenser: examination genom poster-presentation. 2014. In: Proceedings 2014, 8:e Pedagogiska inspirationskonferensen 17 december 2014 / [ed] Roy Andersson, Lund, 2014. Conference paper (Refereed)
    Abstract [sv; English translation]

    In a research-oriented 7.5-credit master's-level course in bioinformatics at KTH, just over half of the course consists of a project carried out in groups of three students. Each project has its own assignment, with no or only marginal overlap with other groups' assignments. The projects are almost exclusively based on current questions from the teaching team's own research groups or their close collaborators. The project is reported partly through a poster presentation and partly through an individual web-based project diary. All course participants attend the poster session, which spans three hours at the end of the examination period. As far as possible, we try to mimic the situation in which an authentic research result is presented at a real conference. Each participant (student) is thus expected to study every other group's poster, as happens at most scientific conferences. We carry out a simple peer assessment at the poster level, in which each student gives a short, confidential comment on each of the other posters. The course teachers naturally assess the posters as well. One of the difficulties is assigning individual grades. Here we use the individual project diaries, which give guidance on each individual's contribution to the project. We have tried this during four course rounds, with at most seven projects. The form of examination is fun and motivating for both students and teachers.

  • 8. Granholm, V.
    et al.
    Kim, S.
    Fernandez Navarro, José
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Sjölund, E.
    Smith, R. D.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Fast and accurate database searches with MS-GF+percolator. 2014. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 13, no 2, 890-897 p. Article in journal (Refereed)
    Abstract [en]

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.

  • 9. Granholm, Viktor
    et al.
    Fernandez Navarro, José
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Noble, William Stafford
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics. 2013. In: Journal of Proteomics, ISSN 1874-3919, E-ISSN 1876-7737, Vol. 80, 123-131 p. Article in journal (Refereed)
    Abstract [en]

    The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.

  • 10. Granholm, Viktor
    et al.
    Käll, Lukas
    Quality assessments of peptide-spectrum matches in shotgun proteomics. 2011. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 11, no 6, 1086-1093 p. Article in journal (Refereed)
    Abstract [en]

    The peptide identification process in shotgun proteomics is most frequently solved with search engines. Such search engines assign scores that reflect similarity between the measured fragmentation spectrum and the theoretical spectra of the peptides of a given database. However, the scores from most search engines do not have a direct statistical interpretation. To understand and make use of the significance of peptide identifications, one must thus be familiar with some statistical concepts. Here, we discuss different statistical scores used to show the confidence of an identification and a set of methods to estimate these scores. We also describe the variance of statistical scores and imperfections of scoring functions of peptide-spectrum matches.

  • 11. Granholm, Viktor
    et al.
    Noble, William Stafford
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    A cross-validation scheme for machine learning algorithms in shotgun proteomics. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, S3- p. Article in journal (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.

  • 12. Granholm, Viktor
    et al.
    Noble, William Stafford
    Käll, Lukas
    On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. 2011. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 10, no 5, 2671-2678 p. Article in journal (Refereed)
    Abstract [en]

    In shotgun proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any peptide identification pipeline, this function greatly affects the final results of a proteomics assay. Consequently, valid statistical methods for assessing the quality of a given score function are extremely important. Previously, several research groups have used samples of known protein composition to assess the quality of a given score function. We demonstrate that this approach is problematic, because the outcome can depend on factors other than the score function itself. We then propose an alternative use of the same type of data to validate a score function. The central idea of our approach is that database matches that are not explained by any protein in the purified sample comprise a robust representation of incorrect matches. We apply our alternative assessment scheme to several commonly used score functions, and we show that our approach generates a reproducible measure of the calibration of a given peptide identification method. Furthermore, we show how our quality test can be useful in the development of novel score functions.

  • 13. Henricson, Anna
    et al.
    Käll, Lukas
    Sonnhammer, Erik L. L.
    A novel transmembrane topology of presenilin based on reconciling experimental and computational evidence. 2005. In: The FEBS Journal, ISSN 1742-464X, E-ISSN 1742-4658, Vol. 272, no 11, 2727-2733 p. Article in journal (Refereed)
    Abstract [en]

    The transmembrane topology of presenilins is still the subject of debate despite many experimental topology studies using antibodies or gene fusions. The results from these studies are partly contradictory and consequently several topology models have been proposed. Studies of presenilin-interacting proteins have produced further contradiction, primarily regarding the location of the C-terminus. It is thus impossible to produce a topology model that agrees with all published data on presenilin. We have analyzed the presenilin topology through computational sequence analysis of the presenilin family and the homologous presenilin-like protein family. Members of these families are intramembrane-cleaving aspartyl proteases. Although the overall sequence homology between the two families is low, they share the conserved putative active site residues and the conserved 'PAL' motif. Therefore, the topology model for the presenilin-like proteins can give some clues about the presenilin topology. Here we propose a novel nine-transmembrane topology with the C-terminus in the extracytosolic space. This model has strong support from published data on gamma-secretase function and presenilin topology. Contrary to most presenilin topology models, we show that hydrophobic region X is probably a transmembrane segment. Consequently, the C-terminus would be located in the extracytosolic space. However, the last C-terminal amino acids are relatively hydrophobic and in conjunction with existing experimental data we cannot exclude the possibility that the extreme C-terminus could be buried within the gamma-secretase complex. This might explain the difficulties in obtaining consistent experimental evidence regarding the location of the C-terminal region of presenilin.

  • 14.
    Käll, Lukas
    Department of Biochemistry and Biophysics, Center for Biomembrane Research and Stockholm Bioinformatics Center, Stockholm University.
    Prediction of transmembrane topology and signal peptide given a protein's amino acid sequence. 2010. In: Vol. 673, 53-62 p. Article in journal (Refereed)
    Abstract [en]

    Here, we describe transmembrane topology and signal peptide predictors and highlight their advantages and shortcomings. We also discuss the relation between these two types of prediction.

  • 15.
    Käll, Lukas
    et al.
    Department of Genome Sciences, University of Washington.
    Canterbury, Jesse D.
    Weston, Jason
    Noble, William Stafford
    MacCoss, Michael J.
    Semi-supervised learning for peptide identification from shotgun proteomics datasets. 2007. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 4, no 11, 923-925 p. Article in journal (Refereed)
    Abstract [en]

    Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.

  • 16.
    Käll, Lukas
    et al.
    Ctr. for Genomics and Bioinformatics, Karolinska Institutet.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    A combined transmembrane topology and signal peptide prediction method. 2004. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 338, no 5, 1027-1036 p. Article in journal (Refereed)
    Abstract [en]

    An inherent problem in transmembrane protein topology prediction and signal peptide prediction is the high similarity between the hydrophobic regions of a transmembrane helix and that of a signal peptide, leading to cross-reaction between the two types of predictions. To improve predictions further, it is therefore important to make a predictor that aims to discriminate between the two classes. In addition, topology information can be gained when successfully predicting a signal peptide leading a transmembrane protein since it dictates that the N terminus of the mature protein must be on the non-cytoplasmic side of the membrane. Here, we present Phobius, a combined transmembrane protein topology and signal peptide predictor. The predictor is based on a hidden Markov model (HMM) that models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states. Training was done on a newly assembled and curated dataset. Compared to TMHMM and SignalP, errors coming from cross-prediction between transmembrane segments and signal peptides were reduced substantially by Phobius. False classifications of signal peptides were reduced from 26.1% to 3.9% and false classifications of transmembrane helices were reduced from 19.0% to 7.7%. Phobius was applied to the proteomes of Homo sapiens and Escherichia coli. Here we also noted a drastic reduction of false classifications compared to TMHMM/SignalP, suggesting that Phobius is well suited for whole-genome annotation of signal peptides and transmembrane regions. The method is available at http://phobius.cgb.ki.se/ as well as at http://phobius.binf.ku.dk/.

  • 17.
    Käll, Lukas
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. 2007. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 35, no Web Server issue, 1, W429-W432 p. Article in journal (Refereed)
    Abstract [en]

    When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30-65% of all predicted signal peptides and 25-35% of all predicted transmembrane topologies overlap. This impairs predictions of 5-10% of the proteome, hence this is an important issue in protein annotation. To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions. We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius.

  • 18.
    Käll, Lukas
    et al.
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    An HMM posterior decoder for sequence feature prediction that includes homology information. 2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no Suppl.1, i251-i257 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: When predicting sequence features like transmembrane topology, signal peptides, coil-coil structures, protein secondary structure or genes, extra support can be gained from homologs. Results: We present here a general hidden Markov model (HMM) decoding algorithm that combines probabilities for sequence features of homologs by considering the average of the posterior label probability of each position in a global sequence alignment. The algorithm is an extension of the previously described 'optimal accuracy' decoder, allowing homology information to be used. It was benchmarked using an HMM for transmembrane topology and signal peptide prediction, Phobius. We found that the performance was substantially increased when incorporating information from homologs.

  • 19.
    Käll, Lukas
    et al.
    Ctr. for Genomics and Bioinformatics, Karolinska Institutet.
    Sonnhammer, Erik L. L.
    Reliability of transmembrane predictions in whole-genome data. 2002. In: FEBS Letters, ISSN 0014-5793, E-ISSN 1873-3468, Vol. 532, no 3, 415-418 p. Article in journal (Refereed)
    Abstract [en]

    Transmembrane prediction methods are generally benchmarked on a set of proteins with experimentally verified topology. We have investigated if the accuracy measured on such datasets can be expected in an unbiased genomic analysis, or if there is a bias towards 'easily predictable' proteins in the benchmark datasets. As a measurement of accuracy, the concordance of the results from five different prediction methods was used (TMHMM, PHD, HMMTOP, MEMSAT, and TOPPRED). The benchmark dataset showed significantly higher levels (up to five times) of agreement between different methods than in 10 tested genomes. We have also analyzed which programs are most prone to make mispredictions by measuring the frequency of one-out-of-five disagreeing predictions.

  • 20. Käll, Lukas
    et al.
    Storey, John D.
    MacCoss, Michael J.
    Noble, William Stafford
    Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. 2008. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no 1, 29-34 p. Article in journal (Refereed)
    Abstract [en]

    Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.

  • 21. Käll, Lukas
    et al.
    Storey, John D.
    MacCoss, Michael J.
    Noble, William Stafford
    Posterior error probabilities and false discovery rates: two sides of the same coin. 2008. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no 1, 40-44 p. Article in journal (Refereed)
    Abstract [en]

    A variety of methods have been described in the literature for assigning statistical significance to peptides identified via tandem mass spectrometry. Here, we explain how two types of scores, the q-value and the posterior error probability, are related and complementary to one another.

  • 22. Käll, Lukas
    et al.
    Storey, John D.
    Noble, William Stafford
    Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 16, i42-i48 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution. Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.

  • 23. Käll, Lukas
    et al.
    Storey, John D.
    Noble, William Stafford
    QVALITY: non-parametric estimation of q-values and posterior error probabilities. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 7, 964-966 p. Article in journal (Refereed)
    Abstract [en]

    Qvality is a C++ program for estimating two types of standard statistical confidence measures: the q-value, which is an analog of the p-value that incorporates multiple testing correction, and the posterior error probability (PEP, also known as the local false discovery rate), which corresponds to the probability that a given observation is drawn from the null distribution. In computing q-values, qvality employs a standard bootstrap procedure to estimate the prior probability of a score being from the null distribution; for PEP estimation, qvality relies upon non-parametric logistic regression. Relative to other tools for estimating statistical confidence measures, qvality is unique in its ability to estimate both types of scores directly from a null distribution, without requiring the user to calculate p-values.
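
    The q-value idea described above can be illustrated with a minimal Python sketch. This is not qvality's procedure: it omits the bootstrap estimate of the null prior and the non-parametric logistic regression for PEPs, and simply estimates the FDR at each score threshold as the ratio of decoy to target matches. The function name and the toy scores are hypothetical.

```python
def target_decoy_q_values(scores, is_decoy):
    """Return one q-value per match, in input order (higher score = better).

    Simplified target-decoy estimate; assumes no tied scores and does not
    correct for the prior probability of a null match.
    """
    # Rank matches from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimated FDR when accepting everything down to each rank.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the smallest FDR at which the match is still accepted,
    # i.e. a running minimum taken from the worst rank upwards.
    q_values = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


# Toy data (hypothetical): six matches, two of them against decoys.
scores = [9.1, 8.7, 7.9, 6.5, 5.2, 4.8]
is_decoy = [False, False, True, False, False, True]
q = target_decoy_q_values(scores, is_decoy)
print(q)
```

    The running minimum makes the q-values monotone in the score, which is what separates a q-value from the raw FDR estimate at a single threshold.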

  • 24.
    Käll, Lukas
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Vitek, Olga
    Computational Mass Spectrometry-Based Proteomics. 2011. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 7, no 12, e1002277. Article in journal (Refereed)
  • 25. Lehtio, J.
    et al.
    Branca, M.
    Johansson, H.
    Orre, M.
    Granholm, Viktor
    KTH.
    Forshed, J.
    Perez-Bercoff, M.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Genome Wide Proteomics Using Peptide High Resolution Isoelectric Focusing HiRIEF-MS Allows Detection New Human Gene Models. 2012. In: Annals of Oncology, ISSN 0923-7534, E-ISSN 1569-8041, Vol. 23, 33-34 p. Article in journal (Other academic)
  • 26. Lundin, Carolina
    et al.
    Käll, Lukas
    Stockholm Bioinformatics Center, AlbaNova.
    Kreher, Scott A.
    Kapp, Katja
    Sonnhammer, Erik L.
    Carlson, John R.
    Heijne, Gunnar von
    Nilsson, IngMarie
    Membrane topology of the Drosophila OR83b odorant receptor. 2007. In: FEBS Letters, ISSN 0014-5793, E-ISSN 1873-3468, Vol. 581, no 29, 5601-5604 p. Article in journal (Refereed)
    Abstract [en]

    By analogy to mammals, odorant receptors (ORs) in insects, such as Drosophila melanogaster, have long been thought to belong to the G-protein coupled receptor (GPCR) superfamily. However, recent work has cast doubt on this assumption and has tentatively suggested an inverted topology compared to the canonical N(out) - C(in) 7 transmembrane (TM) GPCR topology, at least for some Drosophila ORs. Here, we report a detailed topology mapping of the Drosophila OR83b receptor using engineered glycosylation sites as topology markers. Our results are inconsistent with a classical GPCR topology and show that OR83b has an intracellular N-terminus, an extracellular C-terminus, and 7TM helices.

  • 27. McIlwain, Sean
    et al.
    Tamura, Kaipo
    Kertesz-Farkas, Attila
    Grant, Charles E.
    Diament, Benjamin
    Frewen, Barbara
    Howbert, J. Jeffry
    Hoopmann, Michael R.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Eng, Jimmy K.
    MacCoss, Michael J.
    Noble, William Stafford
    Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis. 2014. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 13, no 10, 4488-4491 p. Article in journal (Refereed)
    Abstract [en]

    Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data.

  • 28. Merrihew, Gennifer E.
    et al.
    Davis, Colleen
    Ewing, Brent
    Williams, Gary
    Käll, Lukas
    University of Washington, Department of Genome Sciences.
    Frewen, Barbara E.
    Noble, William Stafford
    Green, Phil
    Thomas, James H.
    MacCoss, Michael J.
    Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations. 2008. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 18, no 10, 1660-1669 p. Article in journal (Refereed)
    Abstract [en]

    We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.

  • 29. Moruz, Luminita
    et al.
    Hoopmann, Michael R.
    Rosenlund, Magnus
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Granholm, Viktor
    Moritz, Robert L.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Mass Fingerprinting of Complex Mixtures: Protein Inference from High-Resolution Peptide Masses and Predicted Retention Times. 2013. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 12, no 12, 5730-5741 p. Article in journal (Refereed)
    Abstract [en]

    In typical shotgun experiments, the mass spectrometer records the masses of a large set of ionized analytes but fragments only a fraction of them. In the subsequent analyses, normally only the fragmented ions are used to compile a set of peptide identifications, while the unfragmented ones are disregarded. In this work, we show how the unfragmented ions, here denoted MS1-features, can be used to increase the confidence of the proteins identified in shotgun experiments. Specifically, we propose the usage of in silico mass tags, where the observed MS1-features are matched against de novo predicted masses and retention times for all peptides derived from a sequence database. We present a statistical model to assign protein-level probabilities based on the MS1-features and combine this data with the fragmentation spectra. Our approach was evaluated for two triplicate data sets from yeast and human, respectively, leading to up to 7% more protein identifications at a fixed protein-level false discovery rate of 1%. The additional protein identifications were validated both in the context of the mass spectrometry data and by examining their estimated transcript levels generated using RNA-Seq. The proposed method is reproducible, straightforward to apply, and can even be used to reanalyze and increase the yield of existing data sets.

  • 30. Moruz, Luminita
    et al.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    GradientOptimizer: An open-source graphical environment for calculating optimized gradients in reversed-phase liquid chromatography. 2014. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 14, no 12, 1464-1466 p. Article in journal (Refereed)
    Abstract [en]

    We here present GradientOptimizer, an intuitive, lightweight graphical user interface to design nonlinear gradients for separation of peptides by reversed-phase liquid chromatography. The software allows users to calculate three types of nonlinear gradients, each of them optimizing a certain retention time distribution of interest. GradientOptimizer is straightforward to use, requires minimal processing of the input files, and is supported under Windows, Linux, and OS X platforms. The software is open-source and can be downloaded under an Apache 2.0 license at https://github.com/statisticalbiotechnology/NonlinearGradientsUI.

  • 31.
    Moruz, Luminita
    et al.
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Peptide retention time prediction. 2016. In: Mass spectrometry reviews (Print), ISSN 0277-7037, E-ISSN 1098-2787. Article in journal (Refereed)
    Abstract [en]

    Most methods for interpreting data from shotgun proteomics experiments are to a large degree dependent on being able to predict properties of peptide-ions. Often such predicted properties are limited to molecular mass and fragment spectra, but here we put focus on a perhaps underutilized property, a peptide's chromatographic retention time. We review a couple of different principles of retention time prediction, and their applications within computational proteomics.

  • 32. Moruz, Luminita
    et al.
    Pichler, Peter
    Stranzl, Thomas
    Mechtler, Karl
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Optimized Nonlinear Gradients for Reversed-Phase Liquid Chromatography in Shotgun Proteomics. 2013. In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 85, no 16, 7777-7785 p. Article in journal (Refereed)
    Abstract [en]

    Reversed-phase liquid chromatography has become the preferred method for separating peptides in most of the mass spectrometry-based proteomics workflows of today. In the way the technique is typically applied, the peptides are released from the chromatography column by the gradual addition of an organic buffer according to a linear function. However, when applied to complex peptide mixtures, this approach leads to unequal spreads of the peptides over the chromatography time. To address this, we investigated the use of nonlinear gradients, customized for each setup at hand. We developed an algorithm to generate optimized gradient functions for shotgun proteomics experiments and evaluated it for two data sets consisting each of four replicate runs of a human complex sample. Our results show that the optimized gradients produce a more even spread of the peptides over the chromatography run, while leading to increased numbers of confident peptide identifications. In addition, the list of peptides identified using nonlinear gradients differed considerably from those found with the linear ones, suggesting that such gradients can be a valuable tool for increasing the proteome coverage of mass spectrometry-based experiments.

  • 33. Moruz, Luminita
    et al.
    Staes, An
    Foster, Joseph M.
    Hatzou, Maria
    Timmerman, Evy
    Martens, Lennart
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Chromatographic retention time prediction for posttranslationally modified peptides. 2012. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 12, no 8, 1151-1159 p. Article in journal (Refereed)
    Abstract [en]

    Retention time prediction of peptides in liquid chromatography has proven to be a valuable tool for mass spectrometry-based proteomics, especially in designing more efficient procedures for state-of-the-art targeted workflows. Additionally, accurate retention time predictions can also be used to increase confidence in identifications in shotgun experiments. Despite these obvious benefits, the use of such methods has so far not been extended to (posttranslationally) modified peptides due to the absence of efficient predictors for such peptides. We here therefore describe a new retention time predictor for modified peptides, built on the foundations of our existing Elude algorithm. We evaluated our software by applying it to five types of commonly encountered modifications. Our results show that Elude now yields equally good prediction performances for modified and unmodified peptides, with correlation coefficients between predicted and observed retention times ranging from 0.93 to 0.98 for all the investigated datasets. Furthermore, we show that our predictor handles peptides carrying multiple modifications as well. This latest version of Elude is fully portable to new chromatographic conditions and can readily be applied to other types of posttranslational modifications. Elude is available under the permissive Apache2 open source license and can also be run via a web interface.

  • 34. Moruz, Luminita
    et al.
    Tomazela, Daniela
    Käll, Lukas
    Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University; Department of Genome Sciences, University of Washington, Seattle.
    Training, selection, and robust calibration of retention time models for targeted proteomics. 2010. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 9, no 10, 5209-5216 p. Article in journal (Refereed)
    Abstract [en]

    Accurate predictions of peptide retention times (RT) in liquid chromatography have many applications in mass spectrometry-based proteomics. Most notably, such predictions are used to weed out incorrect peptide-spectrum matches, and to design targeted proteomics experiments. In this study, we describe an RT predictor, ELUDE, which can be employed in both applications. ELUDE's predictions are based on 60 features derived from the peptide's amino acid composition and optimally combined using kernel regression. When sufficient data is available, ELUDE derives a retention time index for the condition at hand, making it fully portable to new chromatographic conditions. In cases when little training data is available, as is often the case in targeted proteomics experiments, ELUDE selects and calibrates a model from a library of pretrained predictors. Both model selection and calibration are carried out via robust statistical methods and thus ELUDE can handle situations where the calibration data contains erroneous data points. We benchmarked our method against two state-of-the-art predictors and showed that ELUDE outperforms these methods, tracking up to 34% more peptides in a theoretical SRM method creation experiment. ELUDE is freely available under Apache License from http://per-colator.com.

  • 35. Park, Christopher Y.
    et al.
    Klammer, Aaron A.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    MacCoss, Michael J.
    Noble, William S.
    Rapid and accurate peptide identification from tandem mass spectra. 2008. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no 7, 3022-3027 p. Article in journal (Refereed)
    Abstract [en]

    Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed with source code freely to noncommercial users.
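
    As a rough illustration of generating a shuffled decoy for each target peptide, the following Python sketch permutes a peptide's residues. It is not Crux's implementation (Crux is written in C). Keeping the first and last residue fixed is an assumption made here so that the decoy keeps the same termini as the target, and the function name and example peptide are hypothetical.

```python
import random


def shuffled_decoy(peptide, rng):
    """Return a decoy with the same amino acid composition as `peptide`.

    Assumption: the terminal residues stay in place and only the inner
    residues are permuted. The decoy may by chance equal the target.
    """
    if len(peptide) <= 3:
        # With at most one inner residue there is nothing to permute.
        return peptide
    middle = list(peptide[1:-1])
    rng.shuffle(middle)
    return peptide[0] + "".join(middle) + peptide[-1]


# A seeded generator keeps the example reproducible.
rng = random.Random(0)
target = "LSDKEVAR"  # hypothetical tryptic peptide
decoy = shuffled_decoy(target, rng)
print(target, decoy)
```

    Because the decoy has the same composition, and therefore the same mass, as its target, it competes for the same spectra, which is what makes it usable as a null model.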

  • 36. Reynolds, Sheila M.
    et al.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    Riffle, Michael E.
    Bilmes, Jeff A.
    Noble, William Stafford
    Transmembrane topology and signal peptide prediction using dynamic Bayesian networks. 2008. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 4, no 11, e1000213. Article in journal (Refereed)
    Abstract [en]

    Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.

  • 37. Serang, O.
    et al.
    Cansizoglu, A. E.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Steen, H.
    Steen, J. A.
    Nonparametric Bayesian evaluation of differential protein quantification. 2013. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 12, no 10, 4556-4565 p. Article in journal (Refereed)
    Abstract [en]

    Arbitrary cutoffs are ubiquitous in quantitative computational proteomics: maximum acceptable MS/MS PSM or peptide q value, minimum ion intensity to calculate a fold change, the minimum number of peptides that must be available to trust the estimated protein fold change (or the minimum number of PSMs that must be available to trust the estimated peptide fold change), and the "significant" fold change cutoff. Here we introduce a novel experimental setup and nonparametric Bayesian algorithm for determining the statistical quality of a proposed differential set of proteins or peptides. By comparing putatively nonchanging case-control evidence to an empirical null distribution derived from a control-control experiment, we successfully avoid some of these common parameters. We then apply our method to evaluating different fold-change rules and find that for our data a 1.2-fold change is the most permissive of the plausible fold-change rules.

  • 38. Serang, Oliver
    et al.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Solution to Statistical Challenges in Proteomics Is More Statistics, Not Less. 2015. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 14, no 10, 4099-4103 p. Article in journal (Other academic)
    Abstract [en]

    In any high-throughput scientific study, it is often essential to estimate the percent of findings that are actually incorrect. This percentage is called the false discovery rate (abbreviated "FDR"), and it is an invariant (albeit often unknown) quantity for any well-formed study. In proteomics, it has become common practice to incorrectly conflate the protein FDR (the percent of identified proteins that are actually absent) with protein-level target-decoy, a particular method for estimating the protein-level FDR. In this manner, the challenges of one approach have been used as the basis for an argument that the field should abstain from protein-level FDR analysis altogether or even the suggestion that the very notion of a protein FDR is flawed. As we demonstrate in simple but accurate simulations, not only is the protein-level FDR an invariant concept, but when analyzing large data sets, the failure to properly acknowledge it or to correct for multiple testing can result in large, unrecognized errors, whereby thousands of absent proteins (and, potentially, every protein in the FASTA database being considered) can be incorrectly identified.

  • 39. Serang, Oliver
    et al.
    Moruz, Luminita
    Hoopmann, Michael R.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Recognizing Uncertainty Increases Robustness and Reproducibility of Mass Spectrometry-based Protein Inferences. 2012. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 11, no 12, 5586-5591 p. Article in journal (Refereed)
    Abstract [en]

    Parsimony and protein grouping are widely employed to enforce economy in the number of identified proteins, with the goal of increasing the quality and reliability of protein identifications; however, in a counterintuitive manner, parsimony and protein grouping may actually decrease the reproducibility and interpretability of protein identifications. We present a simple illustration demonstrating ways in which parsimony and protein grouping may lower the reproducibility or interpretability of results. We then provide an example of a data set where a probabilistic method increases the reproducibility and interpretability of identifications made on replicate analyses of Human Du145 prostate cancer cell lines.

  • 40. Spivak, Marina
    et al.
    Weston, Jason
    Bottou, Léon
    Käll, Lukas
    Noble, William Stafford
    Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets. 2009. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 8, no 7, 3737-3745 p. Article in journal (Refereed)
    Abstract [en]

    Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.

  • 41.
    The, Matthew
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics. 2016. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 15, no 3, 713-720 p. Article in journal (Refereed)
    Abstract [en]

    Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).

  • 42.
    The, Matthew
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    MacCoss, M. J.
    Noble, W. S.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. 2016. In: Journal of the American Society for Mass Spectrometry, ISSN 1044-0305, E-ISSN 1879-1123, Vol. 27, no 11, 1719-1727 p. Article in journal (Refereed)
    Abstract [en]

    Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness for the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method - grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein - in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
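
    The protein inference rule named in this abstract (group proteins by their sets of theoretical peptides, then keep only the best-scoring peptide) can be sketched in a few lines of Python. This is an illustrative toy, not Percolator's code: the function name, the input dictionaries, and the convention that a higher score is better are assumptions, and the subsequent FDR estimation is left out.

```python
from collections import defaultdict


def infer_protein_groups(theoretical_peptides, peptide_scores):
    """Score protein groups by their best-scoring observed peptide.

    theoretical_peptides: protein id -> iterable of peptide sequences
        from an in silico digest (hypothetical input format).
    peptide_scores: observed peptide sequence -> score, higher is better.
    Returns (protein_group, score) tuples sorted from best to worst.
    """
    # Proteins with identical theoretical peptide sets are
    # indistinguishable, so they are merged into one group.
    groups = defaultdict(list)
    for protein, peptides in theoretical_peptides.items():
        groups[frozenset(peptides)].append(protein)

    results = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:
            # Only the single best peptide contributes to the group score.
            results.append((tuple(sorted(proteins)), max(observed)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy input (hypothetical): P1 and P2 share the same peptide set.
theoretical = {
    "P1": ["AAK", "CCR"],
    "P2": ["CCR", "AAK"],
    "P3": ["DDK"],
    "P4": ["EEK"],
}
scores = {"AAK": 2.5, "CCR": 3.1, "DDK": 1.0}
ranked = infer_protein_groups(theoretical, scores)
print(ranked)
```

    Groups with no observed peptide, such as P4 here, receive no score and are dropped from the ranking.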

  • 43.
    The, Matthew
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    MacCoss, Michael J.
    Noble, William S.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. Manuscript (preprint) (Other academic)
    Abstract [en]

    Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness for the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method - grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein - in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542).

    The source code and Ubuntu, Windows, MacOS and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.

  • 44.
    The, Matthew
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    Tasnim, Ayesha
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    How to talk about protein-level false discovery rates in shotgun proteomics. Manuscript (preprint) (Other academic)
    Abstract [en]

    A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate. Many researchers consider protein-level false discovery rates a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level false discovery rates, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the false discovery rate. Furthermore, we demonstrate how the same simulations can be used to verify false discovery rate estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level false discovery rates for both competing null hypotheses.

  • 45.
    The, Matthew
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    Tasnim, Ayesha
    KTH, School of Biotechnology (BIO), Gene Technology.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    How to talk about protein-level false discovery rates in shotgun proteomics. 2016. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 16, no 18, 2461-2469 p. Article in journal (Refereed)
    Abstract [en]

    A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.

  • 46. Ting, Ying S.
    et al.
    Egertson, Jarrett D.
    Payne, Samuel H.
    Kim, Sangtae
    MacLean, Brendan
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Aebersold, Ruedi
    Smith, Richard D.
    Noble, William Stafford
    MacCoss, Michael J.
    Peptide-Centric Proteome Analysis: An Alternative Strategy for the Analysis of Tandem Mass Spectrometry Data. 2015. In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 14, no 9, 2301-2307 p. Article, review/survey (Refereed)
    Abstract [en]

    In mass spectrometry-based bottom-up proteomics, data-independent acquisition is an emerging technique because of its comprehensive and unbiased sampling of precursor ions. However, current data-independent acquisition methods use wide precursor isolation windows, resulting in cofragmentation and complex mixture spectra. Thus, conventional database searching tools that identify peptides by interpreting individual tandem MS spectra are inherently limited in analyzing data-independent acquisition data. Here we discuss an alternative approach, peptide-centric analysis, which tests directly for the presence and absence of query peptides. We discuss how peptide-centric analysis resolves some limitations of traditional spectrum-centric analysis, and we outline the unique characteristics of peptide-centric analysis in general.

  • 47. Wen, Bo
    et al.
    Du, Chaoqin
    Li, Guilin
    Ghali, Fawaz
    Jones, Andrew R.
    Käll, Lukas
    KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Xu, Shaohang
    Zhou, Ruo
    Ren, Zhe
    Feng, Qiang
    Xu, Xun
    Wang, Jun
    IPeak: An open source tool to combine results from multiple MS/MS search engines. 2015. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 15, no 17, 2916-2920 p. Article in journal (Refereed)
    Abstract [en]

    Liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) is an important technique for detecting peptides in proteomics studies. Here, we present an open source software tool, termed IPeak, a peptide identification pipeline that is designed to combine the Percolator post-processing algorithm and multi-search strategy to enhance the sensitivity of peptide identifications without compromising accuracy. IPeak provides a graphical user interface (GUI) as well as a command-line interface, which is implemented in JAVA and can work on all three major operating system platforms: Windows, Linux/Unix and OS X. IPeak has been designed to work with the mzIdentML standard from the Proteomics Standards Initiative (PSI) as an input and output, and also been fully integrated into the associated mzidLibrary project, providing access to the overall pipeline, as well as modules for calling Percolator on individual search engine result files. The integration thus enables IPeak (and Percolator) to be used in conjunction with any software packages implementing the mzIdentML data standard. IPeak is freely available and can be downloaded under an Apache 2.0 license at https://code.google.com/p/mzidentml-lib/.

  • 48. Wistrand, Markus
    et al.
    Käll, Lukas
    Sonnhammer, Erik L. L.
    A general model of G protein-coupled receptor sequences and its application to detect remote homologs. 2006. In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 15, no 3, 509-521 p. Article in journal (Refereed)
    Abstract [en]

    G protein-coupled receptors (GPCRs) constitute a large superfamily involved in various types of signal transduction pathways triggered by hormones, odorants, peptides, proteins, and other types of ligands. The superfamily is so diverse that many members lack sequence similarity, although they all span the cell membrane seven times with an extracellular N and a cytosolic C terminus. We analyzed a divergent set of GPCRs and found distinct loop length patterns and differences in amino acid composition between cytosolic loops, extracellular loops, and membrane regions. We configured GPCRHMM, a hidden Markov model, to fit those features and trained it on a large dataset representing the entire superfamily. GPCRHMM was benchmarked to profile HMMs and generic transmembrane detectors on sets of known GPCRs and non-GPCRs. In a cross-validation procedure, profile HMMs produced an error rate nearly twice as high as GPCRHMM. In a sensitivity-selectivity test, GPCRHMM's sensitivity was about 15% higher than that of the best transmembrane predictors, at comparable false positive rates. We used GPCRHMM to search for novel members of the GPCR superfamily in five proteomes. All in all we detected 120 sequences that lacked annotation and are potentially novel GPCRs. Out of those 102 were found in Caenorhabditis elegans, four in human, and seven in mouse. Many predictions (65) belonged to Pfam domains of unknown function. GPCRHMM strongly rejected a family of arthropod-specific odorant receptors believed to be GPCRs. A detailed analysis showed that these sequences are indeed very different from other GPCRs. GPCRHMM is available at http://gpcrhmm.cgb.ki.se.

  • 49. Wright, J. C.
    et al.
    Collins, M. O.
    Yu, L.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology. KTH, Centres, Science for Life Laboratory, SciLifeLab.
    Brosch, M.
    Choudhary, J. S.
    Enhanced peptide identification by electron transfer dissociation using an improved mascot percolator. 2012. In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 11, no 8, 478-491 p. Article in journal (Refereed)
    Abstract [en]

    Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a postsearch algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

  • 50. Yosef, Nir
    et al.
    Käll, Lukas
    KTH, School of Biotechnology (BIO), Gene Technology.
    From sequence to structure to networks. 2008. In: Genome Biology, ISSN 1465-6906, E-ISSN 1474-760X, Vol. 9, no 11. Article in journal (Refereed)
    Abstract [en]

    A report on the 7th European Conference on Computational Biology (ECCB), Cagliari, Italy, 22-26 September 2008.
