  • 1.
    Afkham, Heydar Maboudi
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Qiu, Xuanbin
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    The, Matthew
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no. 4, p. 508-513. Journal article (Refereed)
    Abstract [en]

    Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.

  • 2. Bendz, Maria
    et al.
    Skwark, Marcin
    Nilsson, Daniel
    Granholm, Viktor
    Cristobal, Susana
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Elofsson, Arne
    Membrane protein shaving with thermolysin can be used to evaluate topology predictors. 2013. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 13, no. 9, p. 1467-1480. Journal article (Refereed)
    Abstract [en]

    Topology analysis of membrane proteins can be obtained by enzymatic shaving in combination with MS identification of peptides. Ideally, such analysis could provide quite detailed information about the membrane spanning regions. Here, we examine the ability of some shaving enzymes to provide large-scale analysis of membrane proteome topologies. To compare different shaving enzymes, we first analyzed the detected peptides from two over-expressed proteins. Second, we analyzed the peptides from non-over-expressed Escherichia coli membrane proteins with known structure to evaluate the shaving methods. Finally, the identified peptides were used to test the accuracy of a number of topology predictors. In conclusion, we suggest that thermolysin, an enzyme working at the natural pH of the cell, is superior for membrane shaving because: (i) we detect a similar number of peptides and proteins using thermolysin and trypsin; (ii) thermolysin shaving can be run at a natural pH; (iii) the incubation time is quite short; and (iv) fewer detected peptides from thermolysin shaving originate from the transmembrane regions. Using thermolysin shaving we can also provide a clear separation between the best and the less accurate topology predictors, indicating that using data from shaving can provide valuable information when developing new topology predictors.

  • 3. Boekel, Jorrit
    et al.
    Chilton, John M
    Cooke, Ira R
    Horvatovich, Peter L
    Jagtap, Pratik D
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Lehtiö, Janne
    Lukasse, Pieter
    Moerland, Perry D
    Griffin, Timothy J
    Multi-omic data analysis using Galaxy. 2015. In: Nature Biotechnology, ISSN 1087-0156, E-ISSN 1546-1696, Vol. 33, no. 2, p. 137-9. Journal article (Refereed)
  • 4. Branca, Rui M. M.
    et al.
    Orre, Lukas M.
    Johansson, Henrik J.
    Granholm, Viktor
    Huss, Mikael
    Perez-Bercoff, Åsa
    Forshed, Jenny
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Lehtiö, Janne
    HiRIEF LC-MSMS enables deep proteome coverage and unbiased proteogenomics. 2014. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 11, no. 1, p. 59-. Journal article (Refereed)
    Abstract [en]

    We present a liquid chromatography-mass spectrometry (LC-MSMS)-based method permitting unbiased (gene prediction-independent) genome-wide discovery of protein-coding loci in higher eukaryotes. Using high-resolution isoelectric focusing (HiRIEF) at the peptide level in the 3.7-5.0 pH range and accurate peptide isoelectric point (pI) prediction, we probed the six-reading-frame translation of the human and mouse genomes and identified 98 and 52 previously undiscovered protein-coding loci, respectively. The method also enabled deep proteome coverage, identifying 13,078 human and 10,637 mouse proteins.

  • 5.
    Chalabi, Morteza H.
    et al.
    Univ Southern Denmark, Dept Biochem & Mol Biol, Campusvej 55, DK-5230 Odense M, Denmark; Univ Southern Denmark, VILLUM Ctr Bioanalyt Sci, Campusvej 55, DK-5230 Odense M, Denmark.
    Tsiamis, Vasileios
    Univ Southern Denmark, Dept Biochem & Mol Biol, Campusvej 55, DK-5230 Odense M, Denmark; Univ Southern Denmark, VILLUM Ctr Bioanalyt Sci, Campusvej 55, DK-5230 Odense M, Denmark.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Skolan för kemi, bioteknologi och hälsa (CBH). Royal Inst Technol, Sch Biotechnol, KTH Sci Life Lab, Solna, Sweden.
    Vandin, Fabio
    Univ Padua, Dept Informat Engn, Padua, Italy.
    Schwammle, Veit
    Univ Southern Denmark, Dept Biochem & Mol Biol, Campusvej 55, DK-5230 Odense M, Denmark; Univ Southern Denmark, VILLUM Ctr Bioanalyt Sci, Campusvej 55, DK-5230 Odense M, Denmark.
    CoExpresso: assess the quantitative behavior of protein complexes in human cells. 2019. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article id 17. Journal article (Refereed)
    Abstract [en]

    Background: Translational and post-translational control mechanisms in the cell result in widely observable differences between measured gene transcription and protein abundances. Herein, protein complexes are among the most tightly controlled entities by selective degradation of their individual proteins. They furthermore act as control hubs that regulate highly important processes in the cell and exhibit a high functional diversity due to their ability to change their composition and their structure. Better understanding and prediction of these functional states demands methods for the characterization of complex composition, behavior, and abundance across multiple cell states. Mass spectrometry provides an unbiased approach to directly determine protein abundances across different cell populations and thus to profile a comprehensive abundance map of proteins. Results: We provide a tool to investigate the behavior of protein subunits in known complexes by comparing their abundance profiles across up to 140 cell types available in ProteomicsDB. Thorough assessment of different randomization methods and statistical scoring algorithms allows determining the significance of concurrent profiles within a complex, therefore providing insights into the conservation of their composition across human cell types as well as the identification of intrinsic structures in complex behavior to determine which proteins orchestrate complex function. This analysis can be extended to investigate common profiles within arbitrary protein groups. CoExpresso can be accessed through http://computproteomics.bmb.sdu.dk/Apps/CoExpresso. Conclusions: With the CoExpresso web service, we offer a potent scoring scheme to assess proteins for their co-regulation and thereby offer insight into their potential for forming functional groups like protein complexes.

  • 6.
    Deutsch, Eric W.
    et al.
    Inst Syst Biol, Seattle, WA 98109 USA.
    Perez-Riverol, Yasset
    European Bioinformat Inst, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Chalkley, Robert J.
    Univ Calif San Francisco, San Francisco, CA 94158 USA.
    Wilhelm, Mathias
    Tech Univ Munich, Prote & Bioanalyt, D-85354 Freising Weihenstephan, Germany.
    Tate, Stephen
    SCIEX Ltd, Concord, ON L4K4 V8, Canada.
    Sachsenberg, Timo
    Univ Tubingen, Ctr Bioinformat, Dept Comp Sci, Sand 14, D-72076 Tubingen, Germany.
    Walzer, Mathias
    European Bioinformat Inst, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Delanghe, Bernard
    Thermo Fisher Sci Bremen, Hanna Kunath Str 11, D-28199 Bremen, Germany.
    Boecker, Sebastian
    Friedrich Schiller Univ Jena, Bioinformat, D-07743 Jena, Germany.
    Schymanski, Emma L.
    Univ Luxembourg, Luxembourg Ctr Syst Biomed, 6 Ave Swing, L-4367 Belvaux, Luxembourg.
    Wilmes, Paul
    Univ Luxembourg, Luxembourg Ctr Syst Biomed, 6 Ave Swing, L-4367 Belvaux, Luxembourg.
    Dorfer, Viktoria
    Univ Appl Sci Upper Austria, Bioinformat Res Grp, A-4232 Hagenberg, Austria.
    Kuster, Bernhard
    Tech Univ Munich, Prote & Bioanalyt, D-85354 Freising Weihenstephan, Germany; Tech Univ Munich, Bavarian Biomol Mass Spectrometry Ctr, D-85354 Freising Weihenstephan, Germany.
    Volders, Pieter-Jan
    Univ Gent VIB, Ctr Med Biotechnol, B-9000 Ghent, Belgium.
    Jehmlich, Nico
    UFZ Helmholtz Ctr Environm Res, D-04318 Leipzig, Germany.
    Vissers, Johannes P. C.
    Waters Corp, Wilmslow SK9 4AX, Cheshire, England.
    Wolan, Dennis W.
    Scripps Res Inst, Dept Mol Med, La Jolla, CA 92037 USA.
    Wang, Ana Y.
    Scripps Res Inst, Dept Mol Med, La Jolla, CA 92037 USA.
    Mendoza, Luis
    Inst Syst Biol, Seattle, WA 98109 USA.
    Shofstahl, Jim
    Thermo Fisher Sci, 355 River Oaks Pkwy, San Jose, CA 95134 USA.
    Dowsey, Andrew W.
    Univ Bristol, Dept Populat Hlth Sci, Fac Hlth Sci, Bristol BS9 1BN, Avon, England; Univ Bristol, Bristol Vet Sch, Fac Hlth Sci, Bristol BS9 1BN, Avon, England.
    Griss, Johannes
    Med Univ Vienna, Div Immunol Allergy & Infect Dis, Dept Dermatol, Wahringer Gurtel 18-20, A-1090 Vienna, Austria.
    Salek, Reza M.
    Int Agcy Res Canc, 150 Cours Albert Thomas, F-69372 Lyon 08, France.
    Neumann, Steffen
    Leibniz Inst Plant Biochem, Dept Stress & Dev Biol, D-06120 Halle, Germany; German Ctr Integrat Biodivers Res IDiv, D-04103 Leipzig, Germany.
    Binz, Pierre-Alain
    CHU Vaudois, Clin Chem Serv, CH-1011 Lausanne, Switzerland.
    Lam, Henry
    Hong Kong Univ Sci & Technol, Dept Chem & Biol Engn, Clear Water Bay, Hong Kong 999077, Peoples R China.
    Vizcaino, Juan Antonio
    European Bioinformat Inst, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Bandeira, Nuno
    Univ Calif San Diego, Skaggs Sch Pharm & Pharmaceut Sci, Dept Comp Sci & Engn, Ctr Computat Mass Spectrometry, San Diego, CA 92093 USA.
    Rost, Hannes
    Univ Toronto, Donnelly Ctr, 160 Coll St, Toronto, ON M5S 3E1, Canada.
    Expanding the Use of Spectral Libraries in Proteomics. 2018. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no. 12, p. 4051-4060. Journal article (Refereed)
    Abstract [en]

    The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.

  • 7.
    Edfors, Fredrik
    et al.
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Danielsson, Frida
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Hallström, Björn
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lundberg, Emma
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Ponten, Fredrik
    Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden.
    Forsström, Björn
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Uhlén, Mathias
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. Technical University of Denmark, Denmark.
    Gene specific correlation of RNA and protein levels in human cells and tissues. 2016. In: Molecular Systems Biology, ISSN 1744-4292, E-ISSN 1744-4292. Journal article (Refereed)
    Abstract [en]

    An important issue for molecular biology is to establish if transcript levels of a given gene can be used as proxies for the corresponding protein levels. Here, we have developed a targeted proteomics approach for a set of human non-secreted proteins based on Parallel Reaction Monitoring to measure, at steady-state conditions, absolute protein copy numbers across human tissues and cell lines and compared these levels with the corresponding mRNA levels using transcriptomics. The study shows that the transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor independent of the tissue-type is introduced, thus significantly enhancing the predictability of protein copy numbers from RNA levels. The results show that the RTP-ratio varies significantly, from a few hundred copies per mRNA molecule for some genes to several hundred thousand protein copies per mRNA molecule for others. In conclusion, our data suggest that transcriptome analysis can be used as a tool to predict the protein copy numbers per cell, thus forming an attractive link between the field of genomics and proteomics.

  • 8.
    Edfors, Fredrik
    et al.
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Danielsson, Frida
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Hallström, Björn M.
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lundberg, Emma
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Ponten, Fredrik
    Forsström, Björn
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Uhlén, Mathias
    KTH, Skolan för bioteknologi (BIO), Proteomik och nanobioteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. Technical University of Denmark, Denmark.
    Gene-specific correlation of RNA and protein levels in human cells and tissues. 2016. In: Molecular Systems Biology, ISSN 1744-4292, E-ISSN 1744-4292, Vol. 12, no. 10, article id 883. Journal article (Refereed)
    Abstract [en]

    An important issue for molecular biology is to establish whether transcript levels of a given gene can be used as proxies for the corresponding protein levels. Here, we have developed a targeted proteomics approach for a set of human non-secreted proteins based on parallel reaction monitoring to measure, at steady-state conditions, absolute protein copy numbers across human tissues and cell lines and compared these levels with the corresponding mRNA levels using transcriptomics. The study shows that the transcript and protein levels do not correlate well unless a gene-specific RNA-to-protein (RTP) conversion factor independent of the tissue type is introduced, thus significantly enhancing the predictability of protein copy numbers from RNA levels. The results show that the RTP ratio varies significantly, from a few hundred copies per mRNA molecule for some genes to several hundred thousand protein copies per mRNA molecule for others. In conclusion, our data suggest that transcriptome analysis can be used as a tool to predict the protein copy numbers per cell, thus forming an attractive link between the field of genomics and proteomics.

  • 9.
    Emanuelsson, Olof
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Arvestad, Lars
    KTH, Centra, Science for Life Laboratory, SciLifeLab. Stockholm University.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Engagera och aktivera studenter med inspiration från konferenser: examination genom poster-presentation. 2014. In: Proceedings 2014, 8:e Pedagogiska inspirationskonferensen 17 december 2014 / [ed] Roy Andersson, Lund, 2014. Conference paper (Refereed)
    Abstract [en, translated from sv]

    In a research-oriented 7.5-credit (hp) master's-level course in bioinformatics at KTH, just over half of the course consists of a project carried out in groups of three students. Each project has its own assignment, with no or only marginal overlap with other groups' assignments. The projects are almost exclusively based on current research questions in, or close to, the teaching team's own research groups. The project is reported partly through a poster presentation and partly through an individual web-based project diary. All course participants attend the poster session, which runs for three hours at the end of the examination period. As far as possible, we try to mimic the situation in which an authentic research result is presented at a real conference. Each participant (student) is thus expected to study every other group's poster, as is done at most scientific conferences. We carry out a simple peer assessment at the poster level, in which each student gives a short, confidential comment on each of the other posters. The course teachers naturally also assess the posters. One of the difficulties is assigning individual grades. Here we use the individual project diaries, which give guidance on each individual's contribution to the project. We have tried this in four course rounds, with at most seven projects. The form of examination is fun and motivating for both students and teachers.

  • 10. Granholm, V.
    et al.
    Kim, S.
    Fernandez Navarro, José
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Sjölund, E.
    Smith, R. D.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Fast and accurate database searches with MS-GF+percolator. 2014. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 13, no. 2, p. 890-897. Journal article (Refereed)
    Abstract [en]

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.

  • 11. Granholm, Viktor
    et al.
    Fernandez Navarro, José
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Noble, William Stafford
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics. 2013. In: Journal of Proteomics, ISSN 1874-3919, E-ISSN 1876-7737, Vol. 80, p. 123-131. Journal article (Refereed)
    Abstract [en]

    The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.

  • 12. Granholm, Viktor
    et al.
    Käll, Lukas
    Quality assessments of peptide-spectrum matches in shotgun proteomics. 2011. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 11, no. 6, p. 1086-1093. Journal article (Refereed)
    Abstract [en]

    The peptide identification process in shotgun proteomics is most frequently solved with search engines. Such search engines assign scores that reflect similarity between the measured fragmentation spectrum and the theoretical spectra of the peptides of a given database. However, the scores from most search engines do not have a direct statistical interpretation. To understand and make use of the significance of peptide identifications, one must thus be familiar with some statistical concepts. Here, we discuss different statistical scores used to show the confidence of an identification and a set of methods to estimate these scores. We also describe the variance of statistical scores and imperfections of scoring functions of peptide-spectrum matches.

  • 13. Granholm, Viktor
    et al.
    Noble, William Stafford
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    A cross-validation scheme for machine learning algorithms in shotgun proteomics. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 13, p. S3-. Journal article (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.

  • 14. Granholm, Viktor
    et al.
    Noble, William Stafford
    Käll, Lukas
    On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. 2011. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 10, no. 5, p. 2671-2678. Journal article (Refereed)
    Abstract [en]

    In shotgun proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any peptide identification pipeline, this function greatly affects the final results of a proteomics assay. Consequently, valid statistical methods for assessing the quality of a given score function are extremely important. Previously, several research groups have used samples of known protein composition to assess the quality of a given score function. We demonstrate that this approach is problematic, because the outcome can depend on factors other than the score function itself. We then propose an alternative use of the same type of data to validate a score function. The central idea of our approach is that database matches that are not explained by any protein in the purified sample comprise a robust representation of incorrect matches. We apply our alternative assessment scheme to several commonly used score functions, and we show that our approach generates a reproducible measure of the calibration of a given peptide identification method. Furthermore, we show how our quality test can be useful in the development of novel score functions.

  • 15.
    Griss, Johannes
    et al.
    Med Univ Vienna, Dept Dermatol, Div Immunol Allergy & Infect Dis, Wahringer Gurtel 18-20, A-1090 Vienna, Austria; EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Perez-Riverol, Yasset
    EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    The, Matthew
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Vizcaino, Juan Antonio
    EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra". 2018. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no. 5, p. 1993-1996. Journal article (Refereed)
    Abstract [en]

    In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the average proteomics data sets produced nowadays. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.

  • 16.
    Halloran, John T.
    et al.
    Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA.
    Zhang, Hantian
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    Kara, Kaan
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    Renggli, Cedric
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    The, Matthew
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Zhang, Ce
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    Rocke, David M.
    Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Noble, William Stafford
    Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA; Univ Washington, Paul Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA.
    Speeding Up Percolator. 2019. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 18, no. 9, pp. 3353-3359. Journal article (Peer reviewed)
    Abstract [en]

    The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.

  • 17. Henricson, Anna
    et al.
    Käll, Lukas
    Sonnhammer, Erik L. L.
    A novel transmembrane topology of presenilin based on reconciling experimental and computational evidence. 2005. In: The FEBS Journal, ISSN 1742-464X, E-ISSN 1742-4658, Vol. 272, no. 11, pp. 2727-2733. Journal article (Peer reviewed)
    Abstract [en]

    The transmembrane topology of presenilins is still the subject of debate despite many experimental topology studies using antibodies or gene fusions. The results from these studies are partly contradictory and consequently several topology models have been proposed. Studies of presenilin-interacting proteins have produced further contradiction, primarily regarding the location of the C-terminus. It is thus impossible to produce a topology model that agrees with all published data on presenilin. We have analyzed the presenilin topology through computational sequence analysis of the presenilin family and the homologous presenilin-like protein family. Members of these families are intramembrane-cleaving aspartyl proteases. Although the overall sequence homology between the two families is low, they share the conserved putative active site residues and the conserved 'PAL' motif. Therefore, the topology model for the presenilin-like proteins can give some clues about the presenilin topology. Here we propose a novel nine-transmembrane topology with the C-terminus in the extracytosolic space. This model has strong support from published data on gamma-secretase function and presenilin topology. Contrary to most presenilin topology models, we show that hydrophobic region X is probably a transmembrane segment. Consequently, the C-terminus would be located in the extracytosolic space. However, the last C-terminal amino acids are relatively hydrophobic and in conjunction with existing experimental data we cannot exclude the possibility that the extreme C-terminus could be buried within the gamma-secretase complex. This might explain the difficulties in obtaining consistent experimental evidence regarding the location of the C-terminal region of presenilin.

  • 18.
    Jahn, Michael
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. K.
    Vialas, Vital
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Karlsen, Jan
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Maddalo, Gianluca
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH). KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Edfors, Fredrik
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Forsström, Björn
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Uhlén, Mathias
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Hudson, Elton P.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Growth of Cyanobacteria Is Constrained by the Abundance of Light and Carbon Assimilation Proteins. 2018. In: Cell reports, ISSN 2211-1247, E-ISSN 2211-1247, Vol. 25, no. 2, pp. 478-+. Journal article (Peer reviewed)
    Abstract [en]

    Cyanobacteria must balance separate demands for energy generation, carbon assimilation, and biomass synthesis. We used shotgun proteomics to investigate proteome allocation strategies in the model cyanobacterium Synechocystis sp. PCC 6803 as it adapted to light and inorganic carbon (C-i) limitation. When partitioning the proteome into seven functional sectors, we find that sector sizes change linearly with growth rate. The sector encompassing ribosomes is significantly smaller than in E. coli, which may explain the lower maximum growth rate in Synechocystis. Limitation of light dramatically affects multiple proteome sectors, whereas the effect of C-i limitation is weak. Carbon assimilation proteins respond more strongly to changes in light intensity than to C-i. A coarse-grained cell economy model generally explains proteome trends. However, deviations from model predictions suggest that the large proteome sectors for carbon and light assimilation are not optimally utilized under some growth conditions and may constrain the proteome space available to ribosomes.

  • 19.
    Jeuken, Gustavo S.
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    A simple null model for inferences from network enrichment analysis. 2018. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 13, no. 11, article ID e0206864. Journal article (Peer reviewed)
    Abstract [en]

    A prevailing technique to infer function from lists of identifications, from molecular biological high-throughput experiments, is over-representation analysis, where the identifications are compared to predefined sets of related genes often referred to as pathways. As at least some pathways are known to be incomplete in their annotation, algorithmic efforts have been made to complement them with information from functional association networks. While the terminology varies in the literature, we will here refer to such methods as Network Enrichment Analysis (NEA). Traditionally, the significance of inferences from NEA has been assigned using a null model constructed from randomizations of the network. Here we instead argue for a null model that more directly relates to the set of genes being studied, and have designed one dynamic programming algorithm that calculates the score distribution of NEA scores that makes it possible to assign unbiased mid p values to inferences. We also implemented a random sampling method, carrying out the same task. We demonstrate that our method obtains a superior statistical calibration as compared to the popular NEA inference engine, BinoX, while also providing statistics that are easier to interpret.
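
    The over-representation analysis mentioned at the start of this abstract is usually based on a hypergeometric test. The following is a minimal sketch of that baseline only, assuming a gene universe of size N, a pathway of size K, a query list of size n, and k genes in the overlap. It is not the dynamic programming algorithm for NEA scores described by the authors; the function names are hypothetical. It returns both the ordinary upper-tail p value and the mid p value, which counts only half of the probability of the observed outcome.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of drawing exactly k pathway genes in a sample of n
    genes from a universe of N genes that contains K pathway genes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def ora_p_values(k, N, K, n):
    """Return (p_value, mid_p_value) for observing an overlap of at
    least k genes under the hypergeometric null (illustrative helper)."""
    upper = min(K, n)  # largest overlap that is possible at all
    p_equal = hypergeom_pmf(k, N, K, n)
    p_greater = sum(hypergeom_pmf(j, N, K, n) for j in range(k + 1, upper + 1))
    # Ordinary p value: P(X >= k). Mid p value: P(X > k) + 0.5 * P(X = k).
    return p_greater + p_equal, p_greater + 0.5 * p_equal


if __name__ == "__main__":
    p, mid_p = ora_p_values(k=2, N=10, K=4, n=3)
    print(f"p = {p:.4f}, mid p = {mid_p:.4f}")
```

    The mid p value is always at most the ordinary p value, which makes it less conservative for discrete score distributions such as this one.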

  • 20.
    Käll, Lukas
    Department of Biochemistry and Biophysics, Center for Biomembrane Research and Stockholm Bioinformatics Center, Stockholm University.
    Prediction of transmembrane topology and signal peptide given a protein's amino acid sequence. 2010. In: Vol. 673, pp. 53-62. Journal article (Peer reviewed)
    Abstract [en]

    Here, we describe transmembrane topology and signal peptide predictors and highlight their advantages and shortcomings. We also discuss the relation between these two types of prediction.

  • 21.
    Käll, Lukas
    et al.
    Department of Genome Sciences, University of Washington.
    Canterbury, Jesse D.
    Weston, Jason
    Noble, William Stafford
    MacCoss, Michael J.
    Semi-supervised learning for peptide identification from shotgun proteomics datasets. 2007. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 4, no. 11, pp. 923-925. Journal article (Peer reviewed)
    Abstract [en]

    Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.

  • 22.
    Käll, Lukas
    et al.
    Ctr. for Genomics and Bioinformatics, Karolinska Institutet.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    A combined transmembrane topology and signal peptide prediction method. 2004. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 338, no. 5, pp. 1027-1036. Journal article (Peer reviewed)
    Abstract [en]

    An inherent problem in transmembrane protein topology prediction and signal peptide prediction is the high similarity between the hydrophobic regions of a transmembrane helix and that of a signal peptide, leading to cross-reaction between the two types of predictions. To improve predictions further, it is therefore important to make a predictor that aims to discriminate between the two classes. In addition, topology information can be gained when successfully predicting a signal peptide leading a transmembrane protein since it dictates that the N terminus of the mature protein must be on the non-cytoplasmic side of the membrane. Here, we present Phobius, a combined transmembrane protein topology and signal peptide predictor. The predictor is based on a hidden Markov model (HMM) that models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states. Training was done on a newly assembled and curated dataset. Compared to TMHMM and SignalP, errors coming from cross-prediction between transmembrane segments and signal peptides were reduced substantially by Phobius. False classifications of signal peptides were reduced from 26.1% to 3.9% and false classifications of transmembrane helices were reduced from 19.0% to 7.7%. Phobius was applied to the proteomes of Homo sapiens and Escherichia coli. Here we also noted a drastic reduction of false classifications compared to TMHMM/SignalP, suggesting that Phobius is well suited for whole-genome annotation of signal peptides and transmembrane regions. The method is available at http://phobius.cgb.ki.se/ as well as at http://phobius.binf.ku.dk/.

  • 23.
    Käll, Lukas
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. 2007. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 35, no. Web Server issue, 1, pp. W429-W432. Journal article (Peer reviewed)
    Abstract [en]

    When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30-65% of all predicted signal peptides and 25-35% of all predicted transmembrane topologies overlap. This impairs predictions of 5-10% of the proteome, hence this is an important issue in protein annotation. To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions. We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius.

  • 24.
    Käll, Lukas
    et al.
    Center for Genomics and Bioinformatics, Karolinska Institutet.
    Krogh, Anders
    Sonnhammer, Erik L. L.
    An HMM posterior decoder for sequence feature prediction that includes homology information. 2005. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 21, no. Suppl.1, pp. i251-i257. Journal article (Peer reviewed)
    Abstract [en]

    Motivation: When predicting sequence features like transmembrane topology, signal peptides, coil-coil structures, protein secondary structure or genes, extra support can be gained from homologs. Results: We present here a general hidden Markov model (HMM) decoding algorithm that combines probabilities for sequence features of homologs by considering the average of the posterior label probability of each position in a global sequence alignment. The algorithm is an extension of the previously described 'optimal accuracy' decoder, allowing homology information to be used. It was benchmarked using an HMM for transmembrane topology and signal peptide prediction, Phobius. We found that the performance was substantially increased when incorporating information from homologs.
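
    The averaging step named in this abstract can be illustrated with a short sketch. It assumes that each aligned sequence comes with a per-column dictionary of posterior label probabilities, with None marking a gap, and that gapped sequences are simply left out of the column average. Both the data layout and the gap handling are assumptions made for illustration, and the function name is hypothetical. The sketch covers only the averaging; the subsequent 'optimal accuracy' decoding over the averaged probabilities is not shown.

```python
def average_posteriors(aligned_posteriors):
    """Average posterior label probabilities column by column.

    aligned_posteriors: list of sequences, each a list with one entry per
    alignment column. An entry is a dict {label: probability}, or None
    where the sequence has a gap (assumed layout, for illustration).
    """
    n_cols = len(aligned_posteriors[0])
    averaged = []
    for col in range(n_cols):
        # Collect the posteriors of all sequences that are not gapped here.
        present = [seq[col] for seq in aligned_posteriors if seq[col] is not None]
        if not present:
            averaged.append(None)  # column consists only of gaps
            continue
        labels = present[0].keys()
        averaged.append(
            {lab: sum(p[lab] for p in present) / len(present) for lab in labels}
        )
    return averaged


if __name__ == "__main__":
    query = [{"M": 0.8, "i": 0.2}, {"M": 0.4, "i": 0.6}]
    homolog = [{"M": 0.6, "i": 0.4}, None]
    print(average_posteriors([query, homolog]))
```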

  • 25.
    Käll, Lukas
    et al.
    Ctr. for Genomics and Bioinformatics, Karolinska Institutet.
    Sonnhammer, Erik L. L.
    Reliability of transmembrane predictions in whole-genome data. 2002. In: FEBS Letters, ISSN 0014-5793, E-ISSN 1873-3468, Vol. 532, no. 3, pp. 415-418. Journal article (Peer reviewed)
    Abstract [en]

    Transmembrane prediction methods are generally benchmarked on a set of proteins with experimentally verified topology. We have investigated if the accuracy measured on such datasets can be expected in an unbiased genomic analysis, or if there is a bias towards 'easily predictable' proteins in the benchmark datasets. As a measurement of accuracy, the concordance of the results from five different prediction methods was used (TMHMM, PHD, HMMTOP, MEMSAT, and TOPPRED). The benchmark dataset showed significantly higher levels (up to five times) of agreement between different methods than in 10 tested genomes. We have also analyzed which programs are most prone to make mispredictions by measuring the frequency of one-out-of-five disagreeing predictions.

  • 26. Käll, Lukas
    et al.
    Storey, John D.
    MacCoss, Michael J.
    Noble, William Stafford
    Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. 2008. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no. 1, pp. 29-34. Journal article (Peer reviewed)
    Abstract [en]

    Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
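
    A simple decoy-based FDR calculation of the kind this abstract refers to can be sketched as follows. The sketch assumes that higher scores are better, that targets and decoys were searched separately, and that ties in score do not occur; the function name is hypothetical. It estimates the FDR at each score threshold as the ratio of decoy to target matches above the threshold and converts the estimates into q-values. It does not include the refinement mentioned in the abstract that accounts for the percentage of incorrectly identified spectra.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Return {index: q-value} for the target PSMs.

    scores: list of PSM scores, higher is better (assumption).
    is_decoy: list of booleans, True where the PSM matched a decoy.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    n_targets = n_decoys = 0
    for i in order:
        if is_decoy[i]:
            n_decoys += 1
        else:
            n_targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(min(1.0, n_decoys / max(n_targets, 1)))
    # q-value: the lowest FDR at which this PSM is still accepted, i.e.
    # the minimum FDR over this and all more permissive thresholds.
    qvalues = {}
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        if not is_decoy[order[rank]]:
            qvalues[order[rank]] = running_min
    return qvalues


if __name__ == "__main__":
    scores = [10, 9, 8, 7, 6, 5]
    is_decoy = [False, False, True, False, True, False]
    print(target_decoy_qvalues(scores, is_decoy))
```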

  • 27. Käll, Lukas
    et al.
    Storey, John D.
    MacCoss, Michael J.
    Noble, William Stafford
    Posterior error probabilities and false discovery rates: two sides of the same coin. 2008. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no. 1, pp. 40-44. Journal article (Peer reviewed)
    Abstract [en]

    A variety of methods have been described in the literature for assigning statistical significance to peptides identified via tandem mass spectrometry. Here, we explain how two types of scores, the q-value and the posterior error probability, are related and complementary to one another.
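
    One common way to express the relation between the two scores is that the FDR of an accepted set equals the average posterior error probability of its members. The sketch below assumes that well-calibrated PEPs are already available and derives q-values from them on that basis; the function name is hypothetical and the abstract itself does not give this procedure.

```python
def qvalues_from_peps(peps):
    """Return a list of (pep, q_value) pairs sorted by increasing PEP.

    The q-value of the k-th best PSM is taken as the mean PEP of the k
    best PSMs. Because the PEPs are sorted in ascending order, this
    running mean never decreases, so no extra monotonization is needed.
    """
    result = []
    total = 0.0
    for k, pep in enumerate(sorted(peps), start=1):
        total += pep
        result.append((pep, total / k))
    return result


if __name__ == "__main__":
    for pep, q in qvalues_from_peps([0.30, 0.01, 0.05]):
        print(f"PEP = {pep:.2f}  q = {q:.3f}")
```

    The example illustrates why the two measures are complementary: the PSM with a PEP of 0.30 is itself fairly likely to be wrong, while the set that contains it still has an estimated error rate of only 0.12.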

  • 28. Käll, Lukas
    et al.
    Storey, John D.
    Noble, William Stafford
    Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no. 16, pp. i42-i48. Journal article (Peer reviewed)
    Abstract [en]

    Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution. Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.

  • 29. Käll, Lukas
    et al.
    Storey, John D.
    Noble, William Stafford
    QVALITY: non-parametric estimation of q-values and posterior error probabilities. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no. 7, pp. 964-966. Journal article (Peer reviewed)
    Abstract [en]

    Qvality is a C++ program for estimating two types of standard statistical confidence measures: the q-value, which is an analog of the p-value that incorporates multiple testing correction, and the posterior error probability (PEP, also known as the local false discovery rate), which corresponds to the probability that a given observation is drawn from the null distribution. In computing q-values, qvality employs a standard bootstrap procedure to estimate the prior probability of a score being from the null distribution; for PEP estimation, qvality relies upon non-parametric logistic regression. Relative to other tools for estimating statistical confidence measures, qvality is unique in its ability to estimate both types of scores directly from a null distribution, without requiring the user to calculate p-values.

  • 30.
    Käll, Lukas
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Vitek, Olga
    Computational Mass Spectrometry-Based Proteomics. 2011. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 7, no. 12, p. e1002277-. Journal article (Peer reviewed)
  • 31. Lee, J. -Y
    et al.
    Choi, H.
    Colangelo, C. M.
    Davis, D.
    Hoopmann, M. R.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lam, H.
    Payne, S. H.
    Perez-Riverol, Y.
    The, Matthew
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Wilson, R.
    Weintraub, S. T.
    Palmblad, M.
    ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data. 2018. In: Journal of biomolecular techniques : JBT, ISSN 1943-4731, Vol. 29, no. 2, pp. 39-45. Journal article (Peer reviewed)
    Abstract [en]

    This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 Escherichia coli samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates on the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received, although those who participated did well with no one method performing best on all samples. However, none of the submissions included a complete Markdown or Notebook, even though examples were provided. Future iPRG studies need to be more successful in promoting and encouraging participation. The VPS and submission validator easily scale to much larger numbers of participants in these types of studies. The unique "ground-truth" dataset for proteoform identification generated for this study is now available to the research community, as are the server-side scripts for validating and managing submissions.

  • 32. Lehtio, J.
    et al.
    Branca, M.
    Johansson, H.
    Orre, M.
    Granholm, Viktor
    KTH.
    Forshed, J.
    Perez-Bercoff, M.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Genome Wide Proteomics Using Peptide High Resolution Isoelectric Focusing Hirief-Ms Allows Detection New Human Gene Models. 2012. In: Annals of Oncology, ISSN 0923-7534, E-ISSN 1569-8041, Vol. 23, pp. 33-34. Journal article (Other academic)
  • 33. Lundin, Carolina
    et al.
    Käll, Lukas
    Stockholm Bioinformatics Center, AlbaNova.
    Kreher, Scott A.
    Kapp, Katja
    Sonnhammer, Erik L.
    Carlson, John R.
    Heijne, Gunnar von
    Nilsson, IngMarie
    Membrane topology of the Drosophila OR83b odorant receptor. 2007. In: FEBS Letters, ISSN 0014-5793, E-ISSN 1873-3468, Vol. 581, no. 29, pp. 5601-5604. Journal article (Peer reviewed)
    Abstract [en]

    By analogy to mammals, odorant receptors (ORs) in insects, such as Drosophila melanogaster, have long been thought to belong to the G-protein coupled receptor (GPCR) superfamily. However, recent work has cast doubt on this assumption and has tentatively suggested an inverted topology compared to the canonical N(out) - C(in) 7 transmembrane (TM) GPCR topology, at least for some Drosophila ORs. Here, we report a detailed topology mapping of the Drosophila OR83b receptor using engineered glycosylation sites as topology markers. Our results are inconsistent with a classical GPCR topology and show that OR83b has an intracellular N-terminus, an extracellular C-terminus, and 7TM helices.

  • 34. McIlwain, Sean
    et al.
    Tamura, Kaipo
    Kertesz-Farkas, Attila
    Grant, Charles E.
    Diament, Benjamin
    Frewen, Barbara
    Howbert, J. Jeffry
    Hoopmann, Michael R.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Eng, Jimmy K.
    MacCoss, Michael J.
    Noble, William Stafford
    Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis. 2014. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 13, no. 10, pp. 4488-4491. Journal article (Peer reviewed)
    Abstract [en]

    Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data.

  • 35. Merrihew, Gennifer E.
    et al.
    Davis, Colleen
    Ewing, Brent
    Williams, Gary
    Käll, Lukas
    University of Washington, Department of Genome Sciences.
    Frewen, Barbara E.
    Noble, William Stafford
    Green, Phil
    Thomas, James H.
    MacCoss, Michael J.
    Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations. 2008. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 18, no. 10, pp. 1660-1669. Journal article (Peer reviewed)
    Abstract [en]

    We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.

  • 36. Moruz, Luminita
    et al.
    Hoopmann, Michael R.
    Rosenlund, Magnus
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Granholm, Viktor
    Moritz, Robert L.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Mass Fingerprinting of Complex Mixtures: Protein Inference from High-Resolution Peptide Masses and Predicted Retention Times. 2013. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 12, no. 12, pp. 5730-5741. Journal article (Peer reviewed)
    Abstract [en]

    In typical shotgun experiments, the mass spectrometer records the masses of a large set of ionized analytes but fragments only a fraction of them. In the subsequent analyses, normally only the fragmented ions are used to compile a set of peptide identifications, while the unfragmented ones are disregarded. In this work, we show how the unfragmented ions, here denoted MS1-features, can be used to increase the confidence of the proteins identified in shotgun experiments. Specifically, we propose the usage of in silico mass tags, where the observed MS1-features are matched against de novo predicted masses and retention times for all peptides derived from a sequence database. We present a statistical model to assign protein-level probabilities based on the MS1-features and combine this data with the fragmentation spectra. Our approach was evaluated for two triplicate data sets from yeast and human, respectively, leading to up to 7% more protein identifications at a fixed protein-level false discovery rate of 1%. The additional protein identifications were validated both in the context of the mass spectrometry data and by examining their estimated transcript levels generated using RNA-Seq. The proposed method is reproducible, straightforward to apply, and can even be used to reanalyze and increase the yield of existing data sets.

  • 37. Moruz, Luminita
    et al.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    GradientOptimizer: An open-source graphical environment for calculating optimized gradients in reversed-phase liquid chromatography. 2014. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 14, no. 12, pp. 1464-1466. Journal article (Peer reviewed)
    Abstract [en]

    We here present GradientOptimizer, an intuitive, lightweight graphical user interface to design nonlinear gradients for separation of peptides by reversed-phase liquid chromatography. The software can calculate three types of nonlinear gradients, each of them optimizing a certain retention time distribution of interest. GradientOptimizer is straightforward to use, requires minimum processing of the input files, and is supported under Windows, Linux, and OS X platforms. The software is open-source and can be downloaded under an Apache 2.0 license at https://github.com/statisticalbiotechnology/NonlinearGradientsUI.

  • 38.
    Moruz, Luminita
    et al.
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Peptide retention time prediction. 2016. In: Mass spectrometry reviews (Print), ISSN 0277-7037, E-ISSN 1098-2787. Journal article (Peer reviewed)
    Abstract [en]

    Most methods for interpreting data from shotgun proteomics experiments are to a large degree dependent on being able to predict properties of peptide-ions. Often such predicted properties are limited to molecular mass and fragment spectra, but here we put focus on a perhaps underutilized property, a peptide's chromatographic retention time. We review a couple of different principles of retention time prediction, and their applications within computational proteomics.

  • 39. Moruz, Luminita
    et al.
    Pichler, Peter
    Stranzl, Thomas
    Mechtler, Karl
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Optimized Nonlinear Gradients for Reversed-Phase Liquid Chromatography in Shotgun Proteomics. 2013. In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 85, no. 16, pp. 7777-7785. Journal article (Peer reviewed)
    Abstract [en]

    Reversed-phase liquid chromatography has become the preferred method for separating peptides in most of the mass spectrometry-based proteomics workflows of today. In the way the technique is typically applied, the peptides are released from the chromatography column by the gradual addition of an organic buffer according to a linear function. However, when applied to complex peptide mixtures, this approach leads to unequal spreads of the peptides over the chromatography time. To address this, we investigated the use of nonlinear gradients, customized for each setup at hand. We developed an algorithm to generate optimized gradient functions for shotgun proteomics experiments and evaluated it for two data sets consisting each of four replicate runs of a human complex sample. Our results show that the optimized gradients produce a more even spread of the peptides over the chromatography run, while leading to increased numbers of confident peptide identifications. In addition, the list of peptides identified using nonlinear gradients differed considerably from those found with the linear ones, suggesting that such gradients can be a valuable tool for increasing the proteome coverage of mass spectrometry-based experiments.

  • 40. Moruz, Luminita
    et al.
    Staes, An
    Foster, Joseph M.
    Hatzou, Maria
    Timmerman, Evy
    Martens, Lennart
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Chromatographic retention time prediction for posttranslationally modified peptides. 2012. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 12, no. 8, pp. 1151-1159. Journal article (Peer-reviewed)
    Abstract [en]

    Retention time prediction of peptides in liquid chromatography has proven to be a valuable tool for mass spectrometry-based proteomics, especially in designing more efficient procedures for state-of-the-art targeted workflows. Additionally, accurate retention time predictions can also be used to increase confidence in identifications in shotgun experiments. Despite these obvious benefits, the use of such methods has so far not been extended to (posttranslationally) modified peptides due to the absence of efficient predictors for such peptides. We here therefore describe a new retention time predictor for modified peptides, built on the foundations of our existing Elude algorithm. We evaluated our software by applying it on five types of commonly encountered modifications. Our results show that Elude now yields equally good prediction performances for modified and unmodified peptides, with correlation coefficients between predicted and observed retention times ranging from 0.93 to 0.98 for all the investigated datasets. Furthermore, we show that our predictor handles peptides carrying multiple modifications as well. This latest version of Elude is fully portable to new chromatographic conditions and can readily be applied to other types of posttranslational modifications. Elude is available under the permissive Apache2 open source License at or can be run via a web-interface at.

  • 41. Moruz, Luminita
    et al.
    Tomazela, Daniela
    Käll, Lukas
    Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University; Department of Genome Sciences, University of Washington, Seattle.
    Training, selection, and robust calibration of retention time models for targeted proteomics. 2010. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 9, no. 10, pp. 5209-5216. Journal article (Peer-reviewed)
    Abstract [en]

    Accurate predictions of peptide retention times (RT) in liquid chromatography have many applications in mass spectrometry-based proteomics. Most notably such predictions are used to weed out incorrect peptide-spectrum matches, and to design targeted proteomics experiments. In this study, we describe a RT predictor, ELUDE, which can be employed in both applications. ELUDE's predictions are based on 60 features derived from the peptide's amino acid composition and optimally combined using kernel regression. When sufficient data is available, ELUDE derives a retention time index for the condition at hand making it fully portable to new chromatographic conditions. In cases when little training data is available, as often is the case in targeted proteomics experiments, ELUDE selects and calibrates a model from a library of pretrained predictors. Both model selection and calibration are carried out via robust statistical methods and thus ELUDE can handle situations where the calibration data contains erroneous data points. We benchmarked our method against two state-of-the-art predictors and showed that ELUDE outperforms these methods and tracked up to 34% more peptides in a theoretical SRM method creation experiment. ELUDE is freely available under Apache License from http://per-colator.com.

  • 42. Park, Christopher Y.
    et al.
    Klammer, Aaron A.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    MacCoss, Michael J.
    Noble, William S.
    Rapid and accurate peptide identification from tandem mass spectra. 2008. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no. 7, pp. 3022-3027. Journal article (Peer-reviewed)
    Abstract [en]

    Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed with source code freely to noncommercial users.

  • 43. Reynolds, Sheila M.
    et al.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Riffle, Michael E.
    Bilmes, Jeff A.
    Noble, William Stafford
    Transmembrane topology and signal peptide prediction using dynamic bayesian networks. 2008. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 4, no. 11, p. e1000213. Journal article (Peer-reviewed)
    Abstract [en]

    Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.

  • 44. Serang, O.
    et al.
    Cansizoglu, A. E.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Steen, H.
    Steen, J. A.
    Nonparametric bayesian evaluation of differential protein quantification. 2013. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 12, no. 10, pp. 4556-4565. Journal article (Peer-reviewed)
    Abstract [en]

    Arbitrary cutoffs are ubiquitous in quantitative computational proteomics: maximum acceptable MS/MS PSM or peptide q value, minimum ion intensity to calculate a fold change, the minimum number of peptides that must be available to trust the estimated protein fold change (or the minimum number of PSMs that must be available to trust the estimated peptide fold change), and the "significant" fold change cutoff. Here we introduce a novel experimental setup and nonparametric Bayesian algorithm for determining the statistical quality of a proposed differential set of proteins or peptides. By comparing putatively nonchanging case-control evidence to an empirical null distribution derived from a control-control experiment, we successfully avoid some of these common parameters. We then apply our method to evaluating different fold-change rules and find that for our data a 1.2-fold change is the most permissive of the plausible fold-change rules.

  • 45. Serang, Oliver
    et al.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Solution to Statistical Challenges in Proteomics Is More Statistics, Not Less. 2015. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 14, no. 10, pp. 4099-4103. Journal article (Other academic)
    Abstract [en]

    In any high-throughput scientific study, it is often essential to estimate the percent of findings that are actually incorrect. This percentage is called the false discovery rate (abbreviated "FDR"), and it is an invariant (albeit, often unknown) quantity for any well-formed study. In proteomics, it has become common practice to incorrectly conflate the protein FDR (the percent of identified proteins that are actually absent) with protein-level target-decoy, a particular method for estimating the protein-level FDR. In this manner, the challenges of one approach have been used as the basis for an argument that the field should abstain from protein-level FDR analysis altogether or even the suggestion that the very notion of a protein FDR is flawed. As we demonstrate in simple but accurate simulations, not only is the protein-level FDR an invariant concept, when analyzing large data sets, the failure to properly acknowledge it or to correct for multiple testing can result in large, unrecognized errors, whereby thousands of absent proteins (and, potentially every protein in the FASTA database being considered) can be incorrectly identified.

  • 46. Serang, Oliver
    et al.
    Moruz, Luminita
    Hoopmann, Michael R.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Recognizing Uncertainty Increases Robustness and Reproducibility of Mass Spectrometry-based Protein Inferences. 2012. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 11, no. 12, pp. 5586-5591. Journal article (Peer-reviewed)
    Abstract [en]

    Parsimony and protein grouping are widely employed to enforce economy in the number of identified proteins, with the goal of increasing the quality and reliability of protein identifications; however, in a counterintuitive manner, parsimony and protein grouping may actually decrease the reproducibility and interpretability of protein identifications. We present a simple illustration demonstrating ways in which parsimony and protein grouping may lower the reproducibility or interpretability of results. We then provide an example of a data set where a probabilistic method increases the reproducibility and interpretability of identifications made on replicate analyses of Human Du145 prostate cancer cell lines.

  • 47. Spivak, Marina
    et al.
    Weston, Jason
    Bottou, Léon
    Käll, Lukas
    Noble, William Stafford
    Improvements to the percolator algorithm for Peptide identification from shotgun proteomics data sets. 2009. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 8, no. 7, pp. 3737-3745. Journal article (Peer-reviewed)
    Abstract [en]

    Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.

  • 48.
    The, Matthew
    et al.
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Edfors, Fredrik
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi.
    Perez-Riverol, Yasset
    EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England..
    Payne, Samuel H.
    Pacific Northwest Natl Lab, Biol Sci Div, Richland, WA 99352 USA..
    Hoopmann, Michael R.
    Inst Syst Biol, Seattle, WA 98109 USA..
    Palmblad, Magnus
    Leiden Univ, Med Ctr, Ctr Prote & Metabol, NL-2300 RC Leiden, Netherlands..
    Forsström, Björn
    KTH, Skolan för bioteknologi (BIO), Centra, Albanova VinnExcellence Center for Protein Technology, ProNova. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms. 2018. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no. 5, pp. 1879-1886. Journal article (Peer-reviewed)
    Abstract [en]

    A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.

  • 49.
    The, Matthew
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Käll, Lukas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Distillation of label-free quantification data by clustering and Bayesian modeling. Manuscript (preprint) (Other academic)
    Abstract [en]

    In shotgun proteomics, the amount of information that can be extracted from label-free quantification experiments is typically limited by the identification rate as well as the noise level of the quantitative signals. This generally causes a low sensitivity in differential expression analysis on protein level. Here, we present a new method, MaRaQuant, in which we reverse the typical identification-first workflow into a quantification-first approach. Specifically, we apply unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This ensures that no valuable information is discarded due to analytes missing identification thresholds and allows us to spend more effort on the identification process due to the data reduction achieved by clustering. Furthermore, we propagate error probabilities from feature level all the way to protein level and input these to our probabilistic protein quantification method, Triqler. Applying this methodology to an engineered dataset, we managed to identify multiple analytes of interest that would have gone unnoticed in traditional pipelines, specifically, through the use of open modification and de novo searches. MaRaQuant/Triqler obtains significantly more identifications on all levels compared to MaxQuant/Perseus, including differentially expressed proteins. Notably, we managed to identify differentially expressed proteins in a clinical dataset where previously none were discovered. Furthermore, our differentially expressed proteins allowed us to attribute multiple functional annotation terms to both clinical datasets that we investigated.

  • 50.
    The, Matthew
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Käll, Lukas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Integrated identification and quantification error probabilities for shotgun proteomics. Manuscript (preprint) (Other academic)
    Abstract [en]

    Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differentially expressed proteins use intermediate filters in an attempt to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered datasets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical dataset we discovered 35 proteins at 5% FDR, with the original study discovering none at this threshold. Compellingly, these proteins showed enrichment for functional annotation terms. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.
