  • 1.
    Afkham, Heydar Maboudi
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Qiu, Xuanbin
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    The, Matthew
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no. 4, pp. 508-513. Journal article (Refereed)
    Abstract [en]

    Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
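
    The two uses of predicted standard deviations mentioned in the abstract (selecting a lower-error subset and adaptive windowing) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Prediction` record, the `fraction` cutoff, the window multiplier `k` and all numbers are assumptions, and the means and standard deviations are taken as given rather than computed by Gaussian Process Regression.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    peptide: str
    mean_rt: float  # predicted retention time (hypothetical units: minutes)
    std_rt: float   # predicted standard deviation of that retention time


def select_confident(predictions, fraction):
    """Keep the given fraction of predictions with the smallest predicted std."""
    ranked = sorted(predictions, key=lambda p: p.std_rt)
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


def adaptive_window(prediction, k=2.0):
    """Return a (start, end) retention-time window of mean +/- k * std."""
    half_width = k * prediction.std_rt
    return (prediction.mean_rt - half_width, prediction.mean_rt + half_width)


# Made-up predictions, for illustration only.
predictions = [
    Prediction("PEPTIDEA", 30.0, 0.5),
    Prediction("PEPTIDEB", 45.0, 2.0),
    Prediction("PEPTIDEC", 12.0, 1.0),
    Prediction("PEPTIDED", 60.0, 4.0),
]

confident = select_confident(predictions, 0.5)
windows = {p.peptide: adaptive_window(p) for p in predictions}
```

    Peptides with a small predicted standard deviation get a narrow window, while uncertain predictions get a wider one.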

  • 2.
    Griss, Johannes
    et al.
    Med Univ Vienna, Dept Dermatol, Div Immunol Allergy & Infect Dis, Wahringer Gurtel 18-20, A-1090 Vienna, Austria; EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Perez-Riverol, Yasset
    EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    The, Matthew
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Vizcaino, Juan Antonio
    EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra". 2018. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no. 5, pp. 1993-1996. Journal article (Refereed)
    Abstract [en]

    In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.

  • 3.
    Halloran, John T.
    et al.
    Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA.
    Zhang, Hantian
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    Kara, Kaan
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    Renggli, Cedric
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    The, Matthew
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Zhang, Ce
    Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland.
    Rocke, David M.
    Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Noble, William Stafford
    Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA; Univ Washington, Paul Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA.
    Speeding Up Percolator. 2019. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 18, no. 9, pp. 3353-3359. Journal article (Refereed)
    Abstract [en]

    The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.

  • 4.
    Lee, J.-Y.
    et al.
    Choi, H.
    Colangelo, C. M.
    Davis, D.
    Hoopmann, M. R.
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Lam, H.
    Payne, S. H.
    Perez-Riverol, Y.
    The, Matthew
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Wilson, R.
    Weintraub, S. T.
    Palmblad, M.
    ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data. 2018. In: Journal of biomolecular techniques: JBT, ISSN 1943-4731, Vol. 29, no. 2, pp. 39-45. Journal article (Refereed)
    Abstract [en]

    This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 Escherichia coli samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates on the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received, although those who participated did well with no one method performing best on all samples. However, none of the submissions included a complete Markdown or Notebook, even though examples were provided. Future iPRG studies need to be more successful in promoting and encouraging participation. The VPS and submission validator easily scale to much larger numbers of participants in these types of studies. The unique "ground-truth" dataset for proteoform identification generated for this study is now available to the research community, as are the server-side scripts for validating and managing submissions.

  • 5.
    The, Matthew
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Statistical and machine learning methods to analyze large-scale mass spectrometry data. 2016. Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    As in many other fields, biology is faced with enormous amounts of data that contain valuable information that is yet to be extracted. The field of proteomics, the study of proteins, has the luxury of having large repositories containing data from tandem mass-spectrometry experiments, readily accessible for everyone who is interested. At the same time, there is still a lot to discover about proteins as the main actors in cell processes and cell signaling.

    In this thesis, we explore several methods to extract more information from the available data using methods from statistics and machine learning. In particular, we introduce MaRaCluster, a new method for clustering mass spectra on large-scale datasets. This method uses statistical methods to assess similarity between mass spectra, followed by the conservative complete-linkage clustering algorithm. The combination of these two resulted in up to 40% more peptide identifications on its consensus spectra compared to the state-of-the-art method.

    Second, we attempt to clarify and promote protein-level false discovery rates (FDRs). Frequently, studies fail to report protein-level FDRs even though the proteins are actually the entities of interest. We provided a framework in which to discuss protein-level FDRs in a systematic manner to open up the discussion and take away potential hesitance. We also benchmarked some scalable protein inference methods and included the best one in the Percolator package. Furthermore, we added functionality to the Percolator package to accommodate the analysis of studies in which many runs are aggregated. This reduced the run time for a recent study regarding a draft human proteome from almost a full day to just 10 minutes on a commodity computer, resulting in a list of proteins together with their corresponding protein-level FDRs.

  • 6.
    The, Matthew
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Statistical and machine learning methods to analyze large-scale mass spectrometry data. 2018. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Modern biology is faced with vast amounts of data that contain valuable information yet to be extracted. Proteomics, the study of proteins, has repositories with thousands of mass spectrometry experiments. These data gold mines could further our knowledge of proteins as the main actors in cell processes and signaling. Here, we explore methods to extract more information from this data using statistical and machine learning methods.

    First, we present advances for studies that aggregate hundreds of runs. We introduce MaRaCluster, which clusters mass spectra for large-scale datasets using statistical methods to assess similarity of spectra. It identified up to 40% more peptides than the state-of-the-art method, MS-Cluster. Further, we accommodated large-scale data analysis in Percolator, a popular post-processing tool for mass spectrometry data. This reduced the runtime for a draft human proteome study from a full day to 10 minutes.

    Second, we clarify and promote the contentious topic of protein false discovery rates (FDRs). Often, studies report lists of proteins but fail to report protein FDRs. We provide a framework to systematically discuss protein FDRs and take away hesitance. We also added protein FDRs to Percolator, opting for the best-peptide approach which proved superior in a benchmark of scalable protein inference methods.

    Third, we tackle the low sensitivity of protein quantification methods. Current methods lack proper control of error sources and propagation. To remedy this, we developed Triqler, which controls the protein quantification FDR through a Bayesian framework. We also introduce MaRaQuant, which proposes a quantification-first approach that applies clustering prior to identification. This reduced the number of spectra to be searched and allowed us to spot unidentified analytes of interest. Combining these tools outperformed the state-of-the-art method, MaxQuant/Perseus, and found enriched functional terms for datasets that had none before.

  • 7.
    The, Matthew
    et al.
    KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Edfors, Fredrik
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Proteinvetenskap, Systembiologi.
    Perez-Riverol, Yasset
    EBI, EMBL, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England.
    Payne, Samuel H.
    Pacific Northwest Natl Lab, Biol Sci Div, Richland, WA 99352 USA.
    Hoopmann, Michael R.
    Inst Syst Biol, Seattle, WA 98109 USA.
    Palmblad, Magnus
    Leiden Univ, Med Ctr, Ctr Prote & Metabol, NL-2300 RC Leiden, Netherlands.
    Forsström, Björn
    KTH, Skolan för bioteknologi (BIO), Centra, Albanova VinnExcellence Center for Protein Technology, ProNova. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms. 2018. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 17, no. 5, pp. 1879-1886. Journal article (Refereed)
    Abstract [en]

    A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.

  • 8.
    The, Matthew
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Käll, Lukas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Distillation of label-free quantification data by clustering and Bayesian modeling. Manuscript (preprint) (Other academic)
    Abstract [en]

    In shotgun proteomics, the amount of information that can be extracted from label-free quantification experiments is typically limited by the identification rate as well as the noise level of the quantitative signals. This generally causes a low sensitivity in differential expression analysis on protein level. Here, we present a new method, MaRaQuant, in which we reverse the typical identification-first workflow into a quantification-first approach. Specifically, we apply unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This ensures that no valuable information is discarded due to analytes missing identification thresholds and allows us to spend more effort on the identification process due to the data reduction achieved by clustering. Furthermore, we propagate error probabilities from feature level all the way to protein level and input these to our probabilistic protein quantification method, Triqler. Applying this methodology to an engineered dataset, we managed to identify multiple analytes of interest that would have gone unnoticed in traditional pipelines, specifically, through the use of open modification and de novo searches. MaRaQuant/Triqler obtains significantly more identifications on all levels compared to MaxQuant/Perseus, including differentially expressed proteins. Notably, we managed to identify differentially expressed proteins in a clinical dataset where previously none were discovered. Furthermore, our differentially expressed proteins allowed us to attribute multiple functional annotation terms to both clinical datasets that we investigated.

  • 9.
    The, Matthew
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Käll, Lukas
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Genteknologi.
    Integrated identification and quantification error probabilities for shotgun proteomics. Manuscript (preprint) (Other academic)
    Abstract [en]

    Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differentially expressed proteins use intermediate filters in an attempt to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered datasets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical dataset we discovered 35 proteins at 5% FDR, with the original study discovering none at this threshold. Compellingly, these proteins showed enrichment for functional annotation terms. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.

  • 10.
    The, Matthew
    et al.
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Skolan för kemi, bioteknologi och hälsa (CBH).
    Käll, Lukas
    KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Skolan för kemi, bioteknologi och hälsa (CBH).
    Integrated Identification and Quantification Error Probabilities for Shotgun Proteomics. 2019. In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 18, no. 3, pp. 561-570. Journal article (Refereed)
    Abstract [en]

    Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differential proteins use intermediate filters to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered data sets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical data set we discovered 35 proteins at 5% FDR, whereas the original study discovered 1 and MaxQuant/Perseus 4 proteins at this threshold. Compellingly, these 35 proteins showed enrichment for functional annotation terms, whereas the top ranked proteins reported by MaxQuant/Perseus showed no enrichment. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.

  • 11.
    The, Matthew
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics. 2016. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 15, no. 3, pp. 713-720. Journal article (Refereed)
    Abstract [en]

    Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).
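
    The intuition of a rarity-based distance combined with complete-linkage clustering can be illustrated with a toy sketch. This is not the MaRaCluster metric: the weighting (negative log of a smoothed peak frequency), the weighted-Jaccard distance, the integer peak bins, the threshold and the spectra below are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def rarity_weights(spectra):
    """Weight each peak bin by -log of the (smoothed) fraction of spectra containing it."""
    counts = Counter(b for peaks in spectra.values() for b in set(peaks))
    n = len(spectra)
    # The +1 in the denominator keeps weights positive even for ubiquitous peaks.
    return {b: -math.log(c / (n + 1)) for b, c in counts.items()}


def rarity_distance(peaks_a, peaks_b, weights):
    """One minus the rarity-weighted Jaccard similarity of two peak sets."""
    shared = sum(weights[b] for b in peaks_a & peaks_b)
    union = sum(weights[b] for b in peaks_a | peaks_b)
    return 1.0 - shared / union if union > 0 else 1.0


def complete_linkage(spectra, weights, threshold):
    """Merge clusters while the largest pairwise distance between them stays within threshold."""
    clusters = [[name] for name in spectra]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(
                    rarity_distance(spectra[a], spectra[b], weights)
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up spectra as sets of integer peak bins; bin 100 is common and so carries little weight.
spectra = {
    "s1": {100, 250, 400},
    "s2": {100, 250, 400, 500},
    "s3": {100, 610, 720},
    "s4": {100, 610, 720},
}
weights = rarity_weights(spectra)
clusters = complete_linkage(spectra, weights, threshold=0.5)
```

    Complete linkage is conservative in the sense that two clusters merge only if every pair of their members is close, so the shared common peak at bin 100 alone is not enough to join the two groups.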

  • 12.
    The, Matthew
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    MacCoss, M. J.
    Noble, W. S.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi. KTH, Centra, Science for Life Laboratory, SciLifeLab.
    Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. 2016. In: Journal of the American Society for Mass Spectrometry, ISSN 1044-0305, E-ISSN 1879-1123, Vol. 27, no. 11, pp. 1719-1727. Journal article (Refereed)
    Abstract [en]

    Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness for the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method - grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein - in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
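
    The protein inference idea described in the abstract (group proteins with identical theoretical peptide sets, then score each group by its best-scoring peptide) can be sketched as below. This is an illustrative sketch only, not Percolator's code or API: the function names, the dictionary inputs and the scores are hypothetical.

```python
from collections import defaultdict


def group_proteins(theoretical_peptides):
    """Group proteins that have identical sets of theoretical peptides."""
    groups = defaultdict(list)
    for protein, peptides in theoretical_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return {tuple(sorted(names)): peps for peps, names in groups.items()}


def best_peptide_scores(protein_groups, peptide_scores):
    """Score each protein group by its single best-scoring observed peptide."""
    scores = {}
    for group, peptides in protein_groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:  # groups without any observed peptide get no score
            scores[group] = max(observed)
    return scores


# Made-up digest and peptide scores (higher is better), for illustration only.
theoretical_peptides = {
    "P1": ["AAK", "CCK"],
    "P2": ["CCK", "AAK"],  # same peptide set as P1, so the two are indistinguishable
    "P3": ["DDK"],
    "P4": ["EEK"],         # never observed
}
peptide_scores = {"AAK": 1.2, "CCK": 3.4, "DDK": 0.7}

protein_groups = group_proteins(theoretical_peptides)
group_scores = best_peptide_scores(protein_groups, peptide_scores)
```

    Using only the best peptide keeps the protein score from growing just because a protein has many low-scoring peptides.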

  • 13.
    The, Matthew
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    MacCoss, Michael J.
    Noble, William S.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. Manuscript (preprint) (Other academic)
    Abstract [en]

    Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness for the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method - grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein - in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542).

    The source code and Ubuntu, Windows, MacOS and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.

  • 14.
    The, Matthew
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Tasnim, Ayesha
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    How to talk about protein-level false discovery rates in shotgun proteomics. Manuscript (preprint) (Other academic)
    Abstract [en]

    A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate. Many researchers consider protein-level false discovery rates a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level false discovery rates, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the false discovery rate. Furthermore, we demonstrate how the same simulations can be used to verify false discovery rate estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level false discovery rates for both competing null hypotheses.

  • 15.
    The, Matthew
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Tasnim, Ayesha
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Käll, Lukas
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    How to talk about protein-level false discovery rates in shotgun proteomics. 2016. In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 16, no. 18, pp. 2461-2469. Journal article (Refereed)
    Abstract [en]

    A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
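
    As a rough illustration of how decoys can be used to estimate FDRs, the sketch below computes q-values with a simple target-decoy count. It is a generic textbook-style estimate under stated assumptions (equal-sized target and decoy databases, no score ties, FDR estimated as decoys divided by targets above a threshold), not the simulation framework or the estimator from the paper; the function name and the scores are hypothetical.

```python
def target_decoy_q_values(scored):
    """Compute q-values for targets from (score, is_decoy) pairs.

    Returns a list of (score, q_value) for target entries, best score first.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            # Estimated FDR among targets scoring at least this well.
            fdrs.append((score, min(1.0, decoys / targets)))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    q_values = []
    running_min = 1.0
    for score, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append((score, running_min))
    q_values.reverse()
    return q_values


# Made-up protein scores; True marks a decoy entry.
scored = [
    (9.0, False), (8.0, False), (7.0, True), (6.0, False),
    (5.0, False), (4.0, True), (3.0, False),
]
q_values = target_decoy_q_values(scored)
accepted = [score for score, q in q_values if q <= 0.25]
```

    Which null hypothesis such an estimate corresponds to depends on how the decoy proteins are constructed and scored, which is the distinction the abstract draws attention to.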
