Publications (10 of 99)
Squires, L., Giraldez Chavez, J. H., Nilsson, A., Käll, L. & Payne, S. H. (2026). Better Inputs, Better Learning: A Peptide Embedding Tutorial for Proteomic Mass Spectrometry. Journal of Proteome Research, 25(2), 1160-1165
2026 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 25, no 2, p. 1160-1165. Article in journal (Refereed). Published
Abstract [en]

Mass spectrometry proteomics creates complex data representing the peptide/protein contents of biological samples. Various types of machine learning have been central to computational methods used to identify peptides from tandem mass spectra and numerous other aspects of the data analysis process. As deep learning has emerged as a powerful machine learning method for modeling and interpreting data, computational proteomics researchers have leveraged large publicly available data sets to train machine learning models to predict peptide fragmentation spectra and liquid chromatography retention time. Resources like proteomicsML offer extensive demonstrative tutorials for these learning tasks and are closing the gap between the proteomics and machine learning communities. However, in these and other educational materials on deep learning, the critical step of preparing data for learning is frequently omitted. Prior to learning, peptide strings must be converted into a numeric format: an embedding. There are many different peptide embeddings, and some vastly outperform others. Yet the process for creating an embedding, and also the rationale for choosing a specific embedding, is rarely discussed in our proteomics literature. In this technical note, we introduce four Google Colab notebooks to teach peptide embeddings. The series walks users through five different peptide-embedding strategies, from simplistic single-number encodings to state-of-the-art pretrained embeddings, through both code examples and narrative descriptions. The final notebook compares the five embeddings in a head-to-head benchmark. By making these notebooks free, we hope to lower the barrier for researchers who want to bring modern deep learning into their proteomics workflows.
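
To make the idea of converting peptide strings into a numeric format concrete, here is a minimal sketch of two very simple encodings: one integer per residue, and a one-hot matrix. It is an illustration only and is not taken from the notebooks described in the abstract. The alphabet, the function names (`integer_encode`, `one_hot_encode`), the padding scheme, and the `max_len` default are all assumptions made for this example, and modified residues are not handled.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def integer_encode(peptide: str, max_len: int = 30) -> np.ndarray:
    """Map each residue to an integer 1..20; 0 is used as padding."""
    if len(peptide) > max_len:
        raise ValueError(f"peptide longer than max_len={max_len}")
    codes = np.zeros(max_len, dtype=np.int64)
    for pos, aa in enumerate(peptide):
        codes[pos] = AA_TO_INDEX[aa] + 1  # KeyError on non-standard residues
    return codes


def one_hot_encode(peptide: str, max_len: int = 30) -> np.ndarray:
    """Return a (max_len, 20) matrix with a single 1 per residue row."""
    codes = integer_encode(peptide, max_len)
    one_hot = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    present = codes > 0  # padding rows stay all-zero
    one_hot[np.nonzero(present)[0], codes[present] - 1] = 1.0
    return one_hot


codes = integer_encode("PEPTIDE", max_len=10)
matrix = one_hot_encode("PEPTIDE", max_len=10)
```

The integer form is compact but imposes an arbitrary ordering on the residues, while the one-hot form avoids that ordering at the cost of a larger input. Neither carries any chemical information about the residues.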

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2026
Keywords
embedding, encoding, machine learning, peptide, proteomics AI, proteomics education, tutorials
National Category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:kth:diva-377327 (URN); 10.1021/acs.jproteome.5c00563 (DOI); 001661059700001 (ISI); 41528974 (PubMedID); 2-s2.0-105029493700 (Scopus ID)
Note

QC 20260227

Available from: 2026-02-27; Created: 2026-02-27; Last updated: 2026-02-27; Bibliographically approved
Käll, L. (2025). De Novo Sequencing of MS2s and Cryo-EM Maps. Molecular & Cellular Proteomics, 24(8), Article ID 101238.
2025 (English). In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 24, no 8, article id 101238. Article in journal, Meeting abstract (Other academic). Published
Place, publisher, year, edition, pages
Elsevier BV, 2025
National Category
Developmental Biology
Identifiers
urn:nbn:se:kth:diva-377542 (URN); 10.1016/j.mcpro.2025.101238 (DOI); 001631961100007 (ISI)
Note

QC 20260316

Available from: 2026-03-16; Created: 2026-03-16; Last updated: 2026-03-16; Bibliographically approved
Freestone, J., Käll, L., Noble, W. S. & Keich, U. (2025). How to Train a Postprocessor for Tandem Mass Spectrometry Proteomics Database Search While Maintaining Control of the False Discovery Rate. Journal of Proteome Research, 24(5), 2266-2279
2025 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 24, no 5, p. 2266-2279. Article in journal (Refereed). Published
Abstract [en]

Decoy-based methods are a popular choice for the statistical validation of peptide detection in tandem mass spectrometry and proteomics data. Such methods can achieve a substantial boost in statistical power when coupled with postprocessors such as Percolator that use auxiliary features to learn a better-discriminating scoring function. However, we recently showed that Percolator can struggle to control the false discovery rate (FDR) when reporting the list of discovered peptides. To address this problem, we introduce Percolator-RESET, which is an adaptation of our recently developed RESET meta-procedure to the peptide detection problem. Specifically, Percolator-RESET fuses Percolator's iterative SVM training procedure with RESET's general framework to provide valid false discovery rate control. Percolator-RESET operates in both a standard single-decoy mode and a two-decoy mode, with the latter requiring the generation of two decoys per target. We demonstrate that Percolator-RESET controls the FDR in both modes, both theoretically and empirically, while typically reporting only a marginally smaller number of discoveries than Percolator in the single-decoy mode. The two-decoy mode is marginally more powerful than both Percolator and the single-decoy mode and exhibits less variability than the latter.
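
For readers unfamiliar with decoy-based validation, the sketch below shows the basic target-decoy counting estimate that such methods start from. It is not Percolator and not Percolator-RESET: it only illustrates how decoy wins above a score threshold are used to estimate the FDR among target wins, using the commonly used (decoys + 1) / targets estimate. The function name `count_discoveries` and the toy scores are hypothetical, and ties between scores are ignored.

```python
from typing import Sequence


def count_discoveries(scores: Sequence[float],
                      is_decoy: Sequence[bool],
                      alpha: float) -> int:
    """Number of targets reported at estimated FDR <= alpha.

    Walks down the list from best to worst score and keeps the largest
    number of targets for which (decoys + 1) / targets <= alpha.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = 0
    decoys = 0
    best = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and (decoys + 1) / targets <= alpha:
            best = targets
    return best


# Toy input: ten competition winners, three of which are decoys.
toy_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_is_decoy = [False, False, False, False, False,
                True, False, False, True, True]

n_loose = count_discoveries(toy_scores, toy_is_decoy, alpha=0.5)
n_strict = count_discoveries(toy_scores, toy_is_decoy, alpha=0.1)
```

With so few items the `+ 1` in the numerator dominates, which is why the strict threshold reports nothing on this toy input. A postprocessor that learns a better scoring function changes the ordering that this counting step sees.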

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025
Keywords
proteomics, false discovery rate control, tandem mass spectrometry
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:kth:diva-362990 (URN); 10.1021/acs.jproteome.4c00742 (DOI); 001456007300001 (ISI); 40163043 (PubMedID); 2-s2.0-105001488019 (Scopus ID)
Note

QC 20250506

Available from: 2025-05-06; Created: 2025-05-06; Last updated: 2025-05-06; Bibliographically approved
Bjärterot, P., Nilsson, A., Shariatgorji, R., Vallianatou, T., Kaya, I., Svenningsson, P., . . . Andrén, P. E. (2025). Met-ID: An Open-Source Software for Comprehensive Annotation of Multiple On-Tissue Chemical Modifications in MALDI-MSI. Analytical Chemistry, 97(16), 9033-9041
2025 (English). In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 97, no 16, p. 9033-9041. Article in journal (Refereed). Published
Abstract [en]

Here, we introduce Met-ID, a graphical user interface software designed to efficiently identify metabolites from MALDI-MSI data sets. Met-ID enables annotation of m/z features from any type of MALDI-MSI experiment, involving either derivatizing or conventional matrices. It utilizes structural information for derivatizing matrices to generate a subset of targets that contain only functional groups specific to the derivatization agent. The software is able to identify multiple derivatization sites on the same molecule, facilitating identification of the derivatized compound. This ability is exemplified by FMP-10, a reactive matrix that assists the covalent charge-tagging of molecules containing phenolic hydroxyl and/or primary or secondary amine groups. Met-ID also permits users to recalibrate data with known m/z ratios, boosting confidence in mass match results. Furthermore, Met-ID includes a database featuring MS2 spectra of numerous chemical standards, consisting of neurotransmitters and metabolites derivatized with FMP-10, alongside peaks for FMP-10 itself, all accessible directly through the software. The MS2 spectral database supports user-uploaded spectra and enables comparison of these spectra with user-provided tissue MS2 spectra for similarity assessment. Although initially installed with basic data, Met-ID is designed to be customizable, encouraging users to tailor the software to their specific needs. While several MSI-oriented software solutions exist, Met-ID combines both MS1 and MS2 functionalities. Developed in alignment with the FAIR Guiding Principles for scientific software, Met-ID is freely available as an open-source tool on GitHub, ensuring wide accessibility and collaboration.

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-363458 (URN); 10.1021/acs.analchem.5c00633 (DOI); 001471685200001 (ISI); 40253716 (PubMedID); 2-s2.0-105004009400 (Scopus ID)
Note

QC 20250519

Available from: 2025-05-15; Created: 2025-05-15; Last updated: 2025-05-19; Bibliographically approved
Perez-Riverol, Y., Bittremieux, W., Noble, W. S., Martens, L., Bilbao, A., Lazear, M. R., . . . Fondrie, W. E. (2025). Open-Source and FAIR Research Software for Proteomics. Journal of Proteome Research, 24(5), 2222-2234
2025 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 24, no 5, p. 2222-2234. Article, review/survey (Refereed). Published
Abstract [en]

Scientific discovery relies on innovative software as much as experimental methods, especially in proteomics, where computational tools are essential for mass spectrometer setup, data analysis, and interpretation. Since the introduction of SEQUEST, proteomics software has grown into a complex ecosystem of algorithms, predictive models, and workflows, but the field faces challenges, including the increasing complexity of mass spectrometry data, limited reproducibility due to proprietary software, and difficulties integrating with other omics disciplines. Closed-source, platform-specific tools exacerbate these issues by restricting innovation, creating inefficiencies, and imposing hidden costs on the community. Open-source software (OSS), aligned with the FAIR Principles (Findable, Accessible, Interoperable, Reusable), offers a solution by promoting transparency, reproducibility, and community-driven development, which fosters collaboration and continuous improvement. In this manuscript, we explore the role of OSS in computational proteomics, its alignment with FAIR principles, and its potential to address challenges related to licensing, distribution, and standardization. Drawing on lessons from other omics fields, we present a vision for a future where OSS and FAIR principles underpin a transparent, accessible, and innovative proteomics community.

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025
Keywords
FAIR principles, open source, computational proteomics, best practices, data reuse, open data, mass spectrometry, proteomics
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:kth:diva-364258 (URN); 10.1021/acs.jproteome.4c01079 (DOI); 001473258300001 (ISI); 40267229 (PubMedID); 2-s2.0-105003728834 (Scopus ID)
Note

QC 20250609

Available from: 2025-06-09; Created: 2025-06-09; Last updated: 2025-10-10; Bibliographically approved
Lapin, J., Nilsson, A., Wilhelm, M. & Käll, L. (2025). Pairwise Attention: Leveraging Mass Differences to Enhance De Novo Sequencing of Mass Spectra. Journal of Proteome Research, 24(7), 3722-3730
2025 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 24, no 7, p. 3722-3730. Article in journal (Refereed). Published
Abstract [en]

A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which in some applications is not available. Deep learning-based de novo sequencing can address this limitation by directly predicting peptide sequences from MS2 data. We have seen the application of the transformer architecture to de novo sequencing produce state-of-the-art results on the so-called nine-species benchmark. In this study, we propose an improved transformer encoder inspired by the heuristics used in the manual interpretation of spectra. We modify the attention mechanism with a learned bias based on pairwise mass differences, termed Pairwise Attention (PA). Adding PA improves average peptide precision at 100% coverage by 12.7% (5.9 percentage points) over our base transformer on the original nine-species benchmark. We have also achieved a 7.4% increase over the previously published model Casanovo. Our MS2 encoding strategy is largely orthogonal to other transformer-based models encoding MS2 spectra, enabling straightforward integration into existing deep-learning approaches. Our results show that integrating domain-specific knowledge into transformers boosts de novo sequencing performance.
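
The phrase "a learned bias based on pairwise mass differences" can be read as adding a term to the attention logits before the softmax. The NumPy sketch below shows that general pattern only. The actual form of the bias in the paper is not given in this abstract, so `bias_fn` is a placeholder argument, and `toy_bias` (a bump around the alanine residue mass of about 71.037 Da, with a hand-picked width and height) is a hypothetical stand-in for a learned function. All names here are assumptions for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_with_mass_bias(q, k, v, mz, bias_fn):
    """Single-head scaled dot-product attention with an additive bias.

    q, k, v: arrays of shape (n_peaks, d); mz: array of shape (n_peaks,).
    bias_fn maps the (n_peaks, n_peaks) matrix of pairwise m/z differences
    to a matrix of the same shape that is added to the attention logits.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    delta = mz[:, None] - mz[None, :]  # pairwise mass differences
    weights = softmax(logits + bias_fn(delta), axis=-1)
    return weights @ v, weights


def toy_bias(delta: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned bias: favors pairs of peaks
    separated by roughly one alanine residue mass."""
    return 2.0 * np.exp(-((np.abs(delta) - 71.037) ** 2) / 0.01)


rng = np.random.default_rng(0)
n_peaks, dim = 4, 8
q = rng.normal(size=(n_peaks, dim))
k = rng.normal(size=(n_peaks, dim))
v = rng.normal(size=(n_peaks, dim))
mz = np.array([100.0, 171.04, 242.08, 300.0])

out, weights = attention_with_mass_bias(q, k, v, mz, toy_bias)
_, plain = attention_with_mass_bias(q, k, v, mz, np.zeros_like)
```

Passing `np.zeros_like` as `bias_fn` recovers ordinary attention, which makes it easy to compare the two weight matrices and see which peak pairs the bias promotes.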

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025
Keywords
Attention, De novo sequencing, Mass spectrometry, MS2, Proteomics, Transformers
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-364463 (URN); 10.1021/acs.jproteome.5c00063 (DOI); 001500093400001 (ISI); 40454436 (PubMedID); 2-s2.0-105007317018 (Scopus ID)
Note

QC 20260127

Available from: 2025-06-12; Created: 2025-06-12; Last updated: 2026-03-27; Bibliographically approved
Lapin, J., Nilsson, A., Wilhelm, M. & Käll, L. (2025). Pairwise Attention: Leveraging Mass Differences to Enhance De Novo Sequencing of Mass Spectra. Journal of Proteome Research, 24(7), 3722-3730
2025 (English). In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 24, no 7, p. 3722-3730. Article in journal (Refereed). Published
Abstract [en]

A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which in some applications is not available. Deep learning-based de novo sequencing can address this limitation by directly predicting peptide sequences from MS2 data. We have seen the application of the transformer architecture to de novo sequencing produce state-of-the-art results on the so-called nine-species benchmark. In this study, we propose an improved transformer encoder inspired by the heuristics used in the manual interpretation of spectra. We modify the attention mechanism with a learned bias based on pairwise mass differences, termed Pairwise Attention (PA). Adding PA improves average peptide precision at 100% coverage by 12.7% (5.9 percentage points) over our base transformer on the original nine-species benchmark. We have also achieved a 7.4% increase over the previously published model Casanovo. Our MS2 encoding strategy is largely orthogonal to other transformer-based models encoding MS2 spectra, enabling straightforward integration into existing deep-learning approaches. Our results show that integrating domain-specific knowledge into transformers boosts de novo sequencing performance. 

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025
Keywords
Deep Learning, De Novo Sequencing
National Category
Bioinformatics and Computational Biology; Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-378798 (URN); 10.1021/acs.jproteome.5c00063 (DOI); 001500093400001 (ISI); 40454436 (PubMedID); 2-s2.0-105007317018 (Scopus ID)
Funder
Knut and Alice Wallenberg Foundation, KAW 2022.0032
Note

QC 20260331

Available from: 2026-03-27; Created: 2026-03-27; Last updated: 2026-03-31; Bibliographically approved
Vasicek, J., Skiadopoulou, D., Kuznetsova, K. G., Käll, L., Vaudel, M. & Bruckner, S. (2025). ProHap Explorer: Visualizing Haplotypes in Proteogenomic Datasets. IEEE Computer Graphics and Applications, 45(5), 64-77
2025 (English). In: IEEE Computer Graphics and Applications, ISSN 0272-1716, E-ISSN 1558-1756, Vol. 45, no 5, p. 64-77. Article in journal (Refereed). Published
Abstract [en]

In mass spectrometry-based proteomics, experts usually project data onto a single set of reference sequences, overlooking the influence of common haplotypes (combinations of genetic variants inherited together from a parent). We recently introduced ProHap, a tool for generating customized protein haplotype databases. Here, we present ProHap Explorer, a visualization interface designed to investigate the influence of common haplotypes on the human proteome. It enables users to explore haplotypes, their effects on protein sequences, and the identification of non-canonical peptides in public mass spectrometry datasets. The design builds on well-established representations in biological sequence analysis, ensuring familiarity for domain experts while integrating novel interactive elements tailored to proteogenomic data exploration. User interviews with proteomics experts confirmed the tool's utility, highlighting its ability to reveal whether haplotypes affect proteins of interest. By facilitating the intuitive exploration of proteogenomic variation, ProHap Explorer supports research in personalized medicine and the development of targeted therapies.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Bioinformatics (Computational Biology); Medical Genetics and Genomics
Identifiers
urn:nbn:se:kth:diva-368669 (URN); 10.1109/MCG.2025.3581736 (DOI); 001590142500015 (ISI); 40540386 (PubMedID); 2-s2.0-105008961751 (Scopus ID)
Note

QC 20260127

Available from: 2025-08-21; Created: 2025-08-21; Last updated: 2026-01-27; Bibliographically approved
Schulte, D., Siborova, M., Käll, L. & Snijder, J. (2025). Simultaneous polyclonal antibody sequencing and epitope mapping by cryo electron microscopy and mass spectrometry. eLIFE, 14, Article ID RP101322.
2025 (English). In: eLIFE, E-ISSN 2050-084X, Vol. 14, article id RP101322. Article in journal (Refereed). Published
Abstract [en]

Antibodies are a major component of adaptive immunity against invading pathogens. Here, we explore possibilities for an analytical approach to characterize the antigen-specific antibody repertoire directly from the secreted proteins in convalescent serum. This approach aims to perform simultaneous antibody sequencing and epitope mapping using a combination of single particle cryo-electron microscopy (cryoEM) and bottom-up proteomics techniques based on mass spectrometry (LC-MS/MS). We evaluate the performance of the deep-learning tool ModelAngelo in determining de novo antibody sequences directly from reconstructed 3D volumes of antibody-antigen complexes. We demonstrate that while map quality is a critical bottleneck, it is possible to sequence antibody variable domains from cryoEM reconstructions with accuracies of up to 80-90%. While the rate of errors exceeds the typical levels of somatic hypermutation, we show that the ModelAngelo-derived sequences can be used to assign the used V-genes. This provides a functional guide to assemble de novo peptides from LC-MS/MS data more accurately and improves the tolerance to a background of polyclonal antibody sequences. Following this proof-of-principle, we discuss the feasibility and future directions of this approach to characterize antigen-specific antibody repertoires.

Place, publisher, year, edition, pages
eLife Sciences Publications, Ltd, 2025
Keywords
antibody, epitope, repertoire, proteomics, cryoEM, Human, Mouse, Rhesus macaque, Other
National Category
Immunology in the Medical Area
Identifiers
urn:nbn:se:kth:diva-364249 (URN); 10.7554/eLife.101322 (DOI); 001473792700001 (ISI); 40266252 (PubMedID)
Note

QC 20250611

Available from: 2025-06-11; Created: 2025-06-11; Last updated: 2025-10-10; Bibliographically approved
Jamali, K., Käll, L., Zhang, R., Brown, A., Kimanius, D. & Scheres, S. H. W. (2024). Automated model building and protein identification in cryo-EM maps. Nature, 628(8007), 450-457
2024 (English). In: Nature, ISSN 0028-0836, E-ISSN 1476-4687, Vol. 628, no 8007, p. 450-457. Article in journal (Refereed). Published
Abstract [en]

Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs [1,2]. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Molecular Biology
Identifiers
urn:nbn:se:kth:diva-367013 (URN); 10.1038/s41586-024-07215-4 (DOI); 001190133900002 (ISI); 38408488 (PubMedID); 2-s2.0-85186566975 (Scopus ID)
Note

QC 20250714

Available from: 2025-07-14; Created: 2025-07-14; Last updated: 2025-07-14; Bibliographically approved
Projects
Spatial Omics Enable Improved Pathophysiology-based Diagnosis of Parkinson's Disease Dementia and Dementia with Lewy Bodies [2021-03293_VR]; Uppsala University
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-5689-9797