kth.se Publications
Genome-wide single cell annotation of the human protein-coding genes
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science. ORCID iD: 0000-0002-7000-4416
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science.
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science.
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science. ORCID iD: 0000-0002-3721-8586
(English) Manuscript (preprint) (Other academic)
Abstract [en]

An important quest for the life science community is to deliver a complete annotation of the building blocks of human life: the genes and the proteins. Here, we report on a genome-wide effort to annotate all protein-coding genes based on single cell transcriptomics data representing all major tissues and organs in the human body, integrated with data from bulk transcriptomics and antibody-based tissue profiling. Altogether, 25 tissues have been analyzed with single cell transcriptomics, resulting in genome-wide expression profiles for 444 single cell types; each profile was obtained by pooling the data from the individual cells of that cell type. We introduce a new genome-wide classification tool based on clustering genes with similar expression profiles across single cell types, which can be visualized using dimensionality reduction maps (UMAP). The clustering classification is integrated with a new “tau” score classification for all protein-coding genes, which gives each gene a measure of single cell specificity across all cell types. The analysis has allowed us to annotate all human protein-coding genes with regard to function and spatial distribution across individual cell types in all major tissues and organs of the human body. A new version of the open access Human Protein Atlas (www.proteinatlas.org) has been launched to enable researchers to explore the new genome-wide annotation at the level of individual genes.
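The abstract names two computational steps. The first pools single-cell data into one genome-wide profile per cell type. The second scores each gene's cell type specificity with tau. The following is only a minimal illustrative sketch of both steps under stated assumptions, and it is not the authors' pipeline. It assumes that pooling means summing raw counts per cell-type label, followed by counts-per-million scaling. It also assumes the standard tau index of Yanai et al. (2005), tau = sum_i (1 - x_i / max(x)) / (n - 1), computed on log2(x + 1) values by default. The Human Protein Atlas may use different normalization (for example nTPM), filtering, or thresholds. The function names pseudobulk, cpm, tau and tau_per_gene, and the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def pseudobulk(counts, labels):
    """Pool raw counts from all cells that share a cell-type label (assumed pooling rule).

    counts: one list per cell, one count per gene (same gene order in every cell).
    labels: the cell-type label of each cell.
    Returns {cell_type: [summed count per gene]}.
    """
    if len(counts) != len(labels):
        raise ValueError("need exactly one label per cell")
    n_genes = len(counts[0])
    pooled = defaultdict(lambda: [0.0] * n_genes)
    for cell, label in zip(counts, labels):
        totals = pooled[label]
        for g, c in enumerate(cell):
            totals[g] += c
    return dict(pooled)


def cpm(profile):
    """Scale a pooled profile to counts per million (simple library-size normalization)."""
    total = sum(profile)
    return [1e6 * c / total if total else 0.0 for c in profile]


def tau(values):
    """Tau specificity index (Yanai et al., 2005) of one gene across n cell types.

    0 = equal expression in all types, 1 = expressed in a single type only.
    """
    n = len(values)
    if n < 2:
        return 0.0  # tau is undefined for fewer than two cell types
    peak = max(values)
    if peak == 0:
        return 0.0  # gene not detected anywhere; reported as 0 by choice here
    return sum(1 - v / peak for v in values) / (n - 1)


def tau_per_gene(counts, labels, log=True):
    """Pseudobulk per cell type, normalize to CPM, then score every gene with tau."""
    profiles = {t: cpm(p) for t, p in pseudobulk(counts, labels).items()}
    cell_types = sorted(profiles)
    scores = []
    for g in range(len(counts[0])):
        values = [profiles[t][g] for t in cell_types]
        if log:
            values = [math.log2(v + 1) for v in values]  # compress the dynamic range
        scores.append(tau(values))
    return scores


if __name__ == "__main__":
    # Toy data: 4 cells x 3 genes, two hypothetical cell types "A" and "B".
    counts = [[10, 0, 5], [20, 0, 5], [0, 30, 5], [0, 10, 5]]
    labels = ["A", "A", "B", "B"]
    print([round(s, 3) for s in tau_per_gene(counts, labels)])
```

In the toy matrix, gene 0 is detected only in type A and gene 1 only in type B, so both receive a tau of 1.0. Gene 2 is expressed at similar levels in both types and scores close to 0. Summing counts before normalizing weights each cell by its sequencing depth. This is one common pseudobulk choice, but averaging per-cell normalized values is an equally plausible alternative.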

Keywords [en]
protein, annotation, clustering, specificity, tissue, single-cell, RNA-Seq, scRNA-Seq
National Category
Bioinformatics and Computational Biology
Research subject
Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-312021
OAI: oai:DiVA.org:kth-312021
DiVA, id: diva2:1657038
Note

QC 20220524

Available from: 2022-05-09 Created: 2022-05-09 Last updated: 2025-02-07 Bibliographically approved
In thesis
1. Mapping and annotating the mammalian body-wide protein-coding gene expression
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A central aim of fundamental research is to create the conditions needed to fuel further research and innovation. Our understanding of basic biology is central to the future development of tools for diagnosing, monitoring, and treating disease. This doctoral thesis focuses on mapping mammalian protein-coding gene expression in healthy cells and tissues, and on annotating genes based on their expression patterns, specificity, location, and function. This has largely been achieved by using large-scale transcriptomic and proteomic profiling to describe the gene expression landscape that defines the identities of the great diversity of cells present in mammals. Characterization of gene expression across different tissues and cell types provides fundamental tools for the exploration, summary, and ultimately the annotation of the mammalian proteome, which is still incomplete.

The studies comprising this thesis have contributed to the Human Protein Atlas, an online open-access portal for proteomic and transcriptomic data. The Atlas aims to profile every human protein-coding gene in order to create a spatial map of the molecular organization of the human body, providing basic tools for the scientific community. Paper I comprises an effort to catalogue all proteins that are actively secreted from cells, thereby defining the human secretome. Paper II entails the deep characterization and annotation of the protein-coding transcriptome of 18 peripheral immune cell types. Paper III describes the most comprehensive tissue-based transcriptomic profiling of protein-coding genes to date, covering 98 tissues of the pig, an increasingly important model animal. Paper IV extends previous tissue-based maps of the human protein-coding genome by integrating 13 single cell transcriptome datasets. Paper V explores the human protein-coding genome in a clustering-based annotation of genes co-expressed across single cells and tissues. This provides a framework for finding previously unknown functional relationships between genes by the principle of “guilt-by-association”.

In summary, the work described here represents a small contribution to the grand effort of spatially mapping proteins across tissues and cell types. It helps build a framework of biological knowledge that can lead to an increased understanding of the constituents that make us human.

Abstract [sv] (translated from Swedish)

A central goal of basic science is to create the conditions for future research and innovation. Our understanding of fundamental biology is essential for developing tools for diagnosing, monitoring, and treating disease. This thesis focuses on mapping protein-coding gene expression in healthy mammalian cells and tissues, and on annotating genes based on their expression patterns, specificity, localization, and function. This has largely been achieved through large-scale transcriptomics- and proteomics-based profiling. The profiling describes the gene expression patterns that define the identities of the great diversity of cells found in mammals. Characterizing gene expression across tissues and cell types provides important tools for the exploration, compilation, and ultimately the annotation of the mammalian proteome, which is still incomplete.

The studies that make up this thesis have contributed to the Human Protein Atlas, an open-access online portal for proteomics and transcriptomics data, with the goal of describing the expression of all protein-coding genes. By creating a map of the molecular organization of the human body, this project constitutes an essential tool for research. Paper I catalogues all proteins that are actively secreted from cells, in order to define the human secretome. Paper II concerns an in-depth characterization and annotation of the protein-coding transcriptome of 18 peripheral immune cell types. Paper III describes the most comprehensive tissue-based map of protein-coding genes to date, covering 98 tissues in pig, which has become an increasingly important model organism. Paper IV extends previous tissue-based maps of the protein-coding genome by integrating 13 single-cell transcriptomics datasets. Paper V explores the human protein-coding genome in a clustering-based annotation of co-expressed genes. The aim is to build a framework for finding previously unknown functional relationships between genes according to the principle of “guilt by association”.

The work described here is a contribution to the extensive effort to map the localization of proteins in tissues and cells. It helps build a framework of biological knowledge that can lead to an increased understanding of the components that make us human.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. 75
Series
TRITA-CBH-FOU ; 2022:32
Keywords
protein, protein-coding, genes, annotation, atlas, scRNA-Seq, RNA-Seq, transcriptomics, proteomics
National Category
Bioinformatics and Computational Biology
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-312033 (URN)
978-91-8040-250-7 (ISBN)
Public defence
2022-06-03, Eva & Georg Klein, Biomedicum, Solnavägen 9, via Zoom: https://kth-se.zoom.us/j/66922122998, Solna, 09:30 (English)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 2022-05-11

Available from: 2022-05-11 Created: 2022-05-10 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Karlsson, Max; Alvez, Maria Bueno; Shi, Mengnan; Zhang, Cheng; Zhong, Wen; Edfors, Fredrik; Oksvold, Per; von Feilitzen, Kalle; Zwahlen, Martin; Forsberg, Mattias; Johansson, Fredric; Mardinoglu, Adil; Sivertsson, Åsa; Fagerberg, Linn; Uhlén, Mathias
