Genome-wide annotation of protein-coding genes in pig
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science. ORCID iD: 0000-0002-7000-4416
Karolinska Inst, Dept Neurosci, Stockholm, Sweden; Uppsala Univ, Dept Immunol Genet & Pathol, Uppsala, Sweden.
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science. ORCID iD: 0000-0003-3014-5502
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science. ORCID iD: 0000-0001-8800-8469
2022 (English). In: BMC Biology, E-ISSN 1741-7007, Vol. 20, no 1, article id 25. Article in journal (Refereed), Published.
Abstract [en]

Background: There is a need for functional genome-wide annotation of the protein-coding genes to get a deeper understanding of mammalian biology. Here, a new annotation strategy is introduced based on dimensionality reduction and density-based clustering of whole-body co-expression patterns. This strategy has been used to explore the gene expression landscape in pig, and we present a whole-body map of all protein-coding genes in all major pig tissues and organs.

Results: An open-access pig expression map (www.rnaatlas.org) is presented based on the expression of 350 samples across 98 well-defined pig tissues divided into 44 tissue groups. A new UMAP-based classification scheme is introduced, in which all protein-coding genes are stratified into tissue expression clusters based on body-wide expression profiles. The distribution and tissue specificity of all 22,342 protein-coding pig genes are presented.

Conclusions: Here, we present a new genome-wide annotation strategy based on dimensionality reduction and density-based clustering. A genome-wide resource of the transcriptome map across all major tissues and organs in pig is presented, and the data is available as an open-access resource (www.rnaatlas.org), including a comparison to the expression of human orthologs.
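The abstract describes clustering genes after a dimensionality reduction (UMAP) of their whole-body expression profiles. As a minimal sketch, assuming each gene has already been reduced to a 2D point, the density-based step can be illustrated with plain DBSCAN written in pure Python. DBSCAN, the `eps` and `min_pts` values, and the toy coordinates are stand-ins chosen for illustration; they are not the paper's published clustering algorithm or parameters.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or NOISE (-1).

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`. Clusters grow outward from core points;
    non-core points reached from a core point join as border points.
    """
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbours(i):
        # Brute-force O(n) scan; fine for a sketch, slow for ~20,000 genes.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:
                queue.extend(j_neigh)  # j is also a core point: keep growing
        cluster += 1
    return labels


# Hypothetical 2D embedding coordinates for seven genes.
genes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (10.0, 0.0)]
print(dbscan(genes, eps=0.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, a density-based method does not need the number of clusters in advance and can leave isolated genes unassigned (label -1), which suits expression data where many genes have no strong tissue pattern.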

Place, publisher, year, edition, pages
Springer Nature, 2022. Vol. 20, no 1, article id 25
Keywords [en]
Annotation, Protein coding genes, Genome wide, Transcriptome, Gene expression, Tissue expression profile
National Category
Biochemistry; Molecular Biology; Medical Biotechnology; Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:kth:diva-307759
DOI: 10.1186/s12915-022-01229-y
ISI: 000746863800002
PubMedID: 35073880
Scopus ID: 2-s2.0-85123754738
OAI: oai:DiVA.org:kth-307759
DiVA, id: diva2:1636197
Note

QC 20220209

Available from: 2022-02-09. Created: 2022-02-09. Last updated: 2025-02-20. Bibliographically approved.
In thesis
1. Mapping and annotating the mammalian body-wide protein-coding gene expression
2022 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

A central aim of fundamental research is to create the conditions necessary to fuel further research and innovation. Our understanding of basic biology is central to the future development of tools for diagnosing, monitoring, and treating disease. This doctoral thesis focuses on mapping mammalian protein-coding gene expression in healthy cells and tissues, and on annotating genes based on their expression patterns, specificity, location, and function. This has largely been achieved by using large-scale transcriptomic and proteomic profiling to describe the gene expression landscape that defines the identities of the great diversity of cells present in mammals. Characterization of gene expression across different tissues and cell types provides fundamental tools for the exploration, summary, and, ultimately, the annotation of the mammalian proteome, which is still incomplete.
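Annotating genes by their expression specificity usually means sorting each gene into a tissue-specificity category. A minimal sketch of such a classification follows, assuming simplified fold-change categories loosely modeled on those used by the Human Protein Atlas. The detection cutoff of 1, the four-fold threshold, and the omission of the "group enriched" category are assumptions made for illustration; the thesis's exact definitions may differ.

```python
from statistics import mean


def tissue_specificity(expr, detect=1.0, fold=4.0):
    """Classify one gene from a {tissue: expression} mapping (>= 2 tissues).

    Simplified, hypothetical categories:
      - not detected: expression below `detect` in every tissue
      - tissue enriched: top tissue >= `fold` x the second-highest tissue
      - tissue enhanced: a detected tissue >= `fold` x mean of all others
      - low tissue specificity: detected, but none of the above
    Returns (category, list of elevated tissues).
    """
    values = sorted(expr.values(), reverse=True)
    if values[0] < detect:
        return "not detected", []
    if values[0] >= fold * values[1]:
        top_tissue = max(expr, key=expr.get)
        return "tissue enriched", [top_tissue]
    enhanced = [
        t for t, v in expr.items()
        if v >= detect
        and v >= fold * mean(u for s, u in expr.items() if s != t)
    ]
    if enhanced:
        return "tissue enhanced", enhanced
    return "low tissue specificity", []


# Hypothetical expression values (e.g. TPM) for one gene in four tissues.
print(tissue_specificity({"liver": 120.0, "kidney": 10.0,
                          "heart": 2.0, "brain": 1.0}))
# → ('tissue enriched', ['liver'])
```

Comparing the top tissue with the second-highest one makes "tissue enriched" the strictest category, so it is checked before the looser comparison against the mean of all other tissues.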

The studies comprising this thesis have contributed to the Human Protein Atlas, an open-access online portal for proteomic and transcriptomic data that aims to profile each human protein-coding gene and create a spatial map of the molecular organization of the human body, providing basic tools for the scientific community. Paper I catalogues all proteins that are actively secreted from cells, thereby defining the human secretome. Paper II presents an in-depth characterization and annotation of the protein-coding transcriptome of 18 peripheral immune cell types. Paper III describes the most comprehensive tissue-based transcriptomic profiling to date of protein-coding genes in the pig, an increasingly important model animal, covering 98 tissues. Paper IV extends previous tissue-based maps of the human protein-coding genome by integrating 13 single-cell transcriptome datasets. Paper V explores the human protein-coding genome through a clustering-based annotation of co-expressed genes across single cells and tissues, providing a framework for finding previously unknown functional relationships between genes by the principle of “guilt-by-association”.
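The "guilt-by-association" principle named for Paper V can be illustrated with a deliberately simple sketch: rank annotated genes by the Pearson correlation between their expression profile and that of an unannotated gene, then take a majority vote over the top k. This is not Paper V's clustering-based method; the gene names, expression values, annotation labels, and the k=2 voting rule are all hypothetical.

```python
from collections import Counter
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def guess_annotation(query, profiles, annotations, k=2):
    """Most common annotation among the k annotated genes whose profiles
    correlate most strongly with the query gene's profile."""
    ranked = sorted(
        (g for g in profiles if g != query and g in annotations),
        key=lambda g: pearson(profiles[query], profiles[g]),
        reverse=True,
    )
    votes = Counter(annotations[g] for g in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical expression values across four tissues.
profiles = {
    "GENE_A": [90.0, 5.0, 2.0, 1.0],
    "GENE_B": [80.0, 6.0, 1.0, 2.0],
    "GENE_C": [3.0, 70.0, 60.0, 2.0],
    "GENE_D": [2.0, 65.0, 75.0, 1.0],
    "UNKNOWN": [85.0, 4.0, 3.0, 1.0],
}
annotations = {
    "GENE_A": "lipid metabolism",
    "GENE_B": "lipid metabolism",
    "GENE_C": "muscle contraction",
    "GENE_D": "muscle contraction",
}
print(guess_annotation("UNKNOWN", profiles, annotations))
# → lipid metabolism
```

Correlation compares the shape of the profiles rather than absolute levels, so a lowly expressed gene can still be associated with highly expressed genes that peak in the same tissues.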

In summary, the work described here is a small contribution to the grand effort of spatially mapping proteins across tissues and cell types, building a framework of biological knowledge that can lead to an increased understanding of the constituents that make us human.

Abstract [sv]

A central goal of basic science is to create the conditions for future research and innovation. Our understanding of fundamental biology is essential for developing tools for diagnosing, monitoring, and treating disease. This thesis focuses on mapping protein-coding gene expression in healthy mammalian cells and tissues, and on annotating genes based on their expression patterns, specificity, localization, and function. This has largely been achieved through large-scale transcriptomic and proteomic profiling to describe the gene expression patterns that define the identities of the great diversity of cells found in mammals. Characterizing gene expression across tissues and cell types provides important tools that enable the exploration, compilation, and, ultimately, the annotation of the mammalian proteome, which is still incomplete.

The studies that make up this thesis have contributed to the Human Protein Atlas, an open-access online portal for proteomic and transcriptomic data that aims to describe the expression of all protein-coding genes. By creating a map of the molecular organization of the human body, this project provides an essential tool for research. Paper I catalogues all proteins that are actively secreted from cells, in order to define the human secretome. Paper II concerns an in-depth characterization and annotation of the protein-coding transcriptome of 18 peripheral immune cell types. Paper III describes the most comprehensive tissue-based map to date of protein-coding genes, covering 98 tissues in pig, which has become an increasingly important model organism. Paper IV extends the previous tissue-based maps of the protein-coding genome by integrating 13 single-cell transcriptomics datasets. Paper V explores the human protein-coding genome through a clustering-based annotation of co-expressed genes, to build a framework for finding previously unknown functional relationships between genes according to the principle of “guilt by association”.

The work described here is a contribution to the broad effort of mapping the localization of proteins in tissues and cells, to build a framework of biological knowledge that can lead to an increased understanding of the components that make us human.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. 75
Series
TRITA-CBH-FOU ; 2022:32
Keywords
protein, protein-coding, genes, annotation, atlas, scRNA-Seq, RNA-Seq, transcriptomics, proteomics
National Category
Bioinformatics and Computational Biology
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-312033 (URN)
978-91-8040-250-7 (ISBN)
Public defence
2022-06-03, Eva & Georg Klein, Biomedicum, Solnavägen 9, via Zoom: https://kth-se.zoom.us/j/66922122998, Solna, 09:30 (English)
Opponent
Supervisors
Funder
Knut and Alice Wallenberg Foundation
Note

QC 2022-05-11

Available from: 2022-05-11. Created: 2022-05-10. Last updated: 2025-02-07. Bibliographically approved.

Authority records

Karlsson, Max; Oksvold, Per; Sivertsson, Åsa; Alvez, Maria Bueno; Arif, Muhammad; Li, Xiangyu; Mardinoglu, Adil; Zhang, Cheng; von Feilitzen, Kalle; Zhong, Wen; Fagerberg, Linn; Uhlén, Mathias

By author/editor
Karlsson, Max; Oksvold, Per; Sivertsson, Åsa; Alvez, Maria Bueno; Arif, Muhammad; Li, Xiangyu; Mardinoglu, Adil; Zhang, Cheng; von Feilitzen, Kalle; Zhong, Wen; Fagerberg, Linn; Luo, Yonglun; Uhlén, Mathias
By organisation
Science for Life Laboratory, SciLifeLab; Protein Science; Systems Biology