Genomic-based approaches for identifying risk loci and facilitating precision medicine in human diseases
KTH, Centres, Science for Life Laboratory, SciLifeLab. KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Protein Science, Systems Biology.
ORCID iD: 0009-0001-3893-682X
2026 (English) Doctoral thesis, comprehensive summary (Other academic)
Sustainable development
SDG 3: Good Health and Well-Being
Abstract [en]

Genomics represents the first layer of the central dogma: variation in the genome influences downstream biological processes and ultimately shapes the human phenome. Although next-generation sequencing (NGS) technologies have generated massive amounts of genomic data, substantial gaps remain in translating genomics into clinical and precision medicine applications.

The first part of this thesis (Papers I–II) focuses on developing high-performance computational platforms for clinical and research genomics applications. In Paper I, we developed GenRiskPro, a platform designed to strengthen connectivity among key stakeholders in clinical genetics, including hospitals, research facilities, clinicians, and patients. It prioritizes the reporting of genetic risk variants for rare diseases and includes tools for detecting pharmacogenomic (PGx) and lifestyle-associated variants that affect complex traits. We also revealed population-specific genetic heterogeneity in the enrichment of pathogenic and low-to-rare-frequency risk variants, which frequently show low penetrance in their phenotypic presentation.
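As a rough illustration of what prioritizing rare risk variants for reporting can involve, the sketch below keeps variants that fall under a population allele-frequency cutoff and carry a reportable ClinVar-style label, then lists the rarest first. The 1% cutoff, the label set, and the `Variant`/`prioritize` names are assumptions for illustration only; this record does not describe GenRiskPro's actual filters.

```python
from dataclasses import dataclass
from typing import List

# Illustrative assumptions, not GenRiskPro's documented settings.
MAX_POPULATION_AF = 0.01  # below 1% population allele frequency counts as "rare"
REPORTABLE = {"Pathogenic", "Likely pathogenic"}  # ClinVar-style significance labels


@dataclass(frozen=True)
class Variant:  # hypothetical record type
    gene: str
    hgvs: str
    population_af: float  # allele frequency in a reference population
    clinical_significance: str


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Keep rare variants with a reportable label, rarest first."""
    kept = [
        v for v in variants
        if v.population_af < MAX_POPULATION_AF
        and v.clinical_significance in REPORTABLE
    ]
    return sorted(kept, key=lambda v: v.population_af)


# Hypothetical variants for illustration only.
toy_variants = [
    Variant("GENE_A", "c.100A>G", 0.0002, "Pathogenic"),
    Variant("GENE_B", "c.200C>T", 0.0500, "Pathogenic"),
    Variant("GENE_C", "c.300G>A", 0.0010, "Uncertain significance"),
    Variant("GENE_D", "c.400T>C", 0.0050, "Likely pathogenic"),
]
for v in prioritize(toy_variants):
    print(v.gene, v.hgvs)
```

A real pipeline would combine many more signals (inheritance mode, phenotype match, in silico predictions), so a hard two-condition filter like this is only the simplest starting point.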

In Paper II, we developed OncoRisk, a comprehensive web server that integrates multiple precision oncology knowledge bases and pan-cancer cohorts. The server comprises four major modules. It enables querying of key oncogenic terms such as mutations, gene fusions, diseases, and therapies, and supports analysis of individual tumor sequencing data to identify potentially significant mutations by mapping them against known oncogenic resources. We also built functions for fast visualization and analysis of cancer cohort data and for querying the mutation frequencies of specific genes or variants across large-scale cancer sequencing cohorts.
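Querying the mutation frequency of a gene across a cohort reduces, at its simplest, to counting how many samples carry at least one mutation in that gene. The sketch below assumes each sample has already been reduced to a set of mutated gene symbols; the function name and the toy cohort are hypothetical and say nothing about how OncoRisk stores or queries its cohorts.

```python
from collections import Counter
from typing import Dict, Set


def mutation_frequencies(cohort: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of samples with at least one mutation in each gene.

    `cohort` maps a sample ID to the set of genes mutated in that sample,
    so a gene is counted at most once per sample.
    """
    n_samples = len(cohort)
    if n_samples == 0:
        return {}
    counts = Counter(gene for genes in cohort.values() for gene in genes)
    # most_common() lists the most frequently mutated genes first
    return {gene: count / n_samples for gene, count in counts.most_common()}


# Hypothetical four-sample cohort for illustration only.
toy_cohort = {
    "S1": {"TP53", "KRAS"},
    "S2": {"TP53"},
    "S3": set(),
    "S4": {"PIK3CA"},
}
print(mutation_frequencies(toy_cohort))
```

Note that samples with no mutations still count in the denominator, which is what makes the result a cohort frequency rather than a share of mutated samples.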

The second part of this thesis (Papers III–IV) explores the genetic architecture of complex diseases using our in-house cohorts. In Paper III, we explored the survival effects of expression quantitative trait loci (eQTLs). First, we identified 805 eGenes and 4,558 cis-eQTLs in a Japanese kidney cancer cohort (n=100). We then validated these findings cross-ethnically using TCGA data (n=287) through comprehensive survival analyses across different allelic and covariate models, revealing regulatory variants with consistent effects on patient survival. Finally, in Paper IV, we conducted an integrative genomic-transcriptomic study in a pediatric congenital heart disease (CHD) cohort. We identified known pathogenic variants and found that the burden of rare missense variants in CHD-associated genes was significantly enriched in CHD patients compared with controls. Transcriptomic analysis revealed shifts in oxidative phosphorylation and interferon signaling. Functional analysis of overlapping eGenes and differentially expressed genes (DEGs) highlighted involvement in small GTPase-mediated signaling and cytoskeleton organization, and integration of rare-variant burdens further identified high-confidence candidates.
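The cis-eQTL mapping step in Paper III is commonly framed as a regression of a gene's expression on the genotype dosage (0, 1, or 2 copies of the alternative allele) of a nearby variant. The sketch below shows the unadjusted single-variant case with ordinary least squares. The ±1 Mb cis window, the function names, and the absence of covariates are simplifying assumptions, not details taken from the thesis.

```python
import math
from typing import Sequence, Tuple

CIS_WINDOW_BP = 1_000_000  # assumed +/-1 Mb cis window around the gene's TSS


def is_cis(variant_chrom: str, variant_pos: int, gene_chrom: str, gene_tss: int) -> bool:
    """True if the variant lies within the assumed cis window of the gene."""
    return variant_chrom == gene_chrom and abs(variant_pos - gene_tss) <= CIS_WINDOW_BP


def eqtl_ols(dosage: Sequence[float], expression: Sequence[float]) -> Tuple[float, float]:
    """Regress expression on dosage (no covariates); return (beta, t statistic)."""
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    beta = sxy / sxx                      # expression change per extra alternative allele
    alpha = mean_y - beta * mean_x        # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    if se == 0:
        return beta, (math.copysign(math.inf, beta) if beta else 0.0)
    return beta, beta / se


beta, t = eqtl_ols([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"beta={beta:.2f}, t={t:.1f}")
```

The t statistic would then be converted to a p-value with n - 2 degrees of freedom; because a study tests many variant-gene pairs, those p-values also need multiple-testing correction.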

In summary, this thesis demonstrates the complex impacts of rare and common variants on human health across both rare and complex disease contexts. By developing computational platforms to assist in identifying risk loci and leveraging genomic data to uncover novel potential drivers, this work aims to advance translational genetics and precision medicine.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2026. 92 p.
Series
TRITA-CBH-FOU ; 2026:18
Keywords [en]
Clinical Genomics, Translational Genetics, Genomic Architecture, Variant Interpretation, Precision Medicine, Precision Oncology, Web Server Development, eQTL Analysis, Rare and Complex Diseases, Multi-Omics, Bioinformatics, Systems Biology
National Category
Bioinformatics and Computational Biology; Medical Genetics and Genomics; Medical Bioinformatics and Systems Biology
Research subject
Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-378880
ISBN: 978-91-8106-572-5 (print)
OAI: oai:DiVA.org:kth-378880
DiVA, id: diva2:2049546
Public defence
2026-04-23, 14:00, F3, KTH Campus, Lindstedtvägen 26, Stockholm, and via Zoom: https://kth-se.zoom.us/j/69920511035 (English)
Funder
Knut and Alice Wallenberg Foundation, 72110
Note

QC 2026-03-30

Available from: 2026-03-30. Created: 2026-03-30. Last updated: 2026-04-08. Bibliographically approved
List of papers
1. GenRiskPro: A Comprehensive Whole-Genome Sequencing Analysis Platform for Clinical and Wellness Applications
2026 (English). In: Computational and Structural Biotechnology Journal, E-ISSN 2001-0370, Vol. 35, no. 2, article id 0011. Article in journal (Refereed). Published
Abstract [en]

Despite rapid advances in whole-genome sequencing (WGS), translating genomic findings into individualized insights remains challenging. We present GenRiskPro, a clinical decision-support and research platform that automates WGS variant calling, annotation, prioritization, and reporting to deliver actionable findings and facilitate precision wellness. (To test the GenRiskPro platform, log on to https://www.phenomeportal.org/dashboard using the following credentials: Username: user@test.com; Password: test.) GenRiskPro integrates rare and common variant prioritization in a unified pipeline and in-house database, enabling association analyses for both rare and complex diseases and traits. Variant reporting is supported via LongevityCloud, which features a web portal for clinicians to review, adjust, and authorize the return of results in tabular and PDF formats, alongside a mobile app with artificial intelligence (AI) integration for sequenced individuals. Case studies using Turkish (TR, n = 275) and Swedish (SW, n = 101) WGS data assessed platform performance and variant prioritization: (a) predefined gene panels yielded a 1.82% positive rate for actionable findings under the American College of Medical Genetics and Genomics (ACMG) secondary findings guidelines; (b) phenotype-driven support yielded diagnoses including muscular dystrophy and microcephaly; (c) cohort-level ClinVar reassessment identified potentially misclassified pathogenic variants; (d) rare variant burden analysis revealed enrichment in ABCA4 for TR and in SMPD1 for SW; and (e) population analysis highlighted differences in carrier frequency for trait-associated SNPs (rs12913832 and rs4988235) and PGx variants (CYP2B6*4 and CYP2B6*6). GenRiskPro unifies databases, literature, web development, and AI for rapid, user-friendly genomic analysis and reporting, fostering collaboration among hospitals, researchers, clinicians, and patients.
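The population comparison in (e) rests on carrier rates: the share of individuals in each population who carry at least one copy of a given allele. A minimal sketch, assuming genotypes have already been reduced to alternative-allele counts per individual; the function name and the toy calls are hypothetical and are not data from the TR or SW cohorts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def carrier_rates(calls: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Per-population fraction of individuals carrying >= 1 alternative allele.

    Each call is (population label, alternative-allele count in {0, 1, 2}).
    """
    totals: Dict[str, int] = defaultdict(int)
    carriers: Dict[str, int] = defaultdict(int)
    for population, alt_count in calls:
        if alt_count not in (0, 1, 2):
            raise ValueError(f"invalid alternative-allele count: {alt_count}")
        totals[population] += 1
        if alt_count > 0:  # heterozygotes and homozygotes both count as carriers
            carriers[population] += 1
    return {pop: carriers[pop] / totals[pop] for pop in totals}


# Hypothetical genotype calls for one SNP in two populations.
toy_calls = [("TR", 0), ("TR", 1), ("TR", 2), ("TR", 0), ("SW", 1), ("SW", 2)]
print(carrier_rates(toy_calls))
```

Carrier rate differs from allele frequency: a homozygote adds two alleles to the allele count but only one person to the carrier count, so the two measures can rank populations differently.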

Place, publisher, year, edition, pages
American Association for the Advancement of Science (AAAS), 2026
National Category
Bioinformatics and Computational Biology; Medical Genetics and Genomics
Identifiers
urn:nbn:se:kth:diva-378858 (URN); 10.34133/csbj.0011 (DOI)
Funder
Knut and Alice Wallenberg Foundation, CJDB 72110
Note

QC 20260330

Available from: 2026-03-27. Created: 2026-03-27. Last updated: 2026-03-30. Bibliographically approved
2. OncoRisk: A state-of-the-art Web Server for bridging the oncogenic databases and pan-cancer cohorts to the translational oncology
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Accurate interpretation of genomic variants remains a major bottleneck in precision oncology, due in part to fragmented knowledge across databases and limited integration between clinical evidence and population-scale genomic datasets. Here we present OncoRisk, a stand-alone, user-friendly web server that unifies data from over ten oncogenic databases and seven large-scale pan-cancer cohorts, enabling rapid multi-database queries and network-based exploration of genomic variants, gene-gene interactions, and therapy associations. The platform features a semi-automated reporting workflow that generates comprehensive, patient-specific clinical reports from raw tissue sequencing data and categorizes variants into actionable tiers. For translational research, OncoRisk provides modules for data-driven exploration, allowing users to validate findings by interrogating mutation frequencies and clinical associations across real-world patient data. Furthermore, an integrated suite of analytical tools enables comprehensive, cohort-level investigations of mutational landscapes, prognostic biomarkers, and oncogenic signaling pathways. By providing a unified ecosystem that bridges curated knowledge with large-scale cohort data, OncoRisk serves as an effective catalyst for both discovery research and clinical application in oncology. OncoRisk is publicly available at https://www.phenomeportal.org/oncorisk.
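The abstract says only that OncoRisk sorts variants into actionable tiers. One widely used scheme is the four-tier AMP/ASCO/CAP (2017) somatic-variant guideline, in which evidence levels A and B map to Tier I and levels C and D to Tier II. The sketch below assumes that mapping, together with a hypothetical knowledge-base lookup keyed on (gene, protein change); it illustrates tiering in general and is not OncoRisk's implementation. Gene names and evidence levels are placeholders.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Tier(Enum):
    I = "strong clinical significance"
    II = "potential clinical significance"
    III = "unknown clinical significance"
    IV = "benign or likely benign"


def assign_tier(evidence_level: Optional[str], benign: bool = False) -> Tier:
    """Map an AMP/ASCO/CAP-style evidence level (A-D) to a tier (assumed scheme)."""
    if benign:
        return Tier.IV
    if evidence_level in ("A", "B"):
        return Tier.I
    if evidence_level in ("C", "D"):
        return Tier.II
    return Tier.III  # no actionable evidence found in the knowledge base


VariantKey = Tuple[str, str]  # (gene, protein change)

# Hypothetical knowledge base: variant -> best available evidence level.
KNOWLEDGE_BASE: Dict[VariantKey, str] = {
    ("GENE_A", "p.A100T"): "A",
    ("GENE_B", "p.R50Q"): "C",
}


def tier_report(variants: List[VariantKey],
                knowledge_base: Dict[VariantKey, str] = KNOWLEDGE_BASE) -> Dict[VariantKey, Tier]:
    """Look up each patient variant and assign a tier."""
    return {v: assign_tier(knowledge_base.get(v)) for v in variants}


for variant, tier in tier_report([("GENE_A", "p.A100T"), ("GENE_C", "p.R100*")]).items():
    print(variant, tier.name, tier.value)
```

In a real report, the evidence level would also depend on tumor type, since a variant can be Tier I in one cancer and Tier II in another; the flat lookup here ignores that.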

Keywords
Cancer Genomics; Precision Oncology; Tumor Biomarkers; Pan-cancers
National Category
Cancer and Oncology; Medical Genetics and Genomics; Bioinformatics and Computational Biology
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-378803 (URN)
Funder
Knut and Alice Wallenberg Foundation, 72110
Note

Accepted in Communications Biology; in press

QC 20260330

Available from: 2026-03-27. Created: 2026-03-27. Last updated: 2026-03-31. Bibliographically approved
3. Systematically identification of survival-associated eQTLs in a Japanese kidney cancer cohort
2025 (English). In: PLOS Genetics, ISSN 1553-7390, E-ISSN 1553-7404, Vol. 21, no. 7 (July), article id e1011770. Article in journal (Refereed). Published
Abstract [en]

Background: Clear cell renal cell carcinoma (ccRCC) is the predominant form of kidney cancer, but the prognostic value of expression quantitative trait loci (eQTLs) remains underexplored, particularly in Asian populations.

Objective: We analyzed whole-exome sequencing and RNA sequencing data from 100 Japanese ccRCC patients to identify eQTLs. Multiple Cox proportional hazards models were used to assess survival associations, with validation in The Cancer Genome Atlas (TCGA) ccRCC cohort (n = 287).

Results: We identified 805 eGenes and 4,558 cis-eQTLs in the Japanese cohort. Survival analysis revealed 9 eGenes significantly associated with overall survival (FDR < 0.05). Further exploratory analyses used 158 eGenes and 711 eQTLs (p < 0.05) as potential prognostic signals. Among these, 223 eQTLs regulating 54 eGenes showed consistent prognostic effects at both the expression and genetic levels. Cross-population validation identified eight eQTLs regulating 11 eGenes with reproducible survival associations across ethnicities, including a missense mutation in ERV3-1 and regulatory variants near ANKRD20A7P. These variants showed consistent allelic effects on both gene expression and patient survival in both cohorts.
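The abstract reports significance at FDR < 0.05 without naming the correction used. The Benjamini-Hochberg step-up procedure is the most common way to control the false discovery rate, so the sketch below assumes it: it converts a list of p-values into adjusted values (often called q-values) that can be compared directly against 0.05. The function name and example p-values are illustrative only.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks (monotone step-up).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(x, 3) for x in q])
print([x < 0.05 for x in q])
```

The running minimum is what distinguishes BH from simply multiplying each p-value by m / rank: it keeps the adjusted values monotone in the original p-value order.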

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2025
National Category
Cancer and Oncology
Identifiers
urn:nbn:se:kth:diva-368942 (URN); 10.1371/journal.pgen.1011770 (DOI); 001524169900006 (Web of Science); 40622919 (PubMedID); 2-s2.0-105009893848 (Scopus ID)
Note

QC 20250828

Available from: 2025-08-28. Created: 2025-08-28. Last updated: 2026-03-30. Bibliographically approved
4. Integrative analysis of the whole genome and transcriptome for congenital heart diseases
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Congenital heart disease (CHD) is the most common birth defect, yet its molecular etiology remains poorly understood. Recent advances in sequencing technology offer opportunities to uncover genetic and transcriptomic contributions to CHD. We performed an integrative multi-omics study on a pediatric CHD cohort (n=211) using whole-genome sequencing (WGS) and paired whole-blood transcriptomics (n=100). WGS identified approximately 28 million variants, including 309 known pathogenic and 724 protein-loss-of-function (pLoF) variants. Within a curated CHD gene list, 5 patients carried known pathogenic variants in EVC, HSPA9, DNAH11, PTPN11, and FBN1. Rare-variant burden analysis through Fisher's exact tests identified a significant enrichment of damaging missense mutations in CHD cases, primarily affecting early embryonic programs such as pattern specification and heart morphogenesis. In contrast, blood transcriptomics highlighted systemic functional shifts, specifically the suppression of mitochondrial oxidative phosphorylation and activation of interferon-mediated immune responses, reflecting downstream perturbations following developmental failure.
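A gene-level burden comparison with Fisher's exact test can be sketched on a 2×2 table of cases and controls with and without a qualifying rare damaging variant. The two-sided p-value below sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. Variable names and the example counts are hypothetical, and the manuscript does not say whether one- or two-sided tests were used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers of a qualifying
    rare variant in the gene being tested.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    cases, controls, carriers = a + b, c + d, a + c
    total_tables = comb(cases + controls, carriers)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases, margins fixed.
        return comb(cases, x) * comb(controls, carriers - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, carriers - controls), min(cases, carriers)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Relative tolerance guards against floating-point ties with the observed table.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


# Hypothetical counts: 8 of 100 cases vs 2 of 200 controls carry a qualifying variant.
print(fisher_exact_two_sided(8, 92, 2, 198))
```

For a one-sided test of enrichment in cases, sum `prob(x)` over x >= a instead. Testing many genes this way also calls for multiple-testing correction.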

Crucially, multi-omics integration identified core drivers supported by multiple lines of evidence: a four-way intersection (literature, variant burden, eQTLs, and DEGs) highlighted COL6A2, PKD2, and PKD1L1, while three-way intersections identified key regulators like SALL4, GLI1, ANK3, and ALMS1. Furthermore, functional enrichment analysis specifically targeting the 626 genes overlapping between eGenes and DEGs revealed significant involvement in small GTPase-mediated signal transduction and cytoskeleton organization. These findings demonstrate that blood-based multi-omics can effectively capture cardiac-relevant regulatory signals, providing a non-invasive framework to elucidate the molecular landscape of CHD.
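The four-way and three-way intersections described above amount to counting, for each gene, how many independent evidence sources support it. The sketch below assumes each source has been reduced to a set of gene symbols; the gene names are placeholders, not results from the study, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def evidence_support(sources: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each gene to the sorted names of the evidence sources that contain it."""
    support: Dict[str, List[str]] = {}
    for source in sorted(sources):
        for gene in sources[source]:
            support.setdefault(gene, []).append(source)
    return support


def genes_with_min_support(sources: Dict[str, Set[str]], k: int) -> List[str]:
    """Genes supported by at least k sources, alphabetically."""
    return sorted(g for g, s in evidence_support(sources).items() if len(s) >= k)


# Placeholder gene sets standing in for a literature-curated list,
# burden-test hits, eGenes, and DEGs.
toy_sources = {
    "literature": {"GENE_A", "GENE_B", "GENE_C"},
    "burden": {"GENE_A", "GENE_B"},
    "egenes": {"GENE_A", "GENE_C", "GENE_D"},
    "degs": {"GENE_A", "GENE_B", "GENE_D"},
}
print(genes_with_min_support(toy_sources, 4))  # four-way support
print(genes_with_min_support(toy_sources, 3))  # at least three-way support
# The eGene/DEG overlap used for enrichment analysis is a plain set intersection.
print(sorted(toy_sources["egenes"] & toy_sources["degs"]))
```

Counting support per gene, rather than intersecting fixed combinations of sets, makes it easy to report every gene at each level of agreement at once.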

Keywords
Congenital heart disease, rare variants, pathogenic variants, gene burden test, eQTL analysis, risk loci
National Category
Bioinformatics and Computational Biology; Medical Biotechnology (Focus on Cell Biology (incl. Stem Cell Biology), Molecular Biology, Microbiology, Biochemistry or Biopharmacy)
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-378804 (URN)
Funder
Knut and Alice Wallenberg Foundation, 72110
Note

Manuscript in preparation

QC 20260330

Available from: 2026-03-27. Created: 2026-03-27. Last updated: 2026-03-30. Bibliographically approved

Open Access in DiVA

Kappa (9791 kB), 125 downloads
File information
File name: FULLTEXT01.pdf
File size: 9791 kB
Checksum (SHA-512): a8991e0b2027638d74729518f57ac295946b730b9dcd983a8454ee44fc6fc76d1103882458296d7bc19f604af7ced8d09e105f449fe37c6720073ce538d290b8
Type: summary
Mimetype: application/pdf

Authority records

Song, Xiya