CopyVAE: a variational autoencoder-based approach for copy number variation inference using single-cell transcriptomics
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-9760-9140
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0003-1880-1730
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-5525-4724
Department of Oncology and Pathology, Karolinska Institutet, Solna, 171 77, Sweden.
2024 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 40, no 5, article id btae284. Article in journal (Refereed), Published
Abstract [en]

Motivation: Copy number variations (CNVs) are common genetic alterations in tumour cells. The delineation of CNVs holds promise for enhancing our comprehension of cancer progression. Moreover, accurate inference of CNVs from single-cell sequencing data is essential for unravelling intratumoral heterogeneity. However, existing inference methods face limitations in resolution and sensitivity.

Results: To address these challenges, we present CopyVAE, a deep learning framework based on a variational autoencoder architecture. Through experiments, we demonstrated that CopyVAE can accurately and reliably detect CNVs from data obtained using single-cell RNA sequencing. CopyVAE surpasses existing methods in terms of sensitivity and specificity. We also discussed CopyVAE’s potential to advance our understanding of genetic alterations and their impact on disease advancement.

Availability and implementation: CopyVAE is implemented and freely available under MIT license at https://github.com/kurtsemih/copyVAE.

Place, publisher, year, edition, pages
Oxford University Press, 2024. Vol. 40, no 5, article id btae284
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:kth:diva-346814
DOI: 10.1093/bioinformatics/btae284
ISI: 001217927500002
PubMedID: 38676578
Scopus ID: 2-s2.0-85192946770
OAI: oai:DiVA.org:kth-346814
DiVA, id: diva2:1860428
Note

QC 20240524

Available from: 2024-05-24 Created: 2024-05-24 Last updated: 2024-11-27. Bibliographically approved
In thesis
1. Methods for rapid phylogenetic inference and copy number variation detection from transcriptomics data
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computational biology leverages biological data and mathematical modeling to gain insights into biological systems and their relationships. A key example of widely used biological data is nucleotide sequences, obtained through DNA and RNA sequencing. Recent advances in sequencing technologies make it possible to obtain single-cell level DNA and RNA sequences through rapid, cost-efficient pipelines. This high-resolution data gives researchers an opportunity to investigate complex biological features and processes such as evolutionary relationships, developmental history, somatic mutations, disease progression, and tumor heterogeneity. However, factors like technical noise and inherent biological randomness make it challenging to extract meaningful insights into these features and processes. The large data sizes associated with single-cell datasets pose another obstacle. An increasing need has therefore emerged for scalable and robust computational methods that can fully exploit the recent expansion in both the type and quantity of sequencing data. In this thesis, we address this growing demand for advanced computational methods by proposing novel approaches for two key tasks in computational biology: phylogenetic reconstruction and copy number variation (CNV) inference.

First, we demonstrate how mixture components in variational autoencoders (VAEs) cooperate, adapting jointly to maximize the evidence lower bound (ELBO), effectively covering the target posterior distribution, and enhancing the latent-representation capabilities, yielding better cell type classification on single-cell transcriptomics datasets. Second, we introduce a VAE-based approach for copy number variation inference from single-cell transcriptomics data. Unlike previous methods, our method does not need cell-type specific gene signatures, tumor-specific markers, or any form of prior information, yet it delivers more accurate estimates of copy number variations. Third, we propose a scalable and rapid method for phylogeny reconstruction using a sparse distance matrix, significantly reducing runtime for large datasets. Fourth, we present a deep learning-based method for simultaneous clonal deconvolution and copy number variation inference from spatial transcriptomics data, offering a detailed view of intra-tumor heterogeneity.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. 37
Series
TRITA-EECS-AVL ; 2024:93
Keywords
Variational autoencoder, copy number variation, phylogeny, transcriptomics
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-356909
ISBN: 978-91-8106-131-4
Public defence
2024-12-20, https://kth-se.zoom.us/j/68990648769, F3, KTH, Lindstedtsvägen 26 & 28, Stockholm, 13:00 (English)
Note

QC 20241129

Available from: 2024-11-29 Created: 2024-11-27 Last updated: 2024-12-03. Bibliographically approved

Authority records

Kurt, Semih; Chen, Mandi; Toosi, Hosein; Lagergren, Jens
