Methods for rapid phylogenetic inference and copy number variation detection from transcriptomics data
Kurt, Semih
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-9760-9140
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computational biology leverages biological data and mathematical modeling to gain insights into biological systems and their relationships. A key example of widely used biological data is nucleotide sequences, obtained through DNA and RNA sequencing. Recent advances in sequencing technologies make it possible to obtain single-cell level DNA and RNA sequences through rapid, cost-efficient pipelines. This high-resolution data gives researchers an opportunity to investigate complex biological features and processes such as evolutionary relationships, developmental history, somatic mutations, disease progression, and tumor heterogeneity. However, factors like technical noise and inherent biological randomness make it challenging to extract meaningful insights into these features and processes. The large size of single-cell datasets poses another obstacle. Therefore, an increasing need has emerged for scalable and robust computational methods that can fully exploit the recent expansion in both the type and quantity of sequencing data. In this thesis, we address this growing demand for advanced computational methods by proposing novel approaches for two key tasks in computational biology: phylogenetic reconstruction and copy number variation (CNV) inference.

First, we demonstrate how mixture components in variational autoencoders (VAEs) cooperate, adapting jointly to maximize the evidence lower bound (ELBO), effectively covering the target posterior distribution, and enhancing the latent-representation capabilities, yielding better cell type classification on single-cell transcriptomics datasets. Second, we introduce a VAE-based approach for copy number variation inference from single-cell transcriptomics data. Unlike previous methods, our method does not need cell-type specific gene signatures, tumor-specific markers, or any form of prior information, yet it delivers more accurate estimates of copy number variations. Third, we propose a scalable and rapid method for phylogeny reconstruction using a sparse distance matrix, significantly reducing runtime for large datasets. Fourth, we present a deep learning-based method for simultaneous clonal deconvolution and copy number variation inference from spatial transcriptomics data, offering a detailed view of intra-tumor heterogeneity.

Abstract [sv]

Beräkningsbiologi utnyttjar biologiska data och matematisk modellering för att få insikter i biologiska system och deras relationer. Ett nyckelexempel på allmänt använda biologiska data är nukleotidsekvenser, erhållna genom DNA- och RNA-sekvensering. De senaste framstegen inom sekvenseringsteknologier gör det möjligt att erhålla DNA- och RNA-sekvenser på encellsnivå genom snabba, kostnadseffektiva pipelines. Dessa högupplösta data är en möjlighet för forskare att undersöka komplexa biologiska egenskaper och processer som evolutionära samband, utvecklingshistoria, somatiska mutationer, sjukdomsprogression och tumörheterogenitet. Faktorer som tekniskt brus och inneboende biologisk slumpmässighet innebär dock utmaningar när det gäller att extrahera meningsfulla insikter i de tidigare nämnda olika biologiska begreppen. De stora datastorlekarna som är associerade med encellsdatauppsättningar uppvisar ett annat hinder. Därför uppstod ett ökande behov av skalbara och robusta beräkningsmetoder för att fullt ut utnyttja den senaste expansionen av både typer och kvantitet av sekvenseringsdata. I denna avhandling tar vi upp denna växande efterfrågan på avancerade beräkningsmetoder genom att föreslå nya tillvägagångssätt för två nyckeluppgifter inom beräkningsbiologi: fylogenetisk rekonstruktion och slutledning av kopietalsvariation (CNV).

Först visar vi hur blandningskomponenter i variationsautokodare (VAE) samarbetar, anpassar sig gemensamt för att maximera evidensens nedre gräns (ELBO), effektivt täcker den bakre målfördelningen och förbättrar förmågan till latent representation, vilket ger bättre celltypsklassificering på singel- datauppsättningar för celltranskriptomik. För det andra introducerar vi ett VAE-baserat tillvägagångssätt för slutledning av kopienummervariationer från encells transkriptomikdata. Till skillnad från tidigare metoder behöver vår metod inte celltypsspecifika gensignaturer, tumörspecifika markörer eller någon form av tidigare information, men den ger mer exakta uppskattningar av variationer i antal kopior. För det tredje föreslår vi en skalbar och snabb metod för fylogenrekonstruktion med hjälp av en gles avståndsmatris, vilket avsevärt minskar körtiden för stora datamängder. För det fjärde presenterar vi en djupinlärningsbaserad metod för samtidig klonal dekonvolution och slutledning av kopienummervariation från rumslig transkriptomikdata, vilket ger en detaljerad bild av intratumörheterogenitet.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. 37
Series
TRITA-EECS-AVL ; 2024:93
Keywords [en]
Variational autoencoder, copy number variation, phylogeny, transcriptomics
Keywords [sv]
Variationsautokodare, kopienummervariation, fylogeni, transkriptomik
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-356909
ISBN: 978-91-8106-131-4 (print)
OAI: oai:DiVA.org:kth-356909
DiVA, id: diva2:1916438
Public defence
2024-12-20, https://kth-se.zoom.us/j/68990648769, F3, KTH, Lindstedtsvägen 26 & 28, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20241129

Available from: 2024-11-29 Created: 2024-11-27 Last updated: 2025-12-03. Bibliographically approved
List of papers
1. Cooperation in the Latent Space: The Benefits of Adding Mixture Components in Variational Autoencoders
2023 (English) In: Proceedings of the 40th International Conference on Machine Learning, ICML 2023, ML Research Press, 2023, p. 18008-18022. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we show how the mixture components cooperate when they jointly adapt to maximize the ELBO. We build upon recent advances in the multiple and adaptive importance sampling literature. We then model the mixture components using separate encoder networks and show empirically that the ELBO is monotonically non-decreasing as a function of the number of mixture components. These results hold for a range of different VAE architectures on the MNIST, FashionMNIST, and CIFAR-10 datasets. In this work, we also demonstrate that increasing the number of mixture components improves the latent-representation capabilities of the VAE on both image and single-cell datasets. This cooperative behavior suggests that Mixture VAEs should be considered a standard approach for obtaining more flexible variational approximations. Finally, Mixture VAEs are here, for the first time, compared and combined with normalizing flows, hierarchical models and/or the VampPrior in an extensive ablation study. Several of our Mixture VAEs achieve state-of-the-art log-likelihood results for VAE architectures on the MNIST and FashionMNIST datasets. The experiments are reproducible using our provided code.
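
As a toy illustration of the cooperation described in this abstract, the following sketch estimates the ELBO of a uniform mixture of one-dimensional Gaussian components by Monte Carlo. Everything in it is an assumption made for illustration: the names `log_mixture` and `mixture_elbo`, the bimodal target, and the fixed Gaussian components standing in for the separate encoder networks. It is not the authors' implementation.

```python
import math
import random


def normal_logpdf(z, mean, std):
    """Log density of N(mean, std^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((z - mean) / std) ** 2


def log_mixture(z, components):
    """Log density of the uniform mixture (1/S) * sum_s N(z; mean_s, std_s^2)."""
    logs = [normal_logpdf(z, m, s) for m, s in components]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(components))


def mixture_elbo(log_joint, components, samples_per_component=2000, seed=0):
    """Monte Carlo estimate of (1/S) * sum_s E_{q_s}[log p(z) - log q_mix(z)].

    Samples are drawn from every component, and each sample is scored
    against the whole mixture, so the components are evaluated jointly.
    """
    rng = random.Random(seed)
    total = 0.0
    for mean, std in components:
        for _ in range(samples_per_component):
            z = rng.gauss(mean, std)
            total += log_joint(z) - log_mixture(z, components)
    return total / (len(components) * samples_per_component)


# Hypothetical normalized bimodal target, so the best achievable ELBO is 0.
TARGET_MODES = [(-2.0, 1.0), (2.0, 1.0)]


def log_target(z):
    return log_mixture(z, TARGET_MODES)


one_component = mixture_elbo(log_target, [(-2.0, 1.0)])
two_components = mixture_elbo(log_target, [(-2.0, 1.0), (2.0, 1.0)])
print("ELBO with 1 component: ", one_component)
print("ELBO with 2 components:", two_components)
```

In this toy setting a single component can cover only one mode of the target, so its ELBO is clearly negative; adding a second component on the other mode lets the mixture match the target and the estimate rises to zero.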

Place, publisher, year, edition, pages
ML Research Press, 2023
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-350016 (URN); 2-s2.0-85174402670 (Scopus ID)
Conference
40th International Conference on Machine Learning, ICML 2023, Honolulu, United States of America, Jul 23 2023 - Jul 29 2023
Note

QC 20240704

Available from: 2024-07-04 Created: 2024-07-04 Last updated: 2025-05-20. Bibliographically approved
2. CopyVAE: a variational autoencoder-based approach for copy number variation inference using single-cell transcriptomics
2024 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 40, no 5, article id btae284. Article in journal (Refereed) Published
Abstract [en]

Motivation: Copy number variations (CNVs) are common genetic alterations in tumour cells. The delineation of CNVs holds promise for enhancing our comprehension of cancer progression. Moreover, accurate inference of CNVs from single-cell sequencing data is essential for unravelling intratumoral heterogeneity. However, existing inference methods face limitations in resolution and sensitivity.

Results: To address these challenges, we present CopyVAE, a deep learning framework based on a variational autoencoder architecture. Through experiments, we demonstrated that CopyVAE can accurately and reliably detect CNVs from data obtained using single-cell RNA sequencing. CopyVAE surpasses existing methods in terms of sensitivity and specificity. We also discussed CopyVAE’s potential to advance our understanding of genetic alterations and their impact on disease advancement.

Availability and implementation: CopyVAE is implemented and freely available under MIT license at https://github.com/kurtsemih/copyVAE.

Place, publisher, year, edition, pages
Oxford University Press, 2024
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-346814 (URN); 10.1093/bioinformatics/btae284 (DOI); 001217927500002 (); 38676578 (PubMedID); 2-s2.0-85192946770 (Scopus ID)
Note

QC 20240524

Available from: 2024-05-24 Created: 2024-05-24 Last updated: 2024-11-27. Bibliographically approved
3. Sparse Neighbor Joining: rapid phylogenetic inference using a sparse distance matrix
2024 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 40, no 12, article id btae701. Article in journal (Other academic) Published
Abstract [en]

Phylogenetic reconstruction is a fundamental problem in computational biology. The Neighbor Joining (NJ) algorithm offers an efficient distance-based solution to this problem, which often serves as the foundation for more advanced statistical methods. Despite prior efforts to enhance the speed of NJ, the computation of the n^2 entries of the distance matrix, where n is the number of phylogenetic tree leaves, continues to pose a limitation in scaling NJ to larger datasets. In this work, we propose a new algorithm which does not require computing a dense distance matrix. Instead, it dynamically determines a sparse set of at most O(n log n) distance matrix entries to be computed in its basic version, and up to O(n log^2 n) entries in an enhanced version. We show by experiments that this approach reduces the execution time of NJ for large datasets, with a trade-off in accuracy. 
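
For context, the dense computation that the abstract contrasts against can be sketched with the pair-selection criterion of classic Neighbor Joining. This is the standard NJ criterion, not the sparse algorithm proposed in the paper; the function name `nj_best_pair` and the five-leaf example matrix are illustrative assumptions.

```python
from itertools import combinations


def nj_best_pair(dist):
    """Return ((i, j), q) minimizing the classic NJ criterion
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k).
    """
    n = len(dist)
    # Row sums touch every one of the n^2 entries of the dense matrix.
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:
            best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical symmetric distance matrix over five leaves.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q = nj_best_pair(example)
print("leaves to join first:", pair, "with Q =", q)
```

Every selection step in this dense form depends on complete row sums, which is why all n^2 distances must be computed up front. Restricting the computation to a sparse subset of entries, as the abstract describes, removes that requirement at some cost in accuracy.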

Place, publisher, year, edition, pages
Oxford University Press (OUP), 2024
Keywords
phylogeny, distance-based, neighbor joining
National Category
Bioinformatics (Computational Biology)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-356906 (URN); 10.1093/bioinformatics/btae701 (DOI); 001375315000001 (); 39570613 (PubMedID); 2-s2.0-85212713902 (Scopus ID)
Note

QC 20241129

Available from: 2024-11-27 Created: 2024-11-27 Last updated: 2025-01-20. Bibliographically approved
4. decoST: clonal deconvolution and copy number variation inference from spatial transcriptomics
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Intra-tumor heterogeneity driven by somatic copy number variations (CNVs) is a prevalent feature in human cancers. Accurately mapping this heterogeneity and the underlying CNVs using spatial transcriptomics data offers significant potential for advancing our understanding of cancer progression. However, current clonal inference methods are limited in their resolution and sensitivity, restricting their ability to fully capture the complexity of tumor heterogeneity. To address these challenges, we introduce decoST, a deep learning-based approach for clonal deconvolution and copy number variation inference using spatial and single-cell transcriptomics. Through experiments, we demonstrated that decoST can accurately identify clones in a tumor tissue, effectively mapping their spatial distributions and inferring associated copy number profiles. Additionally, we discussed decoST’s versatility, highlighting its potential applications across various cancer types, datasets, and spatial transcriptomic technologies.

National Category
Bioinformatics (Computational Biology)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-356908 (URN)
Note

QC 20241129

Available from: 2024-11-27 Created: 2024-11-27 Last updated: 2024-11-29. Bibliographically approved

Open Access in DiVA

summary (1225 kB), 196 downloads
File information
File name SUMMARY01.pdf
File size 1225 kB
Checksum SHA-512
6133dfb288614f24482e23e480d936c3a0a56312496cd0a2a4e5bcb7997fa5bfd356b0042c419ff3a99c47ecf06eddde3b46cab1dd0d266e84fb4987b66bd4c6
Type fulltext
Mimetype application/pdf

Authority records

Kurt, Semih
