Change search
ReferencesLink to record
Permanent link

Direct link
A global structural EM algorithm for a model of cancer progression
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.ORCID iD: 0000-0002-6504-1765
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
2011 (English)Manuscript (preprint) (Other academic)
Abstract [en]

Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein’s path model of colon cancer was clearly a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising training algorithms,and applying these to cross-sectional data.Beerenwinkel et al. provided, what they coined, EM-like algorithms for OncogeneticTrees (OTs) and mixtures of such. Given the small size of current and future datasets, it is important to minimize the number of parameters of a model. For this reason,also we focus on tree-based models and introduce Hidden-variable Oncogenetic Trees(HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it.The algorithm for single HOTs performs very well on reasonable-sized data sets,while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrows more cost efficient technologies. To facilitate analysis of complex cytogenetic data sets requiring more than one HOT, we devise a decomposition strategy based on PrincipalComponent Analysis and train parameters on a colon cancer data set. The method so obtained is then successfully applied to kidney cancer.

Place, publisher, year, edition, pages
National Category
Bioinformatics and Systems Biology Computer Science
URN: urn:nbn:se:kth:diva-10973OAI: diva2:233575
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience

QC 20100812

Available from: 2009-09-01 Created: 2009-09-01 Last updated: 2016-02-02Bibliographically approved
In thesis
1. Using Trees to Capture Reticulate Evolution: Lateral Gene Transfers and Cancer Progression
Open this publication in new window or tab >>Using Trees to Capture Reticulate Evolution: Lateral Gene Transfers and Cancer Progression
2009 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The historic relationship of species and genes are traditionally depicted using trees. However, not all evolutionary histories are adequately captured by bifurcating processes and an increasing amount of research is devoted towards using networks or network-like structures to capture evolutionary history. Lateral gene transfer (LGT) is a previously controversial mechanism responsible for non tree-like evolutionary histories, and is today accepted as a major force of evolution, particularly in the prokaryotic domain.

In this thesis, we present models of gene evolution incorporating both LGTs and duplications, together with efficient computational methods for various inference problems. Specifically, we define a biologically sound combinatorial model for reconciliation of species and gene trees that facilitates simultaneous consideration of duplications and LGTs. We prove that finding most parsimonious reconciliations is NP-hard, but that the problem can be solved efficiently if reconciliations are not required to be acyclic—a condition that is satisfied when analyzing most real-world datasets. We also provide a polynomial-time algorithm for parametric tree reconciliation, a problem analogous to parametric sequence alignment, that enables us to study the entire space of optimal reconciliations under all possible cost schemes.

Going beyond combinatorial models, we define the first probabilistic model of gene evolution incorporating a birth-death process generating duplications, LGTs, and losses, together with a relaxed molecular clock model of sequence evolution. Algorithms based on Markov chain Monte Carlo (MCMC) techniques, methods from numerical analysis, and dynamic programming are presented for various probability and parameter inference problems.

Finally, we develop methods for analysis of cancer progression, a biological process with many similarities to the process of evolution. Cancer progresses by accumulation of harmful genetic aberrations whose patterns of emergence are graph-like. We develop a model of cancer progression based on trees, and mixtures thereof, that admits an efficient structural EM algorithm for finding Maximum Likelihood (ML) solutions from available cross-sectional data.

Place, publisher, year, edition, pages
Stockholm: KTH, 2009. viii, 68 p.
Trita-CSC-A, ISSN 1653-5723 ; 2009:10
Lateral Gene Tranfer, Horizontal Gene Transfer, Cancer Progression
National Category
Bioinformatics and Systems Biology Computer Science
urn:nbn:se:kth:diva-10608 (URN)978-91-7415-349-1 (ISBN)
Public defence
2009-06-12, Svedbergssalen, Albanova, Roslagstullsbacken 21, Stockholm, 10:00 (English)
QC 20100812Available from: 2009-06-04 Created: 2009-06-02 Last updated: 2010-08-12Bibliographically approved

Open Access in DiVA

fulltext(146 kB)384 downloads
File information
File name FULLTEXT01.pdfFile size 146 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Other links


Search in DiVA

By author/editor
Tofigh, AliSjölund, ErikLagergren, Jens
By organisation
Computational Biology, CB
Bioinformatics and Systems BiologyComputer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 384 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 318 hits
ReferencesLink to record
Permanent link

Direct link