Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Using Trees to Capture Reticulate Evolution: Lateral Gene Transfers and Cancer Progression
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
2009 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The historic relationship of species and genes are traditionally depicted using trees. However, not all evolutionary histories are adequately captured by bifurcating processes and an increasing amount of research is devoted towards using networks or network-like structures to capture evolutionary history. Lateral gene transfer (LGT) is a previously controversial mechanism responsible for non tree-like evolutionary histories, and is today accepted as a major force of evolution, particularly in the prokaryotic domain.

In this thesis, we present models of gene evolution incorporating both LGTs and duplications, together with efficient computational methods for various inference problems. Specifically, we define a biologically sound combinatorial model for reconciliation of species and gene trees that facilitates simultaneous consideration of duplications and LGTs. We prove that finding most parsimonious reconciliations is NP-hard, but that the problem can be solved efficiently if reconciliations are not required to be acyclic—a condition that is satisfied when analyzing most real-world datasets. We also provide a polynomial-time algorithm for parametric tree reconciliation, a problem analogous to parametric sequence alignment, that enables us to study the entire space of optimal reconciliations under all possible cost schemes.

Going beyond combinatorial models, we define the first probabilistic model of gene evolution incorporating a birth-death process generating duplications, LGTs, and losses, together with a relaxed molecular clock model of sequence evolution. Algorithms based on Markov chain Monte Carlo (MCMC) techniques, methods from numerical analysis, and dynamic programming are presented for various probability and parameter inference problems.

Finally, we develop methods for analysis of cancer progression, a biological process with many similarities to the process of evolution. Cancer progresses by accumulation of harmful genetic aberrations whose patterns of emergence are graph-like. We develop a model of cancer progression based on trees, and mixtures thereof, that admits an efficient structural EM algorithm for finding Maximum Likelihood (ML) solutions from available cross-sectional data.

Place, publisher, year, edition, pages
Stockholm: KTH , 2009. , viii, 68 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2009:10
Keyword [en]
Lateral Gene Tranfer, Horizontal Gene Transfer, Cancer Progression
National Category
Bioinformatics and Systems Biology Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-10608ISBN: 978-91-7415-349-1 (print)OAI: oai:DiVA.org:kth-10608DiVA: diva2:220830
Public defence
2009-06-12, Svedbergssalen, Albanova, Roslagstullsbacken 21, Stockholm, 10:00 (English)
Opponent
Supervisors
Note
QC 20100812Available from: 2009-06-04 Created: 2009-06-02 Last updated: 2010-08-12Bibliographically approved
List of papers
1. Simultaneous Identification of Duplications and Lateral Gene Transfers
Open this publication in new window or tab >>Simultaneous Identification of Duplications and Lateral Gene Transfers
2011 (English)In: IEEE/ACM Transactions on Computational Biology & Bioinformatics, ISSN 1545-5963, E-ISSN 1557-9964, Vol. 8, no 2, 517-535 p.Article in journal (Refereed) Published
Abstract [en]

The incongruency between a gene tree and a corresponding species tree can be attributed to evolutionary events such as gene duplication and gene loss. This paper describes a combinatorial model where a so-called DTL-scenario is used to explain the differences between a gene tree anda corresponding species tree taking into account gene duplications, gene losses, and lateral genetransfers (also known as horizontal gene transfers). The reasonable biological constraint that a lateralgene transfer may only occur between contemporary species leads to the notion of acyclic DTLscenarios.Parsimony methods are introduced by defining appropriate optimization problems. Weshow that finding most parsimonious acyclic DTL-scenarios is NP-complete. However, by droppingthe condition of acyclicity, the problem becomes tractable, and we provide a dynamic programmingalgorithm as well as a fixed-parameter-tractable algorithm for finding most parsimonious DTLscenarios.

Keyword
Trees, Biology and genetics, Combinatorial algorithms, Graph algorithms
National Category
Bioinformatics and Systems Biology Computer Science
Identifiers
urn:nbn:se:kth:diva-10969 (URN)10.1109/TCBB.2010.14 (DOI)000286146600021 ()21233529 (PubMedID)2-s2.0-79551667938 (ScopusID)
Note
QC 20100812 Uppdaterad från submitted till published(20110301)Available from: 2009-09-01 Created: 2009-09-01 Last updated: 2011-03-01Bibliographically approved
2. Inferring Duplications and Lateral Gene Transfers: An Algorithm for Parametric Tree Reconciliation
Open this publication in new window or tab >>Inferring Duplications and Lateral Gene Transfers: An Algorithm for Parametric Tree Reconciliation
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Prediction of the function of genes and their products is an increasingly important computational problem. The ability to correctly identify the historic relationship of homologous genes is essential for making accurate predictions.In 1970, Fitch made a distinction between paralogous and orthologous genes, its importance lying in the observation that genes are more likely to have similar functions when they have evolved from a common ancestral gene through speciation rather than duplication. Lateral gene transfer (LGT) is yet another important evolutionary event that creates copies of genes, and asour understanding of the importance and prevalence of LGT in evolution is deepening, there is a high demand for methods for detection of LGTs when reconstructing the evolutionary past of genes.

In this paper, we present highly efficient and practical algorithms for treereconciliation that simultaneously consider both duplications and LGTs. Weallow costs to be associated with duplications and LGTs and develop methods for finding reconciliations of minimal total cost between species trees andgene trees. Moreover, we provide an efficient algorithm for parametric treereconciliation—a computational problem analogous to parametric sequencealignment. Experimental results on synthetic data indicate that our methodsare robust with high specificity and sensitivity.

National Category
Bioinformatics and Systems Biology Computer Science
Identifiers
urn:nbn:se:kth:diva-10970 (URN)
Note
QC 20100812Available from: 2009-09-01 Created: 2009-09-01 Last updated: 2012-05-02Bibliographically approved
3. Detecting LGTs using a novel probabilistic modelintegrating duplications, LGTs, losses, rate variation,and sequence evolution
Open this publication in new window or tab >>Detecting LGTs using a novel probabilistic modelintegrating duplications, LGTs, losses, rate variation,and sequence evolution
Show others...
2009 (English)Manuscript (preprint) (Other academic)
Abstract [en]

The debate over the prevalence of lateral gene transfers (LGTs) has been intense.There is now to a large extent consensus around the view that LGT is an important evolutionary force as well as regarding its relative importance across species. This consensus relies, however, mainly on studies of individual gene families.

Up until now, the gold standard for identifying LGTs has been phylogenetic methods where LGTs are inferred from incongruities between a species tree andan associated gene tree. Even in cases where there is evidence of LGT, several concerns have often been raised regarding the significance of the evidence. One common concern has been the possibility that other evolutionary events have caused the incongruities. Another has been the significance of the gene trees involved in the inference; there may for instance be alternative, almost equally likely, gene trees that do not provide evidence for LGT. Independently of these concerns, there has been a need for methods that can be used to quantitatively characterize the level of LGT among sets of species, but also for methods able to pinpoint where in the species tree LGTs have occurred.

Here, we provide the first probabilistic model capturing gene duplication, LGT,gene loss, and point mutations with a relaxed molecular clock. We also provide allfundamental algorithms required to analyze a gene family relative to a given speciestree under this model. Our algorithms are based on Markov chain Monte Carlo(MCMC) methodology but build also on techniques from numerical analysis and involve dynamic programming (DP).

National Category
Industrial Biotechnology Computer Science
Identifiers
urn:nbn:se:kth:diva-10971 (URN)
Note

QC 20100812

Available from: 2009-09-01 Created: 2009-09-01 Last updated: 2016-12-20Bibliographically approved
4. A global structural EM algorithm for a model of cancer progression
Open this publication in new window or tab >>A global structural EM algorithm for a model of cancer progression
2011 (English)Manuscript (preprint) (Other academic)
Abstract [en]

Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein’s path model of colon cancer was clearly a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising training algorithms,and applying these to cross-sectional data.Beerenwinkel et al. provided, what they coined, EM-like algorithms for OncogeneticTrees (OTs) and mixtures of such. Given the small size of current and future datasets, it is important to minimize the number of parameters of a model. For this reason,also we focus on tree-based models and introduce Hidden-variable Oncogenetic Trees(HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it.The algorithm for single HOTs performs very well on reasonable-sized data sets,while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrows more cost efficient technologies. To facilitate analysis of complex cytogenetic data sets requiring more than one HOT, we devise a decomposition strategy based on PrincipalComponent Analysis and train parameters on a colon cancer data set. The method so obtained is then successfully applied to kidney cancer.

National Category
Bioinformatics and Systems Biology Computer Science
Identifiers
urn:nbn:se:kth:diva-10973 (URN)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20100812

Available from: 2009-09-01 Created: 2009-09-01 Last updated: 2016-12-16Bibliographically approved

Open Access in DiVA

fulltext(497 kB)696 downloads
File information
File name FULLTEXT02.pdfFile size 497 kBChecksum SHA-512
1ece21038321b8f46787c8f608723c10a267c67e0ef6807c392b851d933fcecc71ff163585f2c06afd1f03ca7840e592eca0e92b01f3f55345d13b3c7e61a36e
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Tofigh, Ali
By organisation
Computational Biology, CB
Bioinformatics and Systems BiologyComputer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 697 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 699 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf