Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Reconstruction of Ancestral Genomic Sequences Using Likelihood
KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
2007 (English)In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 14, no 2, 216-237 p.Article in journal (Refereed) Published
Abstract [en]

A challenging task in computational biology is the reconstruction of genomic sequences of extinct ancestors, given the phylogenetic tree and the sequences at the leafs. This task is best solved by calculating the most likely estimate of the ancestral sequences, along with the most likely edge lengths. We deal with this problem and also the variant in which the phylogenetic tree in addition to the ancestral sequences need to be estimated. The latter problem is known to be NP-hard, while the computational complexity of the former is unknown. Currently, all algorithms for solving these problems are heuristics without performance guarantees. The biological importance of these problems calls for developing better algorithms with guarantees of finding either optimal or approximate solutions. We develop approximation, fix parameter tractable ( FPT), and fast heuristic algorithms for two variants of the problem; when the phylogenetic tree is known and when it is unknown. The approximation algorithm guarantees a solution with a log- likelihood ratio of 2 relative to the optimal solution. The FPT has a running time which is polynomial in the length of the sequences and exponential in the number of taxa. This makes it useful for calculating the optimal solution for small trees. Moreover, we combine the approximation algorithm and the FPT into an algorithm with arbitrary good approximation guarantee ( PTAS). We tested our algorithms on both synthetic and biological data. In particular, we used the FPT for computing the most likely ancestral mitochondrial genomes of hominidae ( the great apes), thereby answering an interesting biological question. Moreover, we show how the approximation algorithms find good solutions for reconstructing the ancestral genomes for a set of lentiviruses ( relatives of HIV). Supplementary material of this work is available at www.nada.kth.se/(similar to)isaac/publications/aml/aml.html.

Place, publisher, year, edition, pages
2007. Vol. 14, no 2, 216-237 p.
Keyword [en]
ancestral maximum likelihood, most parsimonious likelihood, PTAS, FPT, 2-approximation
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-6356DOI: 10.1089/cmb.2006.0101ISI: 000245943700006Scopus ID: 2-s2.0-34248184703OAI: oai:DiVA.org:kth-6356DiVA: diva2:11045
Note

QC 20110121. Uppdaterad från manuskript till artikel

Available from: 2006-11-15 Created: 2006-11-15 Last updated: 2012-09-26Bibliographically approved
In thesis
1. Computational problems in evolution: Multiple alignment, genome rearrangements, and tree reconstruction
Open this publication in new window or tab >>Computational problems in evolution: Multiple alignment, genome rearrangements, and tree reconstruction
2006 (English)Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Reconstructing the evolutionary history of a set of species is a fundamental problem in biology. This thesis concerns computational problems that arise in different settings and stages of phylogenetic tree reconstruction, but also in other contexts. The contributions include:

• A new distance-based tree reconstruction method with optimal reconstruction radius and optimal runtime complexity. Included in the result is a greatly simplified proof that the NJ algorithm also has optimal reconstruction radius. (co-author Jens Lagergren)

• NP-hardness results for the most common variations of Multiple Alignment. In particular, it is shown that SP-score, Star Alignment, and Tree Alignment, are NP hard for all metric symbol distances over all binary or larger alphabets.

• A 1.375-approximation algorithm for Sorting By Transpositions (SBT). SBT is the problem of sorting a permutation using as few block-transpositions as possible. The complexity of this problem is still open and it was a ten-year-old open problem to improve the best known 1.5-approximation ratio. The 1.375-approximation algorithm is based on a new upper bound on the diameter of 3-permutations. Moreover, a new lower bound on the transposition diameter of the symmetric group is presented and the exact transposition diameter of simple permutations is determined. (co-author Tzvika Hartman)

• Approximation, fixed-parameter tractable, and fast heuristic algorithms for two variants of the Ancestral Maximum Likelihood (AML) problem: when the phylogenetic tree is known and when it is unknown. AML is the problem of reconstructing the most likely genetic sequences of extinct ancestors along with the most likely mutation probabilities on the edges, given the phylogenetic tree and sequences at the leafs. (co-author Tamir Tuller)

• An algorithm for computing the number of mutational events between aligned DNA sequences which is several hundred times faster than the famous Phylip packages. Since pairwise distance estimation is a bottleneck in distance-based phylogeny reconstruction, the new algorithm improves the overall running time of many distancebased methods by a factor of several hundred. (co-author Jens Lagergren)

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. iv, 54 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 06/22--SE
Keyword
Distance Methods, Tree reconstruction, Sorting by Transpositions, multiple alignment, ancestral sequences
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-4170 (URN)978-91-7178-511-4 (ISBN)
Public defence
2007-01-15, FA32, Albanova, Roslagstullsbacken 21, Stockholm, 10:00
Opponent
Supervisors
Note
QC 20110121Available from: 2006-11-15 Created: 2006-11-15 Last updated: 2011-01-21Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Elias, Isaac
By organisation
Numerical Analysis and Computer Science, NADA
In the same journal
Journal of Computational Biology
Computer Science

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 62 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf