Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Fast Computation of Distance Estimators
KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
2007 (English)In: BMC Bioinformatics, ISSN 1471-2105, Vol. 8, 89- p.Article in journal (Refereed) Published
Abstract [en]

Background: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n3). Unfortunately, the fastest practical algorithms known for Computing the distance matrix, from n sequences of length l, takes time proportional to l·n2. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome wide applications. Results: We give an advanced algorithm for Computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity Symbols. This new method is shown to be more accurate as well as faster than earlier methods. Conclusion: Our novel algorithm for Computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.

Place, publisher, year, edition, pages
2007. Vol. 8, 89- p.
Keyword [en]
DNA, SUBSTITUTIONS, REGION
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-6355DOI: 10.1186//1471-2105-8-89ISI: 000245188700001Scopus ID: 2-s2.0-33947657892OAI: oai:DiVA.org:kth-6355DiVA: diva2:11044
Note
QC 20100914. Uppdaterad från Manuskript till Artikel (20100914)Available from: 2006-11-15 Created: 2006-11-15 Last updated: 2010-09-14Bibliographically approved
In thesis
1. Computational problems in evolution: Multiple alignment, genome rearrangements, and tree reconstruction
Open this publication in new window or tab >>Computational problems in evolution: Multiple alignment, genome rearrangements, and tree reconstruction
2006 (English)Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Reconstructing the evolutionary history of a set of species is a fundamental problem in biology. This thesis concerns computational problems that arise in different settings and stages of phylogenetic tree reconstruction, but also in other contexts. The contributions include:

• A new distance-based tree reconstruction method with optimal reconstruction radius and optimal runtime complexity. Included in the result is a greatly simplified proof that the NJ algorithm also has optimal reconstruction radius. (co-author Jens Lagergren)

• NP-hardness results for the most common variations of Multiple Alignment. In particular, it is shown that SP-score, Star Alignment, and Tree Alignment, are NP hard for all metric symbol distances over all binary or larger alphabets.

• A 1.375-approximation algorithm for Sorting By Transpositions (SBT). SBT is the problem of sorting a permutation using as few block-transpositions as possible. The complexity of this problem is still open and it was a ten-year-old open problem to improve the best known 1.5-approximation ratio. The 1.375-approximation algorithm is based on a new upper bound on the diameter of 3-permutations. Moreover, a new lower bound on the transposition diameter of the symmetric group is presented and the exact transposition diameter of simple permutations is determined. (co-author Tzvika Hartman)

• Approximation, fixed-parameter tractable, and fast heuristic algorithms for two variants of the Ancestral Maximum Likelihood (AML) problem: when the phylogenetic tree is known and when it is unknown. AML is the problem of reconstructing the most likely genetic sequences of extinct ancestors along with the most likely mutation probabilities on the edges, given the phylogenetic tree and sequences at the leafs. (co-author Tamir Tuller)

• An algorithm for computing the number of mutational events between aligned DNA sequences which is several hundred times faster than the famous Phylip packages. Since pairwise distance estimation is a bottleneck in distance-based phylogeny reconstruction, the new algorithm improves the overall running time of many distancebased methods by a factor of several hundred. (co-author Jens Lagergren)

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. iv, 54 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 06/22--SE
Keyword
Distance Methods, Tree reconstruction, Sorting by Transpositions, multiple alignment, ancestral sequences
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-4170 (URN)978-91-7178-511-4 (ISBN)
Public defence
2007-01-15, FA32, Albanova, Roslagstullsbacken 21, Stockholm, 10:00
Opponent
Supervisors
Note
QC 20110121Available from: 2006-11-15 Created: 2006-11-15 Last updated: 2011-01-21Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Elias, IsaacLagergren, Jens
By organisation
Numerical Analysis and Computer Science, NADA
In the same journal
BMC Bioinformatics
Computer Science

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 70 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf