Birth-death prior on phylogeny and speed dating
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. ORCID iD: 0000-0002-5896-473X
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
2008 (English) In: BMC Evolutionary Biology, ISSN 1471-2148, E-ISSN 1471-2148, Vol. 8, no 1, p. 77-
Article in journal (Refereed) Published
Abstract [en]

Background: In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies.

Results: We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in a considerable gain in computational efficiency. We also demonstrate that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rate and time parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can be accurately analysed with our MAP algorithm in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this can nevertheless be a problem, for instance when we infer the tree topology in addition to the parameters, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on.
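
The SAL idea of favouring tree swaps early and parameter changes later can be illustrated with a move-selection schedule. The sketch below is not taken from the paper: the linear decay, the function names swap_probability and choose_move, and the default probabilities p_start and p_end are all assumptions made for illustration.

```python
import random


def swap_probability(step, total_steps, p_start=0.8, p_end=0.05):
    """Probability of proposing a tree swap at a given step.

    Assumed schedule: linear decay from p_start to p_end.
    """
    frac = step / max(total_steps - 1, 1)
    return p_start + (p_end - p_start) * frac


def choose_move(step, total_steps, rng=random):
    """Return 'tree_swap' mostly early on, 'parameter_change' mostly later."""
    if rng.random() < swap_probability(step, total_steps):
        return "tree_swap"
    return "parameter_change"


# Example: count tree swaps in the first and last tenth of a 1000-step run.
rng = random.Random(0)
moves = [choose_move(s, 1000, rng) for s in range(1000)]
early_swaps = moves[:100].count("tree_swap")
late_swaps = moves[-100:].count("tree_swap")
```

Any decreasing schedule would serve the same purpose; the linear form is chosen here only because it is the simplest to read.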

Conclusion: Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.

Place, publisher, year, edition, pages
2008. Vol. 8, no 1, p. 77-
Keywords [en]
CHAIN MONTE-CARLO, ESTIMATING DIVERGENCE TIMES, MOLECULAR CLOCK, LIKELIHOOD APPROACH, EVOLUTIONARY TREES, MAXIMUM-LIKELIHOOD, DNA-SEQUENCES, DATES, INFERENCE, PROBABILITY
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:kth:diva-8463
DOI: 10.1186/1471-2148-8-77
ISI: 000254282900001
Scopus ID: 2-s2.0-41149156444
OAI: oai:DiVA.org:kth-8463
DiVA, id: diva2:13793
Note
QC 20100901
Available from: 2008-05-16 Created: 2008-05-16 Last updated: 2017-12-14 Bibliographically approved
In thesis
1. Taking advantage of phylogenetic trees in comparative genomics
2008 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Phylogenomics can be regarded as evolution and genomics in co-operation. Various kinds of evolutionary studies, gene family analysis among them, demand access to genome-scale datasets. But it is also clear that many genomics studies, such as assignment of gene function, are much improved by evolutionary analysis. The work leading to this thesis is a contribution to the phylogenomics field. We have used phylogenetic relationships between species in genome-scale searches for two intriguing genomic features, namely functional pseudogenes and A-to-I RNA editing. In the first case we used pairwise species comparisons, specifically human-mouse and human-chimpanzee, to infer the existence of functional mammalian pseudogenes. In the second case we profited from the rapid growth in recent years of the number of sequenced genomes, and used 17-species multiple sequence alignments. In both these studies we have used non-genomic data, gene expression data and synteny relations among them, to verify predictions. In the A-to-I editing project we used 454 sequencing for experimental verification.

We have further contributed a maximum a posteriori (MAP) method for fast and accurate dating analysis of speciations and other evolutionary events. This work follows recent years' trend of leaving the strict molecular clock when performing phylogenetic inference. We discretised the time interval from the leaves to the root in the tree, and used a dynamic programming (DP) algorithm to optimally factorise branch lengths into substitution rates and divergence times. We analysed two biological datasets and compared our results with recent MCMC-based methodologies. The dating point estimates that our method delivers were found to be of high quality while the gain in speed was dramatic.
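
The discretised-time DP described above can be illustrated with a small sketch. It is not the thesis algorithm: it assumes a fixed rooted tree with known branch lengths, leaves at time 0, the root pinned to the top grid point, and an iid log-normal prior on rates as the only score (the birth-death prior on times is omitted). The names factorise, lognormal_logpdf, children and length are hypothetical.

```python
import math


def lognormal_logpdf(x, mu=0.0, sigma=1.0):
    """Log density of a log-normal distribution (assumed iid rate prior)."""
    return (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))


def factorise(children, length, root, grid_size, root_time=1.0):
    """Split branch lengths into rates and node times on a time grid.

    children: dict node -> list of child nodes (leaves map to []).
    length:   dict child node -> length of the branch above it.
    Grid index i means time i * root_time / grid_size.
    Returns (best score, dict node -> time).
    """
    step = root_time / grid_size
    best, arg = {}, {}  # best[v][i]: best subtree score with v at index i

    def solve(v):
        best[v] = [-math.inf] * (grid_size + 1)
        arg[v] = [{} for _ in range(grid_size + 1)]
        if not children[v]:
            best[v][0] = 0.0  # leaves are fixed at time 0
            return
        for c in children[v]:
            solve(c)
        for i in range(1, grid_size + 1):
            total, choice = 0.0, {}
            for c in children[v]:
                # rate on the branch above c is length / elapsed time
                cand = [(best[c][j]
                         + lognormal_logpdf(length[c] / ((i - j) * step)), j)
                        for j in range(i) if best[c][j] > -math.inf]
                if not cand:
                    total = -math.inf
                    break
                score, j = max(cand)
                total += score
                choice[c] = j
            if total > -math.inf:
                best[v][i], arg[v][i] = total, choice

    solve(root)
    if best[root][grid_size] == -math.inf:
        raise ValueError("grid too coarse for this tree")

    times = {}

    def trace(v, i):
        # follow the stored choices back down from the root
        times[v] = i * step
        for c, j in arg[v][i].items():
            trace(c, j)

    trace(root, grid_size)
    return best[root][grid_size], times


# Example on a three-leaf tree: root -> (A, X), X -> (B, C).
children = {"root": ["A", "X"], "X": ["B", "C"], "A": [], "B": [], "C": []}
length = {"A": 1.0, "X": 0.5, "B": 0.5, "C": 0.5}
score, times = factorise(children, length, "root", grid_size=10)
```

Each node is solved once per grid point, so the work grows with the number of nodes times the square of the grid size, rather than with the number of joint time assignments.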

Finally, we applied the DP strategy in a new setting. This time we used a grid laid out on a species tree instead of on an interval. Together with the speciation times, the discretisation gives a common timeframe for a gene tree and the corresponding species tree. This is the key to integrating the sequence evolution process and the gene evolution process. Out of several potential application areas we chose gene tree reconstruction. We performed genome-wide analysis of yeast gene families and found that our methodology performs very well.

Place, publisher, year, edition, pages
Stockholm: KTH, 2008. p. 53
Series
Trita-CSC-A, ISSN 1653-5723 ; 2008:09
Keywords
Computer Science
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-4757 (URN)
978-91-7178-987-7 (ISBN)
Public defence
2008-06-04, FD05, Albanova, Roslagstullsbacken 21, Stockholm, 09:30
Note
QC 20100923
Available from: 2008-05-16 Created: 2008-05-16 Last updated: 2018-01-13 Bibliographically approved

Authors
Åkerborg, Örjan; Lagergren, Jens