Species tree inference using a mixture model
2015 (English). In: Molecular Biology and Evolution, ISSN 0737-4038, E-ISSN 1537-1719. Article in journal (Refereed), Published
Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either case, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by co-estimating gene trees and the species tree, but this approach poses a scalability problem for larger data sets.
We present MixTreEM-DLRS: a two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural EM algorithm based on a mixture model, is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg et al., 2009), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad et al., 2009).
We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance to a recent genome-scale species tree reconstruction method, PHYLDOG (Boussau et al., 2013), as well as to a fast parsimony-based algorithm, Duptree (Wehe et al., 2008). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster, and it outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem.
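For readers unfamiliar with expectation maximization on a mixture model, the alternation of E-step and M-step can be illustrated with a minimal sketch. This is not MixTreEM: the structural EM described above searches over tree topologies using sequence data, whereas the sketch below fits a generic one-dimensional Gaussian mixture. The function names `gaussian_pdf` and `em_gaussian_mixture`, the starting values, and the toy data are all assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, initial_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM (illustrative only, not MixTreEM).

    Starts from equal weights and unit variances, then alternates
    E-steps and M-steps for a fixed number of iterations.
    """
    k = len(initial_means)
    means = list(initial_means)
    weights = [1.0 / k] * k
    variances = [1.0] * k

    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each
        # component for each data point under the current parameters.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            if total == 0.0:
                # All densities underflowed; fall back to uniform.
                resp.append([1.0 / k] * k)
            else:
                resp.append([j / total for j in joint])

        # M-step: re-estimate weights, means, and variances from the
        # responsibility-weighted data.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            if n_c == 0.0:
                continue  # empty component: keep its old parameters
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2
                      for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, min_var)  # avoid a zero variance

    return weights, means, variances


if __name__ == "__main__":
    toy_data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
    print(em_gaussian_mixture(toy_data, [0.0, 6.0]))
```

On the toy data, the two components settle on the two visible clusters. In a structural EM setting, the M-step would additionally update a discrete structure (here, a tree) rather than only numeric parameters.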
Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2015.
Species trees, mixture model, expectation maximization, phylogenetics, mammalian phylogeny
Bioinformatics (Computational Biology)
Research subject: Computer Science
Identifiers: URN: urn:nbn:se:kth:diva-168166; DOI: 10.1093/molbev/msv115; ISI: 000361981900020; OAI: oai:DiVA.org:kth-168166; DiVA: diva2:814542
QC 20151020. 2015-05-27; 2015-05-27; 2015-10-26. Bibliographically approved