fastphylo: Fast tools for phylogenetics
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB. KTH, Centres, Science for Life Laboratory, SciLifeLab.
Stockholms universitet. (DBB)
KTH, School of Computer Science and Communication (CSC).
2013 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 14, no 1, 334- p. Article in journal (Refereed), Published
Abstract [en]

Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances. Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency. Conclusions: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
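The first problem named in the abstract, estimating pairwise distances from aligned DNA sequences, can be illustrated with a minimal Python sketch. The abstract does not say which substitution models fastphylo implements, so the Jukes-Cantor (JC69) correction used here, the function names `jc69_distance` and `distance_matrix`, and the toy input are all assumptions chosen for illustration; this is not fastphylo's code.

```python
import math


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned DNA sequences (illustrative)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only alignment columns where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    # p is the observed proportion of differing sites.
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise JC69 distances for a list of sequences."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jc69_distance(seqs[i], seqs[j])
    return d
```

A matrix produced this way is the kind of input a distance-based tree reconstruction method, such as neighbor joining, takes.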

Place, publisher, year, edition, pages
BioMed Central, 2013. Vol. 14, no 1, 334- p.
Keyword [en]
Distance matrices, Distance method, Evolutionary history, Large problems, Memory efficient, Modular architectures, Neighbor joining, Phylogenetic studies
National Category
Bioinformatics (Computational Biology)
URN: urn:nbn:se:kth:diva-136421
DOI: 10.1186/1471-2105-14-334
ISI: 000329901900001
PubMedID: 24255987
ScopusID: 2-s2.0-84887664660
OAI: diva2:676061
Swedish e-Science Research Center; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience

QC 20140205

Available from: 2013-12-05. Created: 2013-12-05. Last updated: 2016-10-10. Bibliographically approved
In thesis
1. Computational Problems in Modeling Evolution and Inferring Gene Families.
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last few decades, phylogenetics has emerged as a very promising field, providing a comparative framework to explain the genetic relationships among all living organisms on Earth. These genetic relationships are typically represented by a bifurcating phylogenetic tree, the tree of life. Reconstructing a phylogenetic tree is one of the central tasks in evolutionary biology. Different evolutionary processes, such as gene duplications, gene losses, speciation, and lateral gene transfer events, make the phylogeny reconstruction task more difficult. However, the rapid developments in sequencing technologies and the availability of genome-scale sequencing data give us the opportunity to understand these evolutionary processes in a more informed manner and, ultimately, enable us to reconstruct gene and species phylogenies more accurately. This thesis is an attempt to provide computational methods for phylogenetic inference and tools for conducting genome-scale comparative evolutionary studies, such as detecting homologous sequences and inferring gene families.

In the first project, we present FastPhylo, a software package containing fast tools for reconstructing distance-based phylogenies. It implements previously published efficient algorithms for estimating a distance matrix from the input sequences and for reconstructing an unrooted Neighbour Joining tree from a given distance matrix. Results on simulated datasets show that FastPhylo can handle hundreds of thousands of sequences in a time- and memory-efficient manner. Its ease of use, well-defined interfaces, and modular structure allow FastPhylo to be used in very large bioinformatic pipelines.
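To make the second step concrete, here is a minimal Python sketch of the textbook Neighbour Joining procedure (Saitou and Nei), which builds an unrooted tree from a distance matrix. It is a plain cubic-time version written for readability, not one of the efficient algorithms FastPhylo implements; the function name `neighbor_joining`, the `internalN` node labels, and the edge-list output format are assumptions made for illustration.

```python
def neighbor_joining(names, dist):
    """Return an unrooted NJ tree as a list of (node, neighbor, length) edges.

    names: list of distinct leaf labels; dist: symmetric matrix (list of lists).
    """
    nodes = list(names)
    # Copy into a dict-of-dicts so the caller's matrix is left untouched.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[a] is the sum of distances from a to every other active node.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"internal{next_id}"  # hypothetical label for the new inner node
        next_id += 1
        # Branch lengths from the new node u to the two joined nodes.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        edges.append((u, a, la))
        edges.append((u, b, lb))
        # Distances from u to each remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last two nodes with a single edge.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges
```

For n leaves the result has 2n - 3 edges. Each round of this sketch scans all pairs, so the total cost grows cubically with the number of sequences.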

In the second project, we present a synteny-aware gene homology method, GenFamClust (GFC), that uses gene content and gene order conservation to detect homology. Results on simulated and biological datasets suggest that local synteny information combined with sequence similarity improves the detection of homologs.

In the third project, we introduce a novel phylogeny-based clustering method, PhyloGenClust, which partitions a very large gene family into smaller subfamilies. ROC (receiver operating characteristic) analysis on synthetic datasets shows that PhyloGenClust identifies subfamilies more accurately. PhyloGenClust can be used as a middle-tier clustering method between raw clustering methods, such as sequence similarity methods, and more sophisticated Bayesian phylogeny methods.

Finally, we introduce a novel probabilistic Bayesian method, based on the DLTRS model, to sample reconciliations of a gene tree inside a species tree. The method uses an MCMC framework to integrate LGTs, gene duplications, gene losses, and sequence evolution under a relaxed molecular clock for substitution rates. The proposed sampling method estimates the posterior distribution of gene trees and provides temporal information on LGT events over the lineages of a species tree. Analysis on simulated datasets reveals that our method performs well in identifying the true temporal estimates of LGT events. We applied our method to genome-wide gene families for mollicutes and cyanobacteria, which gave interesting insight into potential LGT highways.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 57 p.
TRITA-CSC-A, ISSN 1653-5723 ; 2016:24
Evolution, Phylogenetics, Lateral Gene Transfer, Gene Families, Clustering
National Category
Bioinformatics (Computational Biology)
Research subject
Computer Science
urn:nbn:se:kth:diva-193637 (URN)
978-91-7729-131-2 (ISBN)
Public defence
2016-10-18, Air, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)

QC 20161010

Available from: 2016-10-10. Created: 2016-10-06. Last updated: 2016-10-10. Bibliographically approved

Search in DiVA

By author/editor
Khan, Mehmood Alam; Elias, Isaac; Nylander, Kristina; Schobesberger, Richard; Schmitzberger, Peter; Lagergren, Jens; Arvestad, Lars
By organisation
Computational Biology, CB; Science for Life Laboratory, SciLifeLab; School of Computer Science and Communication (CSC)