Phylogenetic Partitioning of Gene Families
(English) Manuscript (preprint) (Other academic)
Clustering and organizing molecular sequences is one of the central tasks in bioinformatics. It is a common first step in, for example, phylogenomic analysis. For some tasks, a large gene family needs to be partitioned into more manageable subfamilies. Bayesian phylogenetic analysis in particular can be computationally very expensive. There is therefore a need for an easy and natural means of breaking up a gene family, with moderate computational requirements, to enable careful analysis of subfamilies with computationally expensive tools. We devised and implemented a method that infers gene trees, reconciles them with species trees, and identifies putative orthogroups as subfamilies. To achieve reasonable speed, approximate maximum-likelihood (ML) phylogenies are inferred using the FastTree method and combined with a subfamily-centered bootstrapping procedure to ensure robustness. With the new method, very large clusters of sequences become easier to manage in pipelines containing computationally expensive steps. The implementation, PhyloGenClust, is available in a public repository, https://github.com/malagori/PhyloGenClust, under the GNU General Public License version 3.
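The idea of cutting a gene tree into putative orthogroups can be illustrated with a minimal sketch. It is not the PhyloGenClust implementation: it assumes a rooted binary gene tree given as nested tuples, assumes leaf labels of the hypothetical form "species|gene", and replaces full gene tree/species tree reconciliation with the simpler species-overlap heuristic (a node is treated as a duplication when its two child clades share a species). The FastTree inference and the bootstrapping step are not covered, and all function names are illustrative.

```python
from typing import List, Set, Tuple, Union

# A rooted binary gene tree: a leaf is a "species|gene" string (assumed
# labeling), an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf labels below a node."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def species_set(tree: Tree) -> Set[str]:
    """Return the set of species represented below a node."""
    return {leaf.split("|", 1)[0] for leaf in leaves(tree)}


def is_duplication(tree: Tree) -> bool:
    """Species-overlap heuristic (an assumption of this sketch): an internal
    node is called a duplication if its child clades share a species."""
    if isinstance(tree, str):
        return False
    left, right = tree
    return bool(species_set(left) & species_set(right))


def partition(tree: Tree) -> List[List[str]]:
    """Split the tree at duplication nodes, walking down from the root.
    Each clade whose root is not a duplication becomes one subfamily."""
    if is_duplication(tree):
        left, right = tree
        return partition(left) + partition(right)
    return [sorted(leaves(tree))]


# Toy family: one ancient duplication followed by speciation in both copies.
example_tree: Tree = (
    (("human|A1", "mouse|A1"), "fish|A1"),
    (("human|A2", "mouse|A2"), "fish|A2"),
)

print(partition(example_tree))
# [['fish|A1', 'human|A1', 'mouse|A1'], ['fish|A2', 'human|A2', 'mouse|A2']]
```

On the toy tree, the root is flagged as a duplication and the family splits into two subfamilies of three sequences each, which could then be analyzed separately with a more expensive method.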
Phylogenetic, Clustering, Gene Families
Bioinformatics (Computational Biology)
Research subject Computer Science
Identifiers: URN: urn:nbn:se:kth:diva-193634; OAI: oai:DiVA.org:kth-193634; DiVA: diva2:1033271
QC 20161007; 2016-10-06; 2016-10-06; 2016-10-12; Bibliographically approved