Motif Yggdrasil: Sampling sequence motifs from a tree mixture model
2007 (English). In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 14, no 5, 682-697 p. Article in journal (Refereed), Published
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide-level correlation coefficient for both synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still performs similarly to the original version.
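The abstract reports results as a nucleotide-level correlation coefficient. As a rough illustration only, the sketch below computes the commonly used form of that statistic (a Matthews-style correlation over per-nucleotide site labels). It assumes this is the definition meant; the abstract does not spell it out. The function names `site_mask` and `nucleotide_cc` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def site_mask(length, sites):
    """Return a per-nucleotide boolean mask for a sequence of the given length.

    `sites` is a list of (start, width) pairs with 0-based starts
    (an assumed convention for this sketch).
    """
    mask = [False] * length
    for start, width in sites:
        for i in range(start, start + width):
            mask[i] = True
    return mask


def nucleotide_cc(known, predicted):
    """Correlation between known and predicted per-nucleotide site labels.

    Assumed definition:
    (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero.
    """
    if len(known) != len(predicted):
        raise ValueError("masks must have the same length")
    tp = sum(1 for k, p in zip(known, predicted) if k and p)
    tn = sum(1 for k, p in zip(known, predicted) if not k and not p)
    fp = sum(1 for k, p in zip(known, predicted) if not k and p)
    fn = sum(1 for k, p in zip(known, predicted) if k and not p)
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fn * fp) / denom


# Toy example: a known site of width 6 at position 5,
# and a prediction shifted two positions to the right.
known = site_mask(20, [(5, 6)])
predicted = site_mask(20, [(7, 6)])
score = nucleotide_cc(known, predicted)
```

In this toy case the prediction overlaps the true site in four of six positions, so the score falls between 0 and 1; a prediction that coincides exactly with the known site scores 1.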
Keywords: Gibbs sampling, phylogenetic footprinting, regulatory element, transcription factor binding site identification, probabilistic modeling, factor-binding sites, regulatory elements, evolution, algorithms, discovery, alignment, matrices
Identifiers: URN: urn:nbn:se:kth:diva-16779; DOI: 10.1089/cmb.2007.R010; ISI: 000247927100011; ScopusID: 2-s2.0-34447273158; OAI: oai:DiVA.org:kth-16779; DiVA: diva2:334822
QC 20100525. 2010-08-05; 2010-08-05; 2010-09-20. Bibliographically approved