Bayesian Unsupervised Learning of DNA Regulatory Binding Regions
2009 (English) In: Advances in Artificial Intelligence, ISSN 1687-7470, 219743- p. Article in journal (Refereed) Published
Identification of regulatory binding motifs, that is, short specific words, within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature either to scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information from which to build a probabilistic a priori description of the motif classes. Examples of probabilistic unsupervised learning of the number of putative de novo motif types and their positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that aims to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which is a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy, combined with recently introduced nonstandard stochastic computation tools, yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.
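The comparison the abstract describes, between the marginal likelihood of sequence data under a motif model and under a background model, can be illustrated with a minimal Python sketch. This is not the authors' method. It assumes a symmetric Dirichlet prior on each motif column and replaces the variable length Markov chain with a fixed first-order chain for brevity. The names `log_marginal_column`, `log_marginal_motif`, `log_background`, and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def log_marginal_column(counts, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of one motif column.

    Assumption: symmetric Dirichlet(alpha) prior over the four bases.
    """
    k = len(ALPHABET)
    n = sum(counts.get(b, 0) for b in ALPHABET)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for b in ALPHABET:
        result += math.lgamma(alpha + counts.get(b, 0)) - math.lgamma(alpha)
    return result


def log_marginal_motif(sites, alpha=1.0):
    """Sum of column-wise log marginals over aligned sites of equal width."""
    width = len(sites[0])
    return sum(
        log_marginal_column(Counter(site[j] for site in sites), alpha)
        for j in range(width)
    )


def log_background(seq, init, trans):
    """Log probability of seq under a first-order Markov chain.

    Assumption: a fixed-order chain stands in for the variable length
    Markov chain named in the abstract.
    """
    lp = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(trans[prev][cur])
    return lp


# Illustrative data only: four identical candidate sites and a uniform background.
sites = ["TATAAT"] * 4
uniform_init = {b: 0.25 for b in ALPHABET}
uniform_trans = {a: {b: 0.25 for b in ALPHABET} for a in ALPHABET}

motif_score = log_marginal_motif(sites)
background_score = sum(log_background(s, uniform_init, uniform_trans) for s in sites)
print(motif_score > background_score)
```

A candidate set of sites is better explained by the motif model when `motif_score` exceeds `background_score`. Learning the number of motif types and their positions would require searching over such assignments, which this sketch does not attempt.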
Place, publisher, year, edition, pages
Hindawi Publishing Corporation, 2009. 219743- p.
Identification of regulatory binding motifs
Engineering and Technology Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-83193
DOI: 10.1155/2009/219743
OAI: oai:DiVA.org:kth-83193
DiVA: diva2:498774