Change search
ReferencesLink to record
Permanent link

Direct link
Sparse Markov Chains for Sequence Data. Scandinavian Journal of Statistics
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2014 (English)In: Scandinavian Journal of Statistics, ISSN 0303-6898, E-ISSN 1467-9469, Vol. 41, no 3, 639-655 p.Article in journal (Refereed) Published
Abstract [en]

Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.

Place, publisher, year, edition, pages
New York: John Wiley & Sons, Inc. , 2014. Vol. 41, no 3, 639-655 p.
Keyword [en]
Bayesian learning, predictive inference, data compression, Markov chains, variable order Markov chains
National Category
Probability Theory and Statistics
URN: urn:nbn:se:kth:diva-137075DOI: 10.1111/sjos.12053ISI: 000340493800005ScopusID: 2-s2.0-84905718060OAI: diva2:677913

QC 20140912

Available from: 2013-12-10 Created: 2013-12-10 Last updated: 2014-09-12Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus
By organisation
Mathematical Statistics
In the same journal
Scandinavian Journal of Statistics
Probability Theory and Statistics

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 36 hits
ReferencesLink to record
Permanent link

Direct link