Probabilistic Sequence Models with Speech and Language Applications
KTH, School of Electrical Engineering (EES), Communication Theory.
2013 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Series data, sequences of measured values, are ubiquitous. Whenever observations are made along a path in space or time, a data sequence results. To comprehend nature and shape it to our will, or to make informed decisions based on what we know, we need methods to make sense of such data. Of particular interest are probabilistic descriptions, which enable us to represent uncertainty and random variation inherent to the world around us.

This thesis presents and expands upon some tools for creating probabilistic models of sequences, with an eye towards applications involving speech and language. Modelling speech and language is not only of use for creating listening, reading, talking, and writing machines---for instance allowing human-friendly interfaces to future computational intelligences and smart devices of today---but probabilistic models may also ultimately tell us something about ourselves and the world we occupy.

The central theme of the thesis is the creation of new or improved models more appropriate for our intended applications, by weakening limiting and questionable assumptions made by standard modelling techniques. One contribution of this thesis examines causal-state splitting reconstruction (CSSR), an algorithm for learning discrete-valued sequence models whose states are minimal sufficient statistics for prediction. Unlike many traditional techniques, CSSR does not require the number of process states to be specified a priori, but builds a pattern vocabulary from data alone, making it applicable for language acquisition and the identification of stochastic grammars. A paper in the thesis shows that CSSR copes poorly with the noise and errors expected in natural data, but that the learner can be extended in a simple manner to yield more robust and stable results even in the presence of corruptions.

Even when the complexities of language are put aside, challenges remain. The seemingly simple task of accurately describing human speech signals, so that natural synthetic speech can be generated, has proved difficult, as humans are highly attuned to what speech should sound like. Two papers in the thesis therefore study nonparametric techniques suitable for improved acoustic modelling of speech for synthesis applications. Each of the two papers targets a known-incorrect assumption of established methods, based on the hypothesis that nonparametric techniques can better represent and recreate essential characteristics of natural speech.

In the first paper of the pair, Gaussian process dynamical models (GPDMs), nonlinear, continuous state-space dynamical models based on Gaussian processes, are shown to better replicate voiced speech, without traditional dynamical features or assumptions that cepstral parameters follow linear autoregressive processes. Additional dimensions of the state-space are able to represent other salient signal aspects such as prosodic variation. The second paper, meanwhile, introduces KDE-HMMs, asymptotically consistent Markov models for continuous-valued data based on kernel density estimation, which have additionally been extended with a fixed-cardinality discrete hidden state. This construction is shown to provide improved probabilistic descriptions of nonlinear time series, compared to reference models from different paradigms. The hidden state can be used to control process output, making KDE-HMMs compelling as a probabilistic alternative to hybrid speech-synthesis approaches.

A final paper of the thesis discusses how models can be improved even when one is restricted to a fundamentally imperfect model class. Minimum entropy rate simplification (MERS), an information-theoretic scheme for postprocessing models for generative applications involving both speech and text, is introduced. MERS reduces the entropy rate of a model while remaining as close as possible to the starting model. This is shown to produce simplified models that concentrate on the most common and characteristic behaviours, and provides a continuum of simplifications between the original model and zero-entropy, completely predictable output. As the tails of fitted distributions may be inflated by noise or empirical variability that a model has failed to capture, MERS's ability to concentrate on high-probability output is also demonstrated to be useful for denoising models trained on disturbed data.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. xviii, 80 p.
Series
Trita-EE, ISSN 1653-5146 ; 2013:042
Keyword [en]
Time series, acoustic modelling, speech synthesis, stochastic processes, causal-state splitting reconstruction, robust causal states, pattern discovery, Markov models, HMMs, nonparametric models, Gaussian processes, Gaussian process dynamical models, nonlinear Kalman filters, information theory, minimum entropy rate simplification, kernel density estimation, time-series bootstrap
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-134693
ISBN: 978-91-7501-932-1 (print)
OAI: oai:DiVA.org:kth-134693
DiVA: diva2:667681
Public defence
2013-12-17, D3, Lindstedtsvägen 5, KTH, Stockholm, 09:00 (English)
Projects
ACORNS: Acquisition of Communication and Recognition Skills
LISTA – The Listening Talker
Funder
EU, European Research Council, FP6-034362
EU, FP7, Seventh Framework Programme, 256230
Available from: 2013-11-28. Created: 2013-11-27. Last updated: 2013-11-28. Bibliographically approved.
List of papers
1. Gaussian process dynamical models for nonparametric speech representation and synthesis
2012 (English). In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, IEEE, 2012, pp. 4505-4508. Conference paper, Published paper (Refereed).
Abstract [en]

We propose Gaussian process dynamical models (GPDMs) as a new, nonparametric paradigm in acoustic models of speech. These use multidimensional, continuous state-spaces to overcome familiar issues with discrete-state, HMM-based speech models. The added dimensions allow the state to represent and describe more than just temporal structure as systematic differences in mean, rather than as mere correlations in a residual (which dynamic features or AR-HMMs do). Being based on Gaussian processes, the models avoid restrictive parametric or linearity assumptions on signal structure. We outline GPDM theory, and describe model setup and initialization schemes relevant to speech applications. Experiments demonstrate subjectively better quality of synthesized speech than from comparable HMMs. In addition, there is evidence for unsupervised discovery of salient speech structure.
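The transition half of a GPDM can be illustrated with plain Gaussian-process regression. The sketch below is not the paper's model (there is no GP observation mapping and no latent-variable learning, and the toy scalar trajectory stands in for latents inferred from real speech); it only shows the core step of using a GP posterior mean as a learned nonlinear dynamics map and rolling it forward. All data and hyperparameters here are made up.

```python
import numpy as np

def rbf_kernel(A, B, ell=0.5, sf=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

# Made-up scalar "latent" trajectory from a nonlinear system (a stand-in
# for the latent state that a real GPDM would infer from speech data).
rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(199):
    x[t + 1] = 0.9 * np.sin(2.5 * x[t]) + 0.05 * rng.standard_normal()

X_in, X_out = x[:-1, None], x[1:, None]

# GP posterior mean of the transition map f: x_t -> x_{t+1}; the small
# jitter term plays the role of process-noise variance.
K = rbf_kernel(X_in, X_in) + 1e-3 * np.eye(len(X_in))
alpha = np.linalg.solve(K, X_out)

def f_mean(xq):
    return (rbf_kernel(np.atleast_2d(xq), X_in) @ alpha).ravel()

# Roll the learned mean dynamics forward from a new starting state.
traj = [np.array([0.3])]
for _ in range(50):
    traj.append(f_mean(traj[-1]))
```

Because the dynamics map is nonparametric, nothing restricts it to linearity; its flexibility grows with the amount of training data.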

Place, publisher, year, edition, pages
IEEE, 2012
Series
IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings, ISSN 1520-6149
Keyword
acoustic models, stochastic models, nonparametric speech synthesis, sampling
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-66403 (URN)
10.1109/ICASSP.2012.6288919 (DOI)
000312381404144 ()
2-s2.0-84867596846 (Scopus ID)
978-1-4673-0046-9 (ISBN)
Conference
ICASSP 2012, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 25-30, 2012, Kyoto International Conference Center, Kyoto, Japan
Projects
LISTA
Funder
EU, FP7, Seventh Framework Programme, 256230
ICT - The Next Generation
Available from: 2012-03-08. Created: 2012-01-26. Last updated: 2013-11-28. Bibliographically approved.
2. Picking up the pieces: Causal states in noisy data, and how to recover them
2013 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 34, no. 5, pp. 587-594. Article in journal (Refereed), Published.
Abstract [en]

Automatic structure discovery is desirable in many Markov model applications where a good topology (states and transitions) is not known a priori. CSSR is an established pattern discovery algorithm for stationary and ergodic stochastic symbol sequences that learns a predictively optimal Markov representation consisting of so-called causal states. By means of a novel algebraic criterion, we prove that the causal states of a simple process disturbed by random errors frequently are too complex to be learned fully, making CSSR diverge. In fact, the causal state representation of many hidden Markov models, representing simple but noise-disturbed data, has infinite cardinality. We also report that these problems can be solved by endowing CSSR with the ability to make approximations. The resulting algorithm, robust causal states (RCS), is able to recover the underlying causal structure from data corrupted by random substitutions, as is demonstrated both theoretically and in an experiment. The algorithm has potential applications in areas such as error correction and learning stochastic grammars.
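The causal-state idea behind CSSR can be sketched in a few lines: histories whose empirical next-symbol distributions agree are merged into one predictive state. The toy below uses the "golden mean" process (a 1 is never followed by another 1) and a crude rounding-based merge in place of CSSR's statistical hypothesis tests; it illustrates the state-merging principle only, not the algorithm of the paper.

```python
import random
from collections import Counter, defaultdict

random.seed(1)

# Sample the "golden mean" process: after a 1 the next symbol must be 0;
# after a 0 the next symbol is 1 with probability 0.5.
seq = [0]
for _ in range(20000):
    seq.append(0 if seq[-1] == 1 else random.randint(0, 1))

# Empirical next-symbol distribution after each length-2 history suffix.
counts = defaultdict(Counter)
for t in range(2, len(seq)):
    counts[tuple(seq[t - 2:t])][seq[t]] += 1

def p_one(suffix):
    # Estimated probability that the next symbol is 1, given the suffix.
    c = counts[suffix]
    return c[1] / (c[0] + c[1])

# Merge suffixes with matching predictive distributions (crude rounding in
# place of CSSR's hypothesis tests). Two causal states emerge: "just saw
# a 1" (next symbol certainly 0) and "just saw a 0" (a fair coin).
groups = defaultdict(list)
for s in sorted(counts):
    groups[round(p_one(s), 1)].append(s)
```

Note that the suffix (1, 1) never occurs, and that suffixes (0, 0) and (1, 0) share one predictive state: only the most recent symbol matters for this process.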

Keyword
Computational mechanics, Causal states, CSSR, Hidden Markov model, HMM, Learnability
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-121467 (URN)
000316425800020 ()
2-s2.0-84873870023 (Scopus ID)
Funder
EU, European Research Council, FP6-034362
Available from: 2013-05-02. Created: 2013-04-29. Last updated: 2017-04-28. Bibliographically approved.
3. Minimum Entropy Rate Simplification of Stochastic Processes
(English)Manuscript (preprint) (Other academic)
Abstract [en]

We propose minimum entropy rate simplification (MERS), an information-theoretic, representation-independent framework for simplifying generative models of stochastic processes. Applications include improving model quality for sampling tasks by concentrating the probability mass on the most characteristic and accurately described behaviors while de-emphasizing the tails, and obtaining clean models from corrupted data (nonparametric denoising). This is the opposite of the smoothing step commonly applied to classification models. Drawing on rate-distortion theory, MERS seeks the minimum entropy-rate process under a constraint on the dissimilarity between the original and simplified processes. We particularly investigate the Kullback-Leibler divergence rate as a dissimilarity measure, where, compatible with our assumption that the starting model is disturbed or inaccurate, the simplification rather than the starting model is used for the reference distribution of the divergence. This leads to analytic solutions for stationary and ergodic Gaussian processes and Markov chains. The same formulas are also valid for maximum entropy smoothing under the same divergence constraint. In experiments, MERS successfully simplifies and denoises Markov models from text, speech, and meteorology.
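The quantity MERS minimizes, the entropy rate of a stationary ergodic Markov chain, is H = -Σ_i π_i Σ_j P_ij log P_ij. The sketch below computes it for a made-up transition matrix and illustrates the continuum toward completely predictable output with a simple row-wise power-and-renormalize sharpening; this tilting is only an illustration of the idea and is not the paper's closed-form MERS solution.

```python
import numpy as np

def stationary(P):
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()

def entropy_rate(P):
    # H = -sum_i pi_i sum_j P_ij log2 P_ij, in bits per step.
    pi = stationary(P)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = np.where(P > 0, -P * np.log2(P), 0.0)
    return float(pi @ h.sum(axis=1))

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.1, 0.8]])

# Row-wise sharpening P_ij^beta / Z_i: as beta grows, each row concentrates
# on its most likely transition, and the entropy rate falls toward zero.
rates = []
for beta in [1.0, 2.0, 4.0, 8.0]:
    Q = P ** beta
    Q /= Q.sum(axis=1, keepdims=True)
    rates.append(entropy_rate(Q))
```

As beta increases, the chain's output becomes progressively more stereotyped, mirroring the continuum of simplifications the abstract describes between the original model and zero-entropy output.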

Keyword
Markov processes, information theory, signal synthesis, sound and music computing, language generation, statistical models
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-134691 (URN)
Projects
LISTA – The Listening Talker
Funder
EU, FP7, Seventh Framework Programme, 256230
Available from: 2013-11-27. Created: 2013-11-27. Last updated: 2013-11-28. Bibliographically approved.
4. Kernel Density Estimation-Based Markov Models with Hidden State
(English)Manuscript (preprint) (Other academic)
Abstract [en]

We consider Markov models of stochastic processes where the next-step conditional distribution is defined by a kernel density estimator (KDE), similar to certain time-series bootstrap schemes from the economic forecasting literature. The KDE Markov models (KDE-MMs) we discuss are nonlinear, nonparametric, fully probabilistic representations of stationary processes with strong asymptotic convergence properties. The models generate new data simply by concatenating points from the training data sequences in a context-sensitive manner, with some added noise. We present novel EM-type maximum-likelihood algorithms for data-driven bandwidth selection in KDE-MMs. Additionally, we augment the KDE-MMs with a hidden state, yielding a new model class, KDE-HMMs. The added state-variable enables long-range memory and signal structure representation, complementing the short-range correlations captured by the Markov process. This is compelling for modelling complex real-world processes such as speech and language data. The paper presents guaranteed-ascent EM-update equations for model parameters in the case of Gaussian kernels, as well as relaxed update formulas that greatly accelerate training in practice. Experiments demonstrate increased held-out set probability for KDE-HMMs on several challenging natural and synthetic data series, compared to traditional techniques such as autoregressive models, HMMs, and their combinations.
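The KDE next-step conditional can be written directly as a sampling procedure: weight every training context by a kernel centred on the current value, resample a matching successor, and jitter it, exactly the "concatenating points from the training data in a context-sensitive manner, with some added noise" described above. The sketch below assumes Gaussian kernels, a hand-picked bandwidth (not the paper's EM-type selection), a made-up scalar training sequence, and no hidden state (a KDE-MM rather than a KDE-HMM).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training sequence from a nonlinear autoregressive process.
x = np.zeros(500)
for t in range(499):
    x[t + 1] = 0.8 * x[t] + 0.3 * np.sin(x[t]) + 0.1 * rng.standard_normal()

h = 0.1  # kernel bandwidth, fixed by hand for this sketch

def sample_next(x_t):
    # Kernel weights measure how well each training context matches x_t.
    w = np.exp(-0.5 * ((x[:-1] - x_t) / h) ** 2)
    w /= w.sum()
    # Resample a matching training successor and jitter it: a draw from
    # the KDE estimate of p(x_{t+1} | x_t).
    i = rng.choice(len(w), p=w)
    return x[i + 1] + h * rng.standard_normal()

# Generate new data by iterating the next-step conditional.
synth = [0.0]
for _ in range(100):
    synth.append(sample_next(synth[-1]))
```

The generated sequence necessarily stays close to behaviours seen in training, which is what gives the model its strong asymptotic convergence properties as the training sequence grows.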

Keyword
hidden Markov models, nonparametric methods, kernel density estimation, autoregressive models, Markov forecast density
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:kth:diva-134692 (URN)
Projects
LISTA – The Listening Talker
Funder
EU, FP7, Seventh Framework Programme, 256230
Available from: 2013-11-27. Created: 2013-11-27. Last updated: 2013-11-28. Bibliographically approved.

Open Access in DiVA

gustav_eje_henter_spikblad_2013 (89 kB), 25 downloads
File name: SPIKBLAD01.pdf
File size: 89 kB
Checksum SHA-512: 919038a99458f311cd41fb014cdec0fa8ca3a2c2e62ba2703134369abbc8896f5fc34f0a6be8febb62b31972846438a51ec3c19a7f127ad6407c2444c8f85572
Type: spikblad
Mimetype: application/pdf

gustav_eje_henter_phd_thesis_2013 (935 kB), 441 downloads
File name: FULLTEXT02.pdf
File size: 935 kB
Checksum SHA-512: 0cd1b784ca6d4ac2c84d0f249c3c37838ccccb6c15ae2b9e72c69886de239cffd5137a76aab61a40c8b522140e0878c474b1aef393bc30a96e66c5187bf0a705
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Henter, Gustav Eje
By organisation
Communication Theory
Electrical Engineering, Electronic Engineering, Information Engineering
