Gaussian process dynamical models for nonparametric speech representation and synthesis
2012 (English). In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, IEEE, 2012, pp. 4505-4508. Conference paper (Refereed)
We propose Gaussian process dynamical models (GPDMs) as a new, nonparametric paradigm in acoustic models of speech. These use multidimensional, continuous state-spaces to overcome familiar issues with discrete-state, HMM-based speech models. The added dimensions allow the state to represent more than just temporal structure, and to describe this structure as systematic differences in mean rather than as mere correlations in a residual (as dynamic features or AR-HMMs do). Being based on Gaussian processes, the models avoid restrictive parametric or linearity assumptions on signal structure. We outline GPDM theory, and describe model setup and initialization schemes relevant to speech applications. Experiments demonstrate that the synthesized speech is of subjectively better quality than that from comparable HMMs. In addition, there is evidence for unsupervised discovery of salient speech structure.
Place, publisher, year, edition, pages
IEEE, 2012. pp. 4505-4508
IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings, ISSN 1520-6149
acoustic models, stochastic models, nonparametric speech synthesis, sampling
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-66403; DOI: 10.1109/ICASSP.2012.6288919; ISI: 000312381404144; ScopusID: 2-s2.0-84867596846; ISBN: 978-1-4673-0046-9; OAI: oai:DiVA.org:kth-66403; DiVA: diva2:483922
ICASSP 2012, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 25-30, 2012, Kyoto International Conference Center, Kyoto, Japan
Funder
EU, FP7, Seventh Framework Programme, 256230
ICT - The Next Generation
QC 20120308
2012-03-08, 2012-01-26, 2013-11-28. Bibliographically approved