Waveform quantization of speech using Gaussian mixture models
2004 (English) In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Proceedings: Speech Processing, IEEE, 2004, p. 165-168. Conference paper (Refereed)
Waveform quantization of speech using Gaussian mixture models (GMMs) is proposed. GMMs are trained directly on the speech waveform, and high-dimensional vector quantizers (VQs) that efficiently exploit the redundancy are constructed based on the GMM parameters. Two types of GMMs are studied. The complexity of the scheme is independent of the rate, and the rate can be changed without retraining the VQ. A shape-gain structure improves performance and robustness. Pre- and post-processing using spectral amplitude warping further improves perceptual quality. A 32-dimensional VQ operating at 2 bits/sample reproduces speech sampled at 8 kHz with a PESQ score of 4.2.
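For orientation, the figures quoted in the abstract can be turned into a bit budget, and the shape-gain idea can be illustrated in its generic textbook form. The sketch below is not the authors' quantizer: the function names `shape_gain_split` and `bit_budget` are hypothetical, and taking the gain to be the Euclidean norm of the block is an assumption, since the abstract does not define it.

```python
import math

def shape_gain_split(x):
    """Split a block x into a scalar gain and a unit-norm shape vector.

    Assumption: gain = Euclidean norm of the block (generic shape-gain form).
    """
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        # An all-zero block has no direction; use a zero shape by convention.
        return 0.0, [0.0] * len(x)
    return gain, [v / gain for v in x]

def bit_budget(dim, bits_per_sample, sample_rate_hz):
    """Return (bits per vector, bit rate in bit/s) for a fixed-rate block quantizer."""
    return dim * bits_per_sample, bits_per_sample * sample_rate_hz

# Figures from the abstract: 32 dimensions, 2 bits/sample, 8 kHz sampling.
bits_per_vector, bit_rate = bit_budget(32, 2, 8000)

# Toy two-sample block, only to show the decomposition.
gain, shape = shape_gain_split([3.0, 4.0])
```

Under these figures each 32-sample vector is coded with 64 bits, for an overall rate of 16 kbit/s. In a shape-gain scheme the gain and the shape are quantized separately; how the bits are divided between them in the paper is not stated in the abstract.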
Place, publisher, year, edition, pages
IEEE, 2004. p. 165-168
Series
IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings, ISSN 1520-6149
Keywords
Computational complexity, Mathematical models, Parameter estimation, Random processes, Robustness (control systems), Vector quantization, Waveform analysis
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-44753
ISI: 000222173500042
ScopusID: 2-s2.0-4544284645
ISBN: 0-7803-8484-9
OAI: oai:DiVA.org:kth-44753
DiVA: diva2:451419
IEEE International Conference on Acoustics, Speech, and Signal Processing. Location: Montreal, Canada. Date: May 17-21, 2004
QC 20111025
2011-10-25, 2011-10-25, 2014-12-15
Bibliographically approved