Model Based Speech Enhancement and Coding
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2007 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

In mobile speech communication, adverse conditions, such as noisy acoustic environments and unreliable network connections, may severely degrade the intelligibility and naturalness of the received speech and increase the listening effort. This thesis focuses on countermeasures based on statistical signal processing techniques. The main body of the thesis consists of three research articles, targeting two specific problems: speech enhancement for noise reduction and flexible source coder design for unreliable networks.

Papers A and B consider speech enhancement for noise reduction. New schemes based on an extension to the auto-regressive (AR) hidden Markov model (HMM) for speech and noise are proposed. Stochastic models for the speech and noise gains (the excitation variance of an AR model) are integrated into the HMM framework in order to improve the modeling of energy variation. The extended model is referred to as a stochastic-gain hidden Markov model (SG-HMM). The speech gain describes the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain improves the tracking of the time-varying energy of non-stationary noise, e.g., due to movement of the noise source. In Paper A, it is assumed that prior knowledge of the noise environment is available, so that a pre-trained noise model can be used. In Paper B, the noise model is adaptive and the model parameters are estimated online from the noisy observations using a recursive estimation algorithm. Based on the speech and noise models, a novel Bayesian estimator of the clean speech is developed in Paper A, and an estimator of the noise power spectral density (PSD) in Paper B. It is demonstrated that the proposed schemes achieve more accurate models of speech and noise than traditional techniques, and, as part of a speech enhancement system, provide improved speech quality, particularly for non-stationary noise sources.
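
As a rough illustration of the modeling idea, the state-conditional densities of an SG-HMM can be sketched as follows; the notation is illustrative and is not taken from the papers:

\begin{align*}
  y_t &= s_t + n_t, \\
  s_t \mid (q_t = i,\ g^s_t) &\sim \mathcal{N}\big(0,\ g^s_t\,\Sigma^s_i\big), \\
  n_t \mid (\tilde{q}_t = j,\ g^n_t) &\sim \mathcal{N}\big(0,\ g^n_t\,\Sigma^n_j\big),
\end{align*}

where $q_t$ and $\tilde{q}_t$ are the hidden speech and noise states, $\Sigma^s_i$ and $\Sigma^n_j$ are the covariance matrices implied by the state AR models with unit excitation variance, and the gains $g^s_t$ and $g^n_t$ are random variables with their own priors, rather than the fixed excitation variances of a conventional AR-HMM.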

In Paper C, a flexible entropy-constrained vector quantization scheme based on a Gaussian mixture model (GMM), lattice quantization, and arithmetic coding is proposed. The method allows the average rate to be changed in real time, and facilitates adaptation to the currently available bandwidth of the network. A practical solution to the classical issue of indexing and entropy-coding the quantized code vectors is given. The proposed scheme has a computational complexity that is independent of rate and quadratic with respect to the vector dimension. Hence, the scheme can be applied to the quantization of source vectors in a high-dimensional space. The theoretical performance of the scheme is analyzed under a high-rate assumption. It is shown that, at high rate, the scheme approaches the theoretically optimal performance if the mixture components are located far apart. The practical performance of the scheme is confirmed through simulations on both synthetic and speech-derived source vectors.
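
For orientation, the high-rate analysis builds on the standard result for entropy-constrained lattice quantization; a sketch of the kind of expression involved (not the exact analysis of Paper C) is

D \approx G(\Lambda)\, 2^{\,2\left(\bar{h}(X) - R\right)},

where $D$ is the mean squared error per dimension, $G(\Lambda)$ is the normalized second moment of the lattice, $\bar{h}(X)$ is the differential entropy per dimension of the source, and $R$ is the rate per dimension. When the mixture components are far apart, each input vector is, with high probability, handled by the component that generated it, which is the intuition behind the near-optimality claim above.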

Place, publisher, year, edition, pages
Stockholm: KTH, 2007. xii, 49 p.
Series
Trita-EE, ISSN 1653-5146 ; 2007:018
Keyword [en]
statistical model, Gaussian mixture model (GMM), hidden Markov model (HMM), noise reduction
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-4412
ISBN: 978-91-682-7157-4
OAI: oai:DiVA.org:kth-4412
DiVA: diva2:12190
Public defence
2007-06-11, Sal FD5, AlbaNova, Roslagstullsbacken 21, Stockholm, 09:15
Opponent
Supervisors
Note
QC 20100825. Available from: 2007-05-31. Created: 2007-05-31. Last updated: 2010-08-25. Bibliographically approved.
List of papers
1. HMM-based gain-modeling for enhancement of speech in noise
2007 (English) In: IEEE Transactions on Speech and Audio Processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 15, no. 3, pp. 882-892. Article in journal (Refereed). Published.
Abstract [en]

Accurate modeling and estimation of speech and noise gains facilitate good performance of speech enhancement methods using data-driven prior models. In this paper, we propose a hidden Markov model (HMM)-based speech enhancement method using explicit gain modeling. Through the introduction of stochastic gain variables, energy variation in both speech and noise is explicitly modeled in a unified framework. The speech gain models the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain helps to improve the tracking of the time-varying energy of nonstationary noise. The expectation-maximization (EM) algorithm is used to perform offline estimation of the time-invariant model parameters. The time-varying model parameters are estimated online using the recursive EM algorithm. The proposed gain modeling techniques are applied to a novel Bayesian speech estimator, and the performance of the proposed enhancement method is evaluated through objective and subjective tests. The experimental results confirm the advantage of explicit gain modeling, particularly for nonstationary noise sources.
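
As a rough sketch of how an HMM-based Bayesian spectral estimator of this general kind can be structured (simplified, without the stochastic gain variables; all names and shapes below are illustrative, not the paper's implementation):

import numpy as np

def hmm_mmse_enhance(Y, speech_psd, noise_psd, state_prior):
    """Simplified HMM-based MMSE spectral estimator (illustrative sketch).

    Y           : (K,) complex STFT coefficients of one noisy frame
    speech_psd  : (M, K) speech PSD of each of the M HMM states
    noise_psd   : (K,) noise PSD
    state_prior : (M,) prior (or predicted) state probabilities
    """
    noisy_psd = speech_psd + noise_psd     # per-state PSD of the noisy signal
    power = np.abs(Y) ** 2                 # observed periodogram

    # Log-likelihood of the frame under each state, assuming independent
    # complex Gaussian frequency bins with the per-state noisy PSD.
    loglik = -np.sum(np.log(noisy_psd) + power / noisy_psd, axis=1)
    loglik += np.log(state_prior)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()                     # posterior state probabilities

    # MMSE estimate: posterior-weighted combination of per-state Wiener filters.
    gain = post @ (speech_psd / noisy_psd)
    return gain * Y

The key structural point is that the clean-speech estimate is a posterior-weighted mixture of state-conditional estimators; the paper additionally integrates over the stochastic gain variables.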

Keyword
Gain modeling; Hidden Markov modeling (HMM); Noise suppression; Speech enhancement; Accurate modeling; Bayesian; Data-driven; Energy variations; Expectation-maximization algorithms; Gain models; Modeling techniques; Noise gains; Non-stationary noise; Offline; Recursive EM; Speech enhancement methods; Subjective tests; Time-invariant models; Time-varying; Time-varying model parameters; Unified frameworks; Polarization; Statistical tests; Time varying systems; Hidden Markov models
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7235 (URN), 10.1109/TASL.2006.885256 (DOI), 000244318600013 (ISI), 2-s2.0-51449116166 (Scopus ID)
Note
QC 20100825. Available from: 2007-05-31. Created: 2007-05-31. Last updated: 2017-12-14. Bibliographically approved.
2. Online noise estimation using stochastic-gain HMM for speech enhancement
2008 (English) In: IEEE Transactions on Speech and Audio Processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 16, no. 4, pp. 835-846. Article in journal (Refereed). Published.
Abstract [en]

We propose a noise estimation algorithm for single-channel noise suppression in dynamic noisy environments. A stochastic-gain hidden Markov model (SG-HMM) is used to model the statistics of nonstationary noise with time-varying energy. The noise model is adaptive and the model parameters are estimated online from noisy observations using a recursive estimation algorithm. The parameter estimation is derived for the maximum-likelihood criterion and the algorithm is based on the recursive expectation maximization (EM) framework. The proposed method facilitates continuous adaptation to changes of both noise spectral shapes and noise energy levels, e.g., due to movement of the noise source. Using the estimated noise model, we also develop an estimator of the noise power spectral density (PSD) based on recursive averaging of estimated noise sample spectra. We demonstrate that the proposed scheme achieves more accurate estimates of the noise model and noise PSD, and as part of a speech enhancement system facilitates a lower level of residual noise.
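
A minimal sketch of a recursive-averaging PSD update of the kind mentioned above; here a generic per-bin speech-presence probability stands in for the SG-HMM posteriors used in the paper, and all names are illustrative:

import numpy as np

def update_noise_psd(noise_psd, noisy_power, p_speech, alpha=0.95):
    """One recursive-averaging step of a noise PSD estimate (illustrative sketch).

    noise_psd   : (K,) current noise PSD estimate
    noisy_power : (K,) periodogram of the current noisy frame
    p_speech    : (K,) per-bin probability that speech is present
                  (a stand-in for the SG-HMM posteriors of the paper)
    alpha       : base forgetting factor of the recursive average
    """
    # Where speech is likely present, keep (most of) the old estimate;
    # where it is likely absent, track the observed noisy power.
    alpha_eff = alpha + (1.0 - alpha) * p_speech
    return alpha_eff * noise_psd + (1.0 - alpha_eff) * noisy_power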

Keyword
gain modeling; noise estimation; noise model adaptation; noise suppression; stochastic-gain hidden Markov model (SG-HMM)
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7236 (URN), 10.1109/TASL.2008.916055 (DOI), 000258033600014 (ISI), 2-s2.0-64849085665 (Scopus ID)
Note
QC 20100825. Updated from Submitted to Published 20100825. Available from: 2007-05-31. Created: 2007-05-31. Last updated: 2017-12-14. Bibliographically approved.
3. On entropy-constrained vector quantization using Gaussian mixture models
2008 (English) In: IEEE Transactions on Communications, ISSN 0090-6778, E-ISSN 1558-0857, Vol. 56, no. 12, pp. 2094-2104. Article in journal (Refereed). Published.
Abstract [en]

A flexible and low-complexity entropy-constrained vector quantizer (ECVQ) scheme based on Gaussian mixture models (GMMs), lattice quantization, and arithmetic coding is presented. The source is assumed to follow a GMM probability density function. An input vector is first classified to one of the mixture components, and the Karhunen-Loève transform of the selected mixture component is applied to the vector, followed by quantization using a lattice-structured codebook. Finally, the scalar elements of the quantized vector are entropy coded sequentially using a specially designed arithmetic coder. The computational complexity of the proposed scheme is low and independent of the coding rate in both the encoder and the decoder. Therefore, the proposed scheme serves as a lower-complexity alternative to the GMM-based ECVQ proposed by Gardner, Subramaniam and Rao [1]. The performance of the proposed scheme is analyzed under a high-rate assumption, and quantified for a given GMM. The practical performance of the scheme was evaluated through simulations on both synthetic and speech line spectral frequency (LSF) vectors. For LSF quantization, the proposed scheme has performance comparable to [1] at rates relevant for speech coding (20-28 bits per vector), with lower computational complexity.
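
The encoding pipeline described above can be summarized by the following sketch; the arithmetic-coding stage is replaced by an ideal code-length estimate, and all function and variable names are ours rather than the paper's:

import numpy as np

def ecvq_encode(x, weights, means, covs, step):
    """Sketch of a GMM-based entropy-constrained quantizer (illustrative only).

    x       : (d,) source vector
    weights : (M,) GMM component weights
    means   : (M, d) component means
    covs    : (M, d, d) component covariances
    step    : scalar step size of the cubic lattice
    """
    # 1. Classify x to the mixture component with the highest posterior.
    best = None
    for m, (w, mu, C) in enumerate(zip(weights, means, covs)):
        vals, vecs = np.linalg.eigh(C)          # KLT basis of component m
        z = vecs.T @ (x - mu)                   # decorrelated coordinates
        loglik = np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * vals) + z**2 / vals)
        if best is None or loglik > best[0]:
            best = (loglik, m, vals, vecs, z)
    _, m, vals, vecs, z = best

    # 2. Quantize the KLT coefficients on a scaled integer lattice.
    k = np.round(z / step).astype(int)

    # 3. Ideal code length (bits): -log2 of the approximate cell probability,
    #    standing in for the arithmetic coder of the actual scheme.
    log2p = (np.log2(weights[m])
             + np.sum(np.log2(step)
                      - 0.5 * np.log2(2 * np.pi * vals)
                      - 0.5 * (k * step) ** 2 / (vals * np.log(2))))
    return m, k, -log2p

def ecvq_decode(m, k, means, covs, step):
    """Reconstruct the vector from the component index and lattice point."""
    vals, vecs = np.linalg.eigh(covs[m])
    return means[m] + vecs @ (k * step)

Because the code length is derived from the model probabilities rather than a fixed codebook index, the average rate can be adjusted simply by changing the lattice step size, which is the flexibility emphasized in the abstract.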

Keyword
Entropy constrained, vector quantization, VQ, lattice, Gaussian mixture model, GMM, arithmetic coding, difference distortion measures, maximum-likelihood, block quantization, EM algorithm, quantizers, lattices, image
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7237 (URN), 10.1109/TCOMM.2008.070357 (DOI), 000261700500016 (ISI), 2-s2.0-58049199404 (Scopus ID)
Note
QC 20100825. Previous title: Entropy-constrained vector quantization using Gaussian mixture models. Title changed and status updated from Submitted to Published 20100825. Available from: 2007-05-31. Created: 2007-05-31. Last updated: 2017-12-14. Bibliographically approved.

Open Access in DiVA

fulltext (671 kB), 896 downloads
File information
File name: FULLTEXT01.pdf
File size: 671 kB
Checksum (MD5): 532563c3bde701548109137b20ca900aab08eb9071e2811682f39e4b20f8715e8f7dcd37
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Zhao, David Yuheng
By organisation
Sound and Image Processing
Telecommunications
