Knowledge-based speech enhancement
KTH, School of Electrical Engineering (EES).
2005 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communication anywhere has become a reality. The freedom and flexibility offered by mobile technology bring with them new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as cars, cafeterias, and subway stations, for hearing aids, and for improving the performance of speech recognition systems.

In this thesis, which consists of four research articles, we discuss both single and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework to exploit available prior knowledge about both speech and noise. The physiology of speech production places a constraint on the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape. The speech and noise gain factors are obtained through a frame-by-frame optimization, providing good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions thereof, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived.

While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach, where, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and that the other is noise, and exploit their different characteristics.

Place, publisher, year, edition, pages
Stockholm: KTH, 2005. xii, 61 p.
Series
Trita-S3-SIP, ISSN 1652-4500 ; 2005:1
Keyword [en]
speech enhancement, noise reduction, linear predictive coefficients, autoregressive, codebooks, maximum-likelihood, Bayesian, nonstationary noise, blind source separation
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-456
ISBN: 91-628-6643-5 (print)
OAI: oai:DiVA.org:kth-456
DiVA: diva2:12853
Public defence
2005-10-28, Sal B2, Brinellvägen 23, Stockholm, 09:00
Note
QC 20100929
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2010-09-29 Bibliographically approved
List of papers
1. Codebook driven short-term predictor parameter estimation for speech enhancement
2006 (English) In: IEEE transactions on speech and audio processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 14, no 1, 163-176 p. Article in journal (Refereed) Published
Abstract [en]

 In this paper, we present a new technique for the estimation of short-term linear predictive parameters of speech and noise from noisy data and their subsequent use in waveform enhancement schemes. The method exploits a priori information about speech and noise spectral shapes stored in trained codebooks, parameterized as linear predictive coefficients. The method also uses information about noise statistics estimated from the noisy observation. Maximum-likelihood estimates of the speech and noise short-term predictor parameters are obtained by searching for the combination of codebook entries that optimizes the likelihood. The estimation involves the computation of the excitation variances of the speech and noise auto-regressive models on a frame-by-frame basis, using the a priori information and the noisy observation. The high computational complexity resulting from a full search of the joint speech and noise codebooks is avoided through an iterative optimization procedure. We introduce a classified noise codebook scheme that uses different noise codebooks for different noise types. Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.
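
The search over codebook pairs described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses a full search instead of the iterative procedure, fits the two excitation variances (gains) by non-negative least squares as a stand-in for the paper's variance computation, and scores each pair with the Itakura-Saito distortion as a likelihood-style criterion. All function names (`ar_envelope`, `fit_gains`, `ml_search`, `wiener_gain`) are hypothetical.

```python
import cmath
import math

EPS = 1e-12  # floor that keeps ratios and logarithms finite


def ar_envelope(lpc, n_bins):
    """Gain-normalized AR spectrum 1/|A(w)|^2, with A(z) = 1 + sum_k a_k z^-k."""
    env = []
    for i in range(n_bins):
        w = math.pi * i / n_bins
        a = 1.0 + sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(lpc))
        env.append(1.0 / max(abs(a) ** 2, EPS))
    return env


def fit_gains(p_obs, env_s, env_n):
    """Assumed gain step: least-squares fit p_obs ~ gs*env_s + gn*env_n, gains >= 0."""
    ss = sum(x * x for x in env_s)
    nn = sum(x * x for x in env_n)
    sn = sum(x * y for x, y in zip(env_s, env_n))
    sy = sum(x * y for x, y in zip(env_s, p_obs))
    ny = sum(x * y for x, y in zip(env_n, p_obs))
    det = ss * nn - sn * sn
    if det > 1e-9 * ss * nn:
        gs = (sy * nn - ny * sn) / det
        gn = (ny * ss - sy * sn) / det
        if gs >= 0.0 and gn >= 0.0:
            return gs, gn
    # Unconstrained solution is invalid: try each single-source fit instead.
    gs_only = max(sy / ss, 0.0)
    gn_only = max(ny / nn, 0.0)
    err_s = sum((p - gs_only * s) ** 2 for p, s in zip(p_obs, env_s))
    err_n = sum((p - gn_only * v) ** 2 for p, v in zip(p_obs, env_n))
    return (gs_only, 0.0) if err_s <= err_n else (0.0, gn_only)


def itakura_saito(p_obs, p_model):
    """Mean Itakura-Saito distortion between observed and modelled spectra."""
    d = 0.0
    for p, m in zip(p_obs, p_model):
        r = max(p, EPS) / max(m, EPS)
        d += r - math.log(r) - 1.0
    return d / len(p_obs)


def ml_search(p_obs, speech_cb, noise_cb):
    """Full search; returns (distortion, speech index, noise index, gs, gn)."""
    n = len(p_obs)
    best = None
    for i, a_s in enumerate(speech_cb):
        env_s = ar_envelope(a_s, n)
        for j, a_n in enumerate(noise_cb):
            env_n = ar_envelope(a_n, n)
            gs, gn = fit_gains(p_obs, env_s, env_n)
            model = [gs * s + gn * v for s, v in zip(env_s, env_n)]
            d = itakura_saito(p_obs, model)
            if best is None or d < best[0]:
                best = (d, i, j, gs, gn)
    return best


def wiener_gain(gs, env_s, gn, env_n):
    """Per-bin Wiener gain: speech spectrum / (speech + noise spectrum)."""
    return [gs * s / max(gs * s + gn * v, EPS) for s, v in zip(env_s, env_n)]
```

Here `p_obs` stands for the power spectrum of one noisy frame on the same frequency grid as the envelopes. Because the gains are refitted for every frame, the codebooks only need to store spectral shapes, which is the property the abstract credits for the behaviour in nonstationary noise.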

Keyword
autoregressive models, codebooks, maximum-likelihood, nonstationary noise, short-term predictor, speech enhancement
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7734 (URN)
10.1109/TSA.2005.854113 (DOI)
000235369100017 ()
2-s2.0-33744970011 (Scopus ID)
Note
QC 20100903. Updated from Accepted to Published
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2011-08-25 Bibliographically approved
2. Codebook-based Bayesian speech enhancement for nonstationary environments
2007 (English) In: IEEE transactions on speech and audio processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 15, no 2, 441-452 p. Article in journal (Refereed) Published
Abstract [en]

 In this paper, we propose a Bayesian minimum mean squared error approach for the joint estimation of the short-term predictor parameters of speech and noise, from the noisy observation. We use trained codebooks of speech and noise linear predictive coefficients to model the a priori information required by the Bayesian scheme. In contrast to current Bayesian estimation approaches that consider the excitation variances as part of the a priori information, in the proposed method they are computed online for each short-time segment, based on the observation at hand. Consequently, the method performs well in nonstationary noise conditions. The resulting estimates of the speech and noise spectra can be used in a Wiener filter or any state-of-the-art speech enhancement system. We develop both memoryless (using information from the current frame alone) and memory-based (using information from the current and previous frames) estimators. Estimation of functions of the short-term predictor parameters is also addressed, in particular one that leads to the minimum mean squared error estimate of the clean speech signal. Experiments indicate that the scheme proposed in this paper performs significantly better than competing methods.
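
A memoryless Bayesian estimate of the kind described in this abstract can be sketched as a posterior-weighted average over codebook candidates. The sketch below assumes a uniform prior over the candidates and takes their log-likelihoods as given; the function name `mmse_average` and its inputs are hypothetical, and the memory-based estimator and the paper's likelihood computation are not reproduced.

```python
import math


def mmse_average(log_likelihoods, candidates):
    """Posterior-weighted mean of candidate vectors under a uniform prior.

    log_likelihoods[i] is the log-likelihood of the noisy frame given
    candidates[i] (for example, a modelled spectrum for one codebook pair).
    Returns (mmse_estimate, posterior_weights).
    """
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(log_likelihoods)
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    weights = [w / total for w in weights]

    dim = len(candidates[0])
    estimate = [
        sum(w * cand[k] for w, cand in zip(weights, candidates))
        for k in range(dim)
    ]
    return estimate, weights
```

Where a maximum-likelihood search keeps only the single best candidate, this average lets every candidate contribute in proportion to its posterior weight, which is the basic difference between the two estimation styles.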

Keyword
Bayesian, Codebooks, Linear predictive coding, Noise estimation, Speech enhancement, Speech processing, Wiener filtering
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7735 (URN)
10.1109/TASL.2006.881696 (DOI)
000243914800007 ()
2-s2.0-51449109652 (Scopus ID)
Note
QC 20100903. Updated from Submitted to Published (20100903)
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2011-08-25 Bibliographically approved
3. Multi-channel parametric speech enhancement
2006 (English) In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 13, no 5, 304-307 p. Article in journal (Refereed) Published
Abstract [en]

We present a parametric model-based multichannel approach for speech enhancement. By employing an autoregressive model for the speech signal and using a trained codebook of speech linear predictive coefficients, minimum mean square error estimation of the speech signal is performed. By explicitly accounting for steering errors in the signal model, robust estimates are obtained. Experiments show that the proposed method results in significant performance gains.

Keyword
Acoustic noise, Acoustic signal processing, Array signal processing, Autoregressive processes, Speech enhancement, Speech processing
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7736 (URN)
10.1109/LSP.2005.863819 (DOI)
000236977700014 ()
2-s2.0-33645803631 (Scopus ID)
Note
QC 20100923. Updated from Accepted to Published (20100923).
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2010-09-23 Bibliographically approved
4. Speech denoising through source separation and min-max tracking
(English) In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361. Article in journal (Refereed) Submitted
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-7737 (URN)
Note
QC 20100929
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2010-09-29 Bibliographically approved

Author
Srinivasan, Sriram