Codebook-based Bayesian speech enhancement for nonstationary environments
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2007 (English)
In: IEEE transactions on speech and audio processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 15, no 2, 441-452 p.
Article in journal (Refereed) Published
Abstract [en]

 In this paper, we propose a Bayesian minimum mean squared error approach for the joint estimation of the short-term predictor parameters of speech and noise, from the noisy observation. We use trained codebooks of speech and noise linear predictive coefficients to model the a priori information required by the Bayesian scheme. In contrast to current Bayesian estimation approaches that consider the excitation variances as part of the a priori information, in the proposed method they are computed online for each short-time segment, based on the observation at hand. Consequently, the method performs well in nonstationary noise conditions. The resulting estimates of the speech and noise spectra can be used in a Wiener filter or any state-of-the-art speech enhancement system. We develop both memoryless (using information from the current frame alone) and memory-based (using information from the current and previous frames) estimators. Estimation of functions of the short-term predictor parameters is also addressed, in particular one that leads to the minimum mean squared error estimate of the clean speech signal. Experiments indicate that the scheme proposed in this paper performs significantly better than competing methods.
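The memoryless estimator described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each candidate is a pair of speech and noise power spectra built from one speech and one noise codebook entry and already scaled by gains computed for the current frame, that all spectra are strictly positive, and that the likelihood of a candidate can be approximated by an exponential of the Itakura-Saito distance to the observed noisy spectrum. The function names and the `sharpness` parameter are hypothetical.

```python
import math


def itakura_saito(p_obs, p_model):
    """Mean Itakura-Saito distance between two strictly positive spectra."""
    total = 0.0
    for p, m in zip(p_obs, p_model):
        r = p / m
        total += r - math.log(r) - 1.0
    return total / len(p_obs)


def mmse_spectra(p_obs, candidates, sharpness=None):
    """Posterior-weighted average of candidate speech and noise spectra.

    candidates: list of (speech_psd, noise_psd) pairs, one per combination
    of codebook entries, already gain-scaled for this frame (assumption).
    A uniform prior over candidates is assumed.
    """
    n = len(p_obs)
    if sharpness is None:
        sharpness = n / 2.0  # assumed likelihood scaling, not from the paper
    dists = [
        itakura_saito(p_obs, [s + v for s, v in zip(ps, pn)])
        for ps, pn in candidates
    ]
    d_min = min(dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-sharpness * (d - d_min)) for d in dists]
    total = sum(weights)
    weights = [w / total for w in weights]
    speech = [sum(w * ps[k] for w, (ps, _) in zip(weights, candidates))
              for k in range(n)]
    noise = [sum(w * pn[k] for w, (_, pn) in zip(weights, candidates))
             for k in range(n)]
    return speech, noise, weights


def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain built from the estimated spectra."""
    return [s / (s + v) for s, v in zip(speech_psd, noise_psd)]


# Toy frame with four frequency bins and two candidates.
p_obs = [4.0, 2.0, 1.0, 1.0]
candidates = [
    ([3.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.5, 0.5]),
    ([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]),
]
speech, noise, weights = mmse_spectra(p_obs, candidates)
gains = wiener_gain(speech, noise)
```

Averaging over all candidates, rather than picking the single best one, is what distinguishes this from a maximum-likelihood search; a memory-based variant would additionally carry information over from previous frames.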

Place, publisher, year, edition, pages
2007. Vol. 15, no 2, 441-452 p.
Keyword [en]
Bayesian, Codebooks, Linear predictive coding, Noise estimation, Speech enhancement, Speech processing, Wiener filtering
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-7735
DOI: 10.1109/TASL.2006.881696
ISI: 000243914800007
Scopus ID: 2-s2.0-51449109652
OAI: oai:DiVA.org:kth-7735
DiVA: diva2:12850
Note
QC 20100903. Updated from Submitted to Published (20100903)
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2011-08-25
Bibliographically approved
In thesis
1. Knowledge-based speech enhancement
2005 (English)
Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communication anywhere has become a reality. The freedom and flexibility offered by mobile technology bring with them new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as cars, cafeterias, and subway stations, for hearing aids, and for improving the performance of speech recognition systems.

In this thesis, which consists of four research articles, we discuss both single and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework to exploit available prior knowledge about both speech and noise. The physiology of speech production places a constraint on the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape. The speech and noise gain factors are obtained through a frame-by-frame optimization, providing good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions thereof, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived.
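The codebook search with frame-by-frame gain computation outlined for paper A can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: the gains are fitted here by a two-variable non-negative least-squares match to the noisy periodogram, and candidate pairs are ranked by the Itakura-Saito distance as a stand-in for the likelihood. All function names are hypothetical, and the LP polynomials are assumed stable and written as A(z) = 1 + a_1 z^-1 + ... .

```python
import cmath
import math


def ar_envelope(lpc, n_bins):
    """Spectral shape 1/|A(w)|^2 on n_bins frequencies in [0, pi)."""
    env = []
    for i in range(n_bins):
        w = math.pi * i / n_bins
        a = 1.0 + sum(c * cmath.exp(-1j * w * (k + 1))
                      for k, c in enumerate(lpc))
        env.append(1.0 / abs(a) ** 2)
    return env


def fit_gains(p_obs, env_s, env_n):
    """Non-negative least-squares fit of p_obs ~ gs*env_s + gn*env_n."""
    ss = sum(s * s for s in env_s)
    nn = sum(v * v for v in env_n)
    sn = sum(s * v for s, v in zip(env_s, env_n))
    sp = sum(s * p for s, p in zip(env_s, p_obs))
    vp = sum(v * p for v, p in zip(env_n, p_obs))
    det = ss * nn - sn * sn
    if det > 1e-12 * ss * nn:
        gs = (sp * nn - vp * sn) / det
        gn = (vp * ss - sp * sn) / det
        if gs >= 0.0 and gn >= 0.0:
            return gs, gn
    # Otherwise the constrained optimum uses a single component.
    gs_only, gn_only = max(sp / ss, 0.0), max(vp / nn, 0.0)
    err_s = sum((p - gs_only * s) ** 2 for p, s in zip(p_obs, env_s))
    err_n = sum((p - gn_only * v) ** 2 for p, v in zip(p_obs, env_n))
    return (gs_only, 0.0) if err_s <= err_n else (0.0, gn_only)


def itakura_saito(p_obs, p_model, floor=1e-12):
    """Mean Itakura-Saito distance, with a small floor to avoid log(0)."""
    total = 0.0
    for p, m in zip(p_obs, p_model):
        r = max(p, floor) / max(m, floor)
        total += r - math.log(r) - 1.0
    return total / len(p_obs)


def ml_codebook_search(p_obs, speech_cb, noise_cb):
    """Return (distance, speech index, noise index, gs, gn) of the best pair."""
    n_bins = len(p_obs)
    best = None
    for i, a_s in enumerate(speech_cb):
        env_s = ar_envelope(a_s, n_bins)
        for j, a_n in enumerate(noise_cb):
            env_n = ar_envelope(a_n, n_bins)
            gs, gn = fit_gains(p_obs, env_s, env_n)
            model = [gs * s + gn * v for s, v in zip(env_s, env_n)]
            d = itakura_saito(p_obs, model)
            if best is None or d < best[0]:
                best = (d, i, j, gs, gn)
    return best


# Toy codebooks of first-order LP coefficients (illustrative values only).
speech_cb = [[-0.9], [0.5]]
noise_cb = [[0.7], [0.0]]
n_bins = 64
true_s = ar_envelope(speech_cb[0], n_bins)
true_n = ar_envelope(noise_cb[1], n_bins)
p_obs = [2.0 * s + 0.5 * v for s, v in zip(true_s, true_n)]
dist, i_best, j_best, gs_best, gn_best = ml_codebook_search(
    p_obs, speech_cb, noise_cb)
```

Because the codebooks store only spectral shapes, the gains are refitted for every candidate pair and every frame, which is the property the abstract credits for the behaviour in nonstationary noise. The selected speech and noise spectra can then feed a Wiener gain of the form Ps / (Ps + Pn).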

While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach, where, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and that the other is noise, and exploit their different characteristics.

Place, publisher, year, edition, pages
Stockholm: KTH, 2005. xii, 61 p.
Series
Trita-S3-SIP, ISSN 1652-4500 ; 2005:1
Keyword
speech enhancement, noise reduction, linear predictive coefficients, autoregressive, codebooks, maximum-likelihood, Bayesian, nonstationary noise, blind source separation.
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-456 (URN)
91-628-6643-5 (ISBN)
Public defence
2005-10-28, Sal B2, Brinellvägen 23, Stockholm, 09:00
Opponent
Supervisors
Note
QC 20100929
Available from: 2005-10-20 Created: 2005-10-20 Last updated: 2010-09-29
Bibliographically approved

By author/editor
Srinivasan, Sriram; Samuelsson, Jonas; Kleijn, Bastiaan