Codebook driven short-term predictor parameter estimation for speech enhancement
KTH, School of Electrical Engineering (EES).
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2006 (English). In: IEEE Transactions on Speech and Audio Processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 14, no. 1, pp. 163-176. Article in journal (Refereed), Published.
Abstract [en]

 In this paper, we present a new technique for the estimation of short-term linear predictive parameters of speech and noise from noisy data and their subsequent use in waveform enhancement schemes. The method exploits a priori information about speech and noise spectral shapes stored in trained codebooks, parameterized as linear predictive coefficients. The method also uses information about noise statistics estimated from the noisy observation. Maximum-likelihood estimates of the speech and noise short-term predictor parameters are obtained by searching for the combination of codebook entries that optimizes the likelihood. The estimation involves the computation of the excitation variances of the speech and noise auto-regressive models on a frame-by-frame basis, using the a priori information and the noisy observation. The high computational complexity resulting from a full search of the joint speech and noise codebooks is avoided through an iterative optimization procedure. We introduce a classified noise codebook scheme that uses different noise codebooks for different noise types. Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.
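
The estimation procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The LP convention A(z) = 1 + Σ a_p z^{-p}, the Whittle (asymptotic Gaussian) approximation to the likelihood, the nonnegative least-squares fit used to obtain the per-frame speech and noise excitation variances (gains), the fixed starting point of the iterative search, and the choice of noise class by likelihood are all assumptions made for illustration. Every function name (`ar_envelope`, `fit_gains`, `full_search`, `iterative_search`, `classified_search`) is hypothetical. The input `py` is assumed to be the periodogram of one noisy frame sampled on `len(py)` bins in [0, π), and each codebook is a list of LP coefficient vectors.

```python
import cmath
import math


def ar_envelope(a, n_bins):
    """Unit-variance AR spectral envelope 1 / |A(e^{jw})|^2 on n_bins bins in [0, pi).

    Assumed convention: a = [a_1, ..., a_P] and A(z) = 1 + sum_p a_p z^{-p}.
    """
    env = []
    for k in range(n_bins):
        w = math.pi * k / n_bins
        a_w = 1 + sum(ap * cmath.exp(-1j * w * (p + 1)) for p, ap in enumerate(a))
        env.append(1.0 / abs(a_w) ** 2)
    return env


def fit_gains(py, s_env, w_env):
    """Nonnegative least-squares fit py ~ gs * s_env + gw * w_env.

    A simplification chosen for this sketch; gs and gw stand in for the
    per-frame speech and noise excitation variances.
    """
    s11 = sum(x * x for x in s_env)
    s22 = sum(z * z for z in w_env)
    s12 = sum(x * z for x, z in zip(s_env, w_env))
    b1 = sum(x * y for x, y in zip(s_env, py))
    b2 = sum(z * y for z, y in zip(w_env, py))
    det = s11 * s22 - s12 * s12
    if det > 1e-12 * s11 * s22:
        # Solve the 2x2 normal equations for the unconstrained optimum.
        gs = (b1 * s22 - b2 * s12) / det
        gw = (b2 * s11 - b1 * s12) / det
        if gs >= 0 and gw >= 0:
            return gs, gw

    # Otherwise the constrained optimum lies on a boundary (gs = 0 or gw = 0).
    def residual(g):
        return sum((y - g[0] * x - g[1] * z) ** 2 for y, x, z in zip(py, s_env, w_env))

    candidates = [(max(b1 / s11, 0.0), 0.0), (0.0, max(b2 / s22, 0.0))]
    return min(candidates, key=residual)


def neg_log_likelihood(py, model, eps=1e-12):
    """Whittle negative log-likelihood of the periodogram, up to a constant."""
    return sum(math.log(m + eps) + y / (m + eps) for y, m in zip(py, model))


def evaluate(py, s_env, w_env):
    """Fit the gains for one (speech, noise) pair and score the resulting model."""
    gs, gw = fit_gains(py, s_env, w_env)
    model = [gs * x + gw * z for x, z in zip(s_env, w_env)]
    return neg_log_likelihood(py, model), gs, gw


def full_search(py, speech_cb, noise_cb):
    """Exhaustive search over all (speech, noise) codebook pairs."""
    s_envs = [ar_envelope(a, len(py)) for a in speech_cb]
    w_envs = [ar_envelope(a, len(py)) for a in noise_cb]
    best = None
    for i, s_env in enumerate(s_envs):
        for j, w_env in enumerate(w_envs):
            nll, gs, gw = evaluate(py, s_env, w_env)
            if best is None or nll < best[0]:
                best = (nll, i, j, gs, gw)
    return best


def iterative_search(py, speech_cb, noise_cb, n_iter=5):
    """Alternately search the speech codebook and the noise codebook."""
    s_envs = [ar_envelope(a, len(py)) for a in speech_cb]
    w_envs = [ar_envelope(a, len(py)) for a in noise_cb]
    i, j = 0, 0  # hypothetical starting point: first entry of each codebook
    for _ in range(n_iter):
        i = min(range(len(s_envs)), key=lambda k: evaluate(py, s_envs[k], w_envs[j])[0])
        j = min(range(len(w_envs)), key=lambda k: evaluate(py, s_envs[i], w_envs[k])[0])
    nll, gs, gw = evaluate(py, s_envs[i], w_envs[j])
    return nll, i, j, gs, gw


def classified_search(py, speech_cb, noise_cbs):
    """Search with each noise-class codebook and keep the most likely class (assumption)."""
    results = {name: full_search(py, speech_cb, cb) for name, cb in noise_cbs.items()}
    best_class = min(results, key=lambda name: results[name][0])
    return best_class, results[best_class]
```

`full_search` evaluates every pair, so its cost grows with the product of the two codebook sizes; `iterative_search` alternates between the codebooks, so each iteration costs roughly the sum of their sizes, at the risk of settling on a local optimum.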

Place, publisher, year, edition, pages
2006. Vol. 14, no 1, 163-176 p.
Keyword [en]
autoregressive models, codebooks, maximum-likelihood, nonstationary noise, short-term predictor, speech enhancement
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-7734
DOI: 10.1109/TSA.2005.854113
ISI: 000235369100017
Scopus ID: 2-s2.0-33744970011
OAI: oai:DiVA.org:kth-7734
DiVA: diva2:12849
Note
QC 20100903. Updated from Accepted to Published.
Available from: 2005-10-20. Created: 2005-10-20. Last updated: 2011-08-25. Bibliographically approved.
In thesis
1. Knowledge-based speech enhancement
2005 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communication anywhere has become a reality. The freedom and flexibility offered by mobile technology bring with them new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as cars, cafeterias, and subway stations, for hearing aids, and for improving the performance of speech recognition systems.

In this thesis, which consists of four research articles, we discuss both single- and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework that exploits available prior knowledge about both speech and noise. The physiology of speech production constrains the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape; the speech and noise gain factors are obtained through a frame-by-frame optimization, which provides good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions thereof, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived.
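
The use of estimated parameters in a Wiener filter (paper A) and the idea of a Bayesian estimate over codebook entries (paper B) can be sketched as follows. This is an illustrative sketch only: the posterior-weighted average assumes a uniform prior over a finite set of candidate (speech, noise) pairs, each scored by a negative log-likelihood, and it does not reproduce the memoryless or memory-based estimators derived in the thesis. The LP convention A(z) = 1 + Σ a_p z^{-p} is assumed, and the names `ar_envelope`, `wiener_gain`, `posterior_weights`, and `mmse_speech_psd` are hypothetical.

```python
import cmath
import math


def ar_envelope(a, n_bins):
    """Unit-variance AR envelope 1 / |A(e^{jw})|^2 on n_bins bins in [0, pi).

    Assumed convention: A(z) = 1 + sum_p a_p z^{-p}.
    """
    env = []
    for k in range(n_bins):
        w = math.pi * k / n_bins
        a_w = 1 + sum(ap * cmath.exp(-1j * w * (p + 1)) for p, ap in enumerate(a))
        env.append(1.0 / abs(a_w) ** 2)
    return env


def wiener_gain(gs, s_env, gw, w_env):
    """Per-bin Wiener gain Ps / (Ps + Pw) built from AR envelopes and gains."""
    gains = []
    for x, z in zip(s_env, w_env):
        ps, pw = gs * x, gw * z
        gains.append(ps / (ps + pw) if ps + pw > 0 else 0.0)
    return gains


def posterior_weights(nlls):
    """Discrete posterior under a uniform prior: weight_i proportional to exp(-nll_i)."""
    m = min(nlls)  # subtract the minimum so the largest term is exp(0) = 1
    w = [math.exp(-(v - m)) for v in nlls]
    total = sum(w)
    return [v / total for v in w]


def mmse_speech_psd(candidates):
    """Posterior-weighted average of candidate speech PSDs; candidates = [(nll, psd), ...]."""
    weights = posterior_weights([nll for nll, _ in candidates])
    n_bins = len(candidates[0][1])
    return [sum(wt * psd[k] for wt, (_, psd) in zip(weights, candidates)) for k in range(n_bins)]
```

The per-bin gains returned by `wiener_gain` would multiply the noisy short-time spectrum of the current frame; because the gains are recomputed for every frame, the filter can follow nonstationary noise.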

While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach in which, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and that the other is noise, and exploit their different characteristics.

Place, publisher, year, edition, pages
Stockholm: KTH, 2005. xii, 61 p.
Series
Trita-S3-SIP, ISSN 1652-4500 ; 2005:1
Keyword
speech enhancement, noise reduction, linear predictive coefficients, autoregressive, codebooks, maximum-likelihood, Bayesian, nonstationary noise, blind source separation
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-456 (URN)
91-628-6643-5 (ISBN)
Public defence
2005-10-28, Sal B2, Brinellvägen 23, Stockholm, 09:00
Note
QC 20100929
Available from: 2005-10-20. Created: 2005-10-20. Last updated: 2010-09-29. Bibliographically approved.

Authors: Srinivasan, Sriram; Samuelsson, Jonas; Kleijn, Bastiaan