Multi-channel parametric speech enhancement
Srinivasan, Sriram (KTH, School of Electrical Engineering (EES))
Kleijn, Bastiaan (KTH, School of Electrical Engineering (EES))
2006 (English). In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 13, no. 5, pp. 304-307. Article in journal (Refereed). Published.
Abstract [en]

We present a parametric model-based multichannel approach for speech enhancement. By employing an autoregressive model for the speech signal and using a trained codebook of speech linear predictive coefficients, minimum mean square error estimation of the speech signal is performed. By explicitly accounting for steering errors in the signal model, robust estimates are obtained. Experiments show that the proposed method results in significant performance gains.
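As a rough, hedged illustration of the codebook-driven estimation described in the abstract, the sketch below scores each speech LPC codebook entry against a noisy frame under a Gaussian autoregressive model and forms a Wiener gain from the best-matching envelope. It is a single-channel simplification with an assumed-known noise PSD; the paper's actual estimator is multichannel, MMSE-based, and explicitly models steering errors, none of which is reflected here. All function and parameter names are hypothetical.

    # Minimal single-channel sketch (illustration only, not the paper's method).
    import numpy as np

    def codebook_wiener_gain(noisy_frame, speech_codebook, noise_psd, n_fft=512):
        """speech_codebook: iterable of (lpc, sigma2) pairs, lpc = [1, a1, ..., ap].
        noise_psd: assumed-known noise power spectrum on the rfft grid (n_fft//2+1 bins)."""
        noisy_psd = np.abs(np.fft.rfft(noisy_frame, n_fft)) ** 2 / len(noisy_frame)
        best_gain, best_score = None, -np.inf
        for lpc, sigma2 in speech_codebook:
            a_resp = np.fft.rfft(np.asarray(lpc), n_fft)       # A(e^{jw}) on the rfft grid
            speech_psd = sigma2 / np.abs(a_resp) ** 2          # AR spectral envelope
            model_psd = speech_psd + noise_psd
            # Gaussian log-likelihood of the observed periodogram under this entry
            score = -np.sum(np.log(model_psd) + noisy_psd / model_psd)
            if score > best_score:
                best_score, best_gain = score, speech_psd / model_psd
        return best_gain                                       # per-bin Wiener gain

The returned gain would be applied to the noisy spectrum of the frame and the enhanced signal resynthesised by overlap-add.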

Place, publisher, year, edition, pages
2006. Vol. 13, no. 5, pp. 304-307.
Keyword [en]
Acoustic noise, Acoustic signal processing, Array signal processing, Autoregressive processes, Speech enhancement, Speech processing
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-7736
DOI: 10.1109/LSP.2005.863819
ISI: 000236977700014
Scopus ID: 2-s2.0-33645803631
OAI: oai:DiVA.org:kth-7736
DiVA: diva2:12851
Note
QC 20100923. Updated from Accepted to Published (20100923). Available from: 2005-10-20. Created: 2005-10-20. Last updated: 2017-12-14. Bibliographically approved.
In thesis
1. Knowledge-based speech enhancement
2005 (English). Doctoral thesis, comprehensive summary (Other scientific).
Abstract [en]

Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. Advances in technology have made mobile communication ubiquitous, and communication anywhere has become a reality. The freedom and flexibility offered by mobile technology bring with them new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as cars, cafeterias, and subway stations, for hearing aids, and for improving the performance of speech recognition systems.

In this thesis, which consists of four research articles, we discuss both single- and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework to exploit available prior knowledge about both speech and noise. The physiology of speech production places a constraint on the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape. The speech and noise gain factors are obtained through a frame-by-frame optimization, providing good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions thereof, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived.
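For intuition about the frame-by-frame gain computation mentioned above, the following hedged sketch fits nonnegative speech and noise gains so that scaled codebook spectral shapes add up to the observed periodogram, then forms a Wiener filter from the fitted components. The thesis papers use a maximum-likelihood (and, in paper B, Bayesian MMSE) criterion rather than this least-squares stand-in; the names below are hypothetical.

    # Hedged sketch: per-frame gains for one speech/noise codebook pair.
    import numpy as np
    from scipy.optimize import nnls

    def fit_frame_gains(noisy_psd, speech_shape, noise_shape):
        """noisy_psd, speech_shape, noise_shape: 1-D arrays on the same frequency grid.
        Returns (speech_gain, noise_gain) and the corresponding Wiener filter."""
        A = np.column_stack([speech_shape, noise_shape])
        gains, _ = nnls(A, noisy_psd)                  # non-negative least squares
        g_s, g_n = gains
        denom = g_s * speech_shape + g_n * noise_shape + 1e-12
        wiener = (g_s * speech_shape) / denom
        return (g_s, g_n), wiener

In a full system this fit would be repeated over all speech/noise codebook pairs and the best (or a weighted) pair retained, mirroring the frame-wise search described in the abstract.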

While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach, where, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and the other is noise, and exploit their different characteristics.
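As a generic illustration of the spatial diversity exploited in the multi-channel papers, and not the estimators of papers C or D, the sketch below implements a plain two-microphone delay-and-sum beamformer toward an assumed-known time difference of arrival (TDOA). Its output could then be passed to a single-channel codebook-based postfilter such as the one sketched earlier.

    # Hedged illustration: two-channel delay-and-sum toward a known TDOA.
    import numpy as np

    def delay_and_sum(x1, x2, tdoa_samples):
        """Align channel 2 to channel 1 by a (possibly fractional) delay applied
        as a phase shift in the frequency domain, then average the channels.
        tdoa_samples is assumed known here; the papers instead model steering errors."""
        n = len(x1)
        freqs = np.fft.rfftfreq(n)                     # cycles per sample
        X2 = np.fft.rfft(x2)
        x2_aligned = np.fft.irfft(X2 * np.exp(-2j * np.pi * freqs * tdoa_samples), n)
        return 0.5 * (x1 + x2_aligned)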

Place, publisher, year, edition, pages
Stockholm: KTH, 2005. xii, 61 p.
Series
Trita-S3-SIP, ISSN 1652-4500 ; 2005:1
Keyword
speech enhancement, noise reduction, linear predictive coefficients, autoregressive, codebooks, maximum-likelihood, Bayesian, nonstationary noise, blind source separation
National Category
Telecommunications
Identifiers
URN: urn:nbn:se:kth:diva-456
ISBN: 91-628-6643-5
Public defence
2005-10-28, Sal B2, Brinellvägen 23, Stockholm, 09:00
Note
QC 20100929. Available from: 2005-10-20. Created: 2005-10-20. Last updated: 2010-09-29. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text
Scopus
