Phase Processing for Single-Channel Speech Enhancement
KTH, School of Electrical Engineering (EES), Communication Theory. Siemens Corp Res, Princeton, NJ USA.
2015 (English). In: IEEE signal processing magazine (Print), ISSN 1053-5888, E-ISSN 1558-0792, Vol. 32, no 2, 55-66 p. Article in journal (Refereed), Published
Abstract [en]

With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or traffic. One way to make assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, one can exploit spatial diversity through a constructive combination of microphone signals (so-called beamforming), and one can exploit the different spectro-temporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms, which rely on spectro-temporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationary properties of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude have been the center of interest over the past three decades, phase has been largely ignored. In this article, we review the role of phase processing for speech enhancement in the context of assisted listening and speech communication devices.
We explain why most of the research conducted in this field used to focus on estimating spectral magnitudes in the STFT domain, and why recently phase processing is attracting increasing interest in the speech enhancement community. Furthermore, we review both early and recent methods for phase processing in speech enhancement. We aim to show that phase processing is an exciting field of research with the potential to make assisted listening and speech communication devices more robust in acoustically challenging environments.
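
The magnitude and phase representation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the frame length, hop size, Hann window, toy sinusoid, and all function names are assumptions chosen for illustration, and the transform is a naive DFT written with the Python standard library only.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (illustrative choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, frame_len=8, hop=4):
    """Naive STFT: windowed frames, each transformed by a direct DFT."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len)
        ]
        frames.append(spectrum)
    return frames


def to_polar(frames):
    """Split complex STFT coefficients into magnitude and phase."""
    magnitude = [[abs(c) for c in frame] for frame in frames]
    phase = [[cmath.phase(c) for c in frame] for frame in frames]
    return magnitude, phase


def from_polar(magnitude, phase):
    """Recombine magnitude and phase into complex coefficients."""
    return [[cmath.rect(m, p) for m, p in zip(mf, pf)]
            for mf, pf in zip(magnitude, phase)]


# Toy input: a 32-sample sinusoid (arbitrary illustrative values).
x = [math.sin(2 * math.pi * 0.125 * n) for n in range(32)]
X = stft(x)
mag, ph = to_polar(X)
X_rebuilt = from_polar(mag, ph)
```

Recombining the two parts returns the original coefficients up to floating-point rounding. A magnitude-only enhancement scheme would modify `mag` and leave `ph` untouched before recombining, whereas phase processing would also modify `ph`.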

Place, publisher, year, edition, pages
2015. Vol. 32, no 2, 55-66 p.
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
URN: urn:nbn:se:kth:diva-161951
DOI: 10.1109/MSP.2014.2369251
ISI: 000349771400009
ScopusID: 2-s2.0-84923203239
OAI: diva2:800992

QC 20150408

Available from: 2015-04-08. Created: 2015-03-20. Last updated: 2015-04-08. Bibliographically approved.

By author/editor
Gerkmann, Timo