Perceptual and Squared Error Aspects in Speech and Audio Coding
KTH, Superseded Departments, Signals, Sensors and Systems.
2004 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

In the process of quantization, speech and audio signals are changed. This thesis contains four papers concerned with comparing and minimizing different measures to quantify the changes introduced. Before quantization the signal can be transformed to another domain. Transforms related to the discrete Fourier transform allow for efficient quantization. The complex coefficients from these transforms are typically represented by their amplitudes and phases. Papers A and B describe a new method to quantize the amplitudes and phases with scalar polar quantizers. The method is denoted as multi-variate block polar quantization (MBPQ) and is optimized to minimize the average weighted squared error, utilizing high-rate derivations. It is shown that MBPQ outperforms other polar quantizers for all types of data considered. While the perceptual importance of the amplitude values is well established, the perceptual importance of the phase values is still discussed. In paper C, two distortion measures quantifying the detectability of phase distortions are found and verified. Utilizing these distortion measures, it is investigated how well the squared error describes the perception of phase distortions. It was found that the average perceptual distortion reduces only moderately with increasing rate for both vector quantizers minimizing a weighted squared error and vector quantizers minimizing a perceptual distortion measure. It is concluded that future research should focus on finding parameters that describe the features contained in phase and lead to more efficient quantization. Paper D investigates perceptual distortion measures in the most commonly used coder paradigm for speech coding: linear-prediction-based analysis-by-synthesis. In the paper, an auditory model based distortion measure is compared to a commonly used weighted squared error derived from the linear prediction coefficients. It is concluded that sophisticated auditory models are rarely used in this type of coders due to the good performance of the commonly used weighted squared error.

Place, publisher, year, edition, pages
Stockholm: Signaler, sensorer och system, 2004. xii, 45 p.
Trita-S3-SIP, ISSN 1652-4500 ; 2004:5
Keyword [en]
Signal processing, speech coding, audio coding, auditory models, perceptual distortion measures, squared error
Keyword [sv]
National Category
Signal Processing
URN: urn:nbn:se:kth:diva-82
ISBN: 91-628-6173-5
OAI: diva2:14802
Public defence
2004-12-15, kollegiesalen, Valhallavägen 79, Stockholm, 14:00
Available from: 2004-12-21 Created: 2004-12-21 Last updated: 2012-03-21

Open Access in DiVA

No full text

By author/editor
Pobloth, Harald
By organisation
Signals, Sensors and Systems
Signal Processing