Perceptual and Squared Error Aspects in Speech and Audio Coding
KTH, Superseded Departments, Signals, Sensors and Systems.
2004 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Quantization changes speech and audio signals. This thesis contains four papers concerned with comparing and minimizing different measures that quantify the changes introduced. Before quantization, the signal can be transformed to another domain. Transforms related to the discrete Fourier transform allow for efficient quantization. The complex coefficients from these transforms are typically represented by their amplitudes and phases. Papers A and B describe a new method to quantize the amplitudes and phases with scalar polar quantizers. The method is denoted multi-variate block polar quantization (MBPQ) and is optimized to minimize the average weighted squared error, utilizing high-rate derivations. It is shown that MBPQ outperforms other polar quantizers for all types of data considered. While the perceptual importance of the amplitude values is well established, the perceptual importance of the phase values is still debated. In paper C, two distortion measures quantifying the detectability of phase distortions are found and verified. Utilizing these distortion measures, it is investigated how well the squared error describes the perception of phase distortions. It was found that the average perceptual distortion decreases only moderately with increasing rate, both for vector quantizers minimizing a weighted squared error and for vector quantizers minimizing a perceptual distortion measure. It is concluded that future research should focus on finding parameters that describe the features contained in phase and lead to more efficient quantization. Paper D investigates perceptual distortion measures in the most commonly used coder paradigm for speech coding: linear-prediction-based analysis-by-synthesis. In the paper, an auditory-model-based distortion measure is compared to a commonly used weighted squared error derived from the linear prediction coefficients. It is concluded that sophisticated auditory models are rarely used in this type of coder due to the good performance of the commonly used weighted squared error.

Place, publisher, year, edition, pages
Stockholm: Signaler, sensorer och system, 2004. xii, 45 p.
Series
Trita-S3-SIP, ISSN 1652-4500 ; 2004:5
Keyword [en]
Signal processing, speech coding, audio coding, auditory models, perceptual distortion measures, squared error
Keyword [sv]
Signalbehandling
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-82
ISBN: 91-628-6173-5 (print)
OAI: oai:DiVA.org:kth-82
DiVA: diva2:14802
Public defence
2004-12-15, kollegiesalen, Valhallavägen 79, Stockholm, 14:00
Opponent
Supervisors
Available from: 2004-12-21 Created: 2004-12-21 Last updated: 2012-03-21

Open Access in DiVA

No full text

Author
Pobloth, Harald