Classification of Affective Speech using Normalized Time-Frequency Cepstra
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
2010 (English). In: Speech Prosody 2010 Conference Proceedings, Chicago, Illinois, U.S.A., 2010. Conference paper, Published paper (Refereed)
Abstract [en]

Subtle temporal and spectral differences between categorical realizations of para-linguistic phenomena (e.g. affective vocal expressions) are hard to capture and describe. In this paper we present a signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC) derived for this purpose. A method which utilizes the special properties of the constant-Q transform for mean F0 estimation and normalization is described. The coefficients are invariant to utterance length, and as a special case, a representation for prosody is considered. Speaker-independent classification results using nu-SVM on the Berlin EMO-DB and on two closed sets of basic (anger, disgust, fear, happiness, sadness, neutral) and social/interpersonal (affection, pride, shame) emotions recorded by forty professional actors from two English dialect areas are reported. The accuracy for the Berlin EMO-DB is 71.2%; the accuracy for the first set, including basic emotions, was 44.6%, and for the second set, including both basic and social emotions, it was 31.7%. It was found that F0 normalization boosts the performance, and a combined feature set shows the best performance.
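The feature pipeline the abstract describes (constant-Q analysis, cepstral decorrelation, and a truncated transform over time to make the representation invariant to utterance length) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the naive constant-Q transform, all parameter values, and the function names are assumptions, and the F0 estimation/normalization step is omitted.

```python
import numpy as np
from scipy.fft import dct

def naive_cqt(x, sr, fmin=55.0, bins_per_octave=12, n_bins=48, hop=256):
    """Tiny constant-Q transform sketch: geometrically spaced filters whose
    quality factor Q = f / bandwidth is constant, so window length scales
    inversely with center frequency."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    n_frames = 1 + (len(x) - 1) // hop
    C = np.zeros((n_bins, n_frames))
    for k, f in enumerate(freqs):
        N = int(round(Q * sr / f))                     # longer windows at low f
        n = np.arange(N)
        kernel = np.hanning(N) * np.exp(-2j * np.pi * f / sr * n) / N
        for t in range(n_frames):
            seg = x[t * hop : t * hop + N]             # zero-padded implicitly
            C[k, t] = np.abs(np.dot(kernel[:len(seg)], seg))
    return C

def tvcqcc_like(x, sr, n_ceps=13, n_time=8):
    """TVCQCC-style features (hypothetical parameters): cepstra from the
    log constant-Q spectrogram, then a DCT over the time axis truncated to
    n_time coefficients, which yields a fixed-size vector regardless of
    utterance length."""
    C = naive_cqt(x, sr)
    ceps = dct(np.log(C + 1e-8), axis=0, norm='ortho')[:n_ceps]
    tv = dct(ceps, axis=1, norm='ortho')[:, :n_time]
    return tv.flatten()
```

In the paper, features of this kind feed a nu-SVM classifier (e.g. `sklearn.svm.NuSVC` in scikit-learn) trained speaker-independently; the fixed output dimensionality (`n_ceps * n_time` here) is what lets utterances of different durations share one classifier input space.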

Place, publisher, year, edition, pages
Chicago, Illinois, U.S.A., 2010.
National Category
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-52035
ISBN: 978-0-557-51931-6 (print)
OAI: oai:DiVA.org:kth-52035
DiVA: diva2:465329
Conference
Fifth International Conference on Speech Prosody (Speech Prosody 2010), Chicago, 11-14 May 2010
Note
tmh_import_11_12_14 QC 20111219. Available from: 2011-12-14. Created: 2011-12-14. Last updated: 2011-12-19. Bibliographically approved.

Open Access in DiVA

No full text

Other links

speechprosody2010.illinois.edu


By author/editor
Neiberg, Daniel; Ananthakrishnan, Gopal
By organisation
Speech Communication and Technology; Centre for Speech Technology, CTT
