Classification of Affective Speech using Normalized Time-Frequency Cepstra
2010 (English). In: Speech Prosody 2010 Conference Proceedings, Chicago, Illinois, U.S.A., 2010. Conference paper (refereed).
Subtle temporal and spectral differences between categorical realizations of para-linguistic phenomena (e.g. affective vocal expressions) are hard to capture and describe. In this paper we present a signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC) derived for this purpose. A method which utilizes the special properties of the constant-Q transform for mean F0 estimation and normalization is described. The coefficients are invariant to utterance length, and as a special case, a representation for prosody is considered. Speaker-independent classification results using nu-SVM are reported for the Berlin EMO-DB and for two closed sets of basic (anger, disgust, fear, happiness, sadness, neutral) and social/interpersonal (affection, pride, shame) emotions recorded by forty professional actors from two English dialect areas. The accuracy for the Berlin EMO-DB is 71.2%; the accuracy was 44.6% for the first set (basic emotions) and 31.7% for the second set (basic and social emotions). It was found that F0 normalization boosts the performance, and a combined feature set shows the best performance.
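To illustrate the kind of front end the abstract describes, the sketch below computes constant-Q cepstral coefficients for one analysis frame: a constant-Q filterbank (geometrically spaced center frequencies, window length proportional to the Q factor), log compression, and a DCT. This is a minimal illustration only, not the paper's TVCQCC method (which additionally models time variation and performs F0 normalization); all function and parameter names (`constant_q_cepstra`, `fmin`, `bins_per_octave`, `n_ceps`) and their default values are assumptions, not taken from the paper.

```python
import numpy as np
from scipy.fft import dct

def constant_q_cepstra(frame, sr, fmin=55.0, bins_per_octave=12,
                       n_bins=48, n_ceps=13):
    """Illustrative constant-Q cepstra for a single frame.

    Hypothetical sketch: constant-Q magnitude spectrum, log-compressed
    and decorrelated with a DCT. Defaults are illustrative only.
    """
    n = len(frame)
    # Constant Q factor: each filter spans the same number of cycles,
    # so the log-frequency resolution is uniform per octave.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    spectrum = np.empty(n_bins)
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)   # geometric spacing
        nk = min(n, int(round(q * sr / fk)))        # window ~ Q cycles of fk
        win = np.hanning(nk)
        kernel = win * np.exp(-2j * np.pi * fk * np.arange(nk) / sr)
        spectrum[k] = np.abs(np.dot(frame[:nk], kernel)) / nk
    log_spec = np.log(spectrum + 1e-10)
    return dct(log_spec, type=2, norm='ortho')[:n_ceps]
```

Because the constant-Q axis is logarithmic in frequency, a change in mean F0 appears as a translation along the bin axis, which is the property the abstract's F0 normalization exploits.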
Place, publisher, year, edition, pages
Chicago, Illinois, U.S.A, 2010.
Computer Science; Language Technology (Computational Linguistics)
Identifiers: URN: urn:nbn:se:kth:diva-52035; ISBN: 978-0-557-51931-6; OAI: oai:DiVA.org:kth-52035; DiVA: diva2:465329
Fifth International Conference on Speech Prosody (Speech Prosody 2010), Chicago, 11-14 May 2010