Emotion Recognition in Spontaneous Speech Using GMMs
2006 (English). In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, 809-812 p. Conference paper (Refereed)
Automatic detection of emotions has been evaluated using standard Mel-frequency cepstral coefficients (MFCCs) and a variant, MFCC-low, calculated between 20 and 300 Hz in order to model pitch. Plain pitch features have also been used. These acoustic features have all been modeled by Gaussian mixture models (GMMs) on the frame level. The method has been tested on two different corpora and languages: Swedish voice-controlled telephone services and English meetings. The results indicate that using GMMs on the frame level is a feasible technique for emotion classification. The two MFCC methods have similar performance, and MFCC-low outperforms the pitch features. Combining the three classifiers significantly improves performance.
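As a rough illustration of frame-level GMM classification as summarized in the abstract, the sketch below scores every frame of an utterance under one GMM per class, sums the frame log-likelihoods, and picks the highest-scoring class. It is not the authors' implementation: the class labels ("neutral", "emotional"), the two-dimensional toy feature vectors, the diagonal covariances, and all function names are assumptions made for the example. A real system would train the mixtures on MFCC, MFCC-low, or pitch frames.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_utterance(frames, class_gmms):
    """Sum frame log-likelihoods per class; return (best_label, scores)."""
    scores = {
        label: sum(gmm_frame_loglik(x, gmm) for x in frames)
        for label, gmm in class_gmms.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models (assumed, not from the paper): two components per class.
class_gmms = {
    "neutral": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "emotional": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# Hypothetical frames of one utterance; each inner list is one feature vector.
frames = [[4.2, 3.9], [4.8, 5.1], [3.7, 4.4]]

label, scores = classify_utterance(frames, class_gmms)
print(label, scores)
```

Summing log-likelihoods over frames treats the frames as independent, which is a common simplification for this kind of classifier; how the paper combines its three classifiers is not stated in the abstract and is not modeled here.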
Place, publisher, year, edition, pages
BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006. 809-812 p.
Classification (of information), Continuous speech recognition, Face recognition, Linguistics, Multitasking, Query languages, Standards, Telecommunication systems, Telephone systems, Acoustic features, Automatic detection, Corpora (CO), Emotion classification, emotion recognition, Gaussian mixture model (GMM), In order, international conferences, Languages (traditional), Mel-frequency cepstral coefficients (MFCCs), Model pitch, Spoken language processing, Spontaneous speech, telephone services
Computer and Information Science; General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-30676; ISI: 000269965900203; ScopusID: 2-s2.0-38749103707; OAI: oai:DiVA.org:kth-30676; DiVA: diva2:403032
9th International Conference on Spoken Language Processing/INTERSPEECH 2006, Pittsburgh, PA, 2006
QC 20110310. 2011-03-10, 2011-03-04, 2011-03-10. Bibliographically approved