Open this publication in new window or tab >>2012 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
Parallel with the orthographic streams of words in conversation are multiple layered epiphenomena, short in duration and with a communicativepurpose. These paralinguistic events regulate the interaction flow via gaze,gestures and intonation. This thesis focus on how to compute, model, discoverand analyze prosody and it’s applications for spoken dialog systems.Specifically it addresses automatic classification and analysis of conversationalcues related to turn-taking, brief feedback, affective expressions, their crossrelationshipsas well as their cognitive and neurological basis. Techniques areproposed for instantaneous and suprasegmental parameterization of scalarand vector valued representations of fundamental frequency, but also intensity and voice quality. Examples are given for how to engineer supervised learned automata’s for off-line processing of conversational corpora as well as for incremental on-line processing with low-latency constraints suitable as detector modules in a responsive social interface. Specific attention is given to the communicative functions of vocal feedback like "mhm", "okay" and "yeah, that’s right" as postulated by the theories of grounding, emotion and a survey on laymen opinions. The potential functions and their prosodic cues are investigated via automatic decoding, data-mining, exploratory visualization and descriptive measurements.
Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. p. xiv, 86
Series
Trita-CSC-A, ISSN 1653-5723 ; 2012:08
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-102335 (URN)978-91-7501-467-8 (ISBN)
Public defence
2012-09-28, Sal F3, Lindstedtsvägen 26, KTH, Stockholm, 13:00 (English)
Opponent
Supervisors
Note
QC 20120914
2012-09-142012-09-142022-06-24Bibliographically approved