2012 (English). Doctoral thesis, with articles (Other academic)
Abstract [en]
Parallel with the orthographic streams of words in conversation are multiple layered epiphenomena, short in duration and with a communicative purpose. These paralinguistic events regulate the interaction flow via gaze, gestures and intonation. This thesis focuses on how to compute, model, discover and analyze prosody and its applications for spoken dialog systems. Specifically, it addresses automatic classification and analysis of conversational cues related to turn-taking, brief feedback and affective expressions, their cross-relationships, as well as their cognitive and neurological basis. Techniques are proposed for instantaneous and suprasegmental parameterization of scalar- and vector-valued representations of fundamental frequency, as well as intensity and voice quality. Examples are given of how to engineer supervised learned automata for off-line processing of conversational corpora as well as for incremental on-line processing with low-latency constraints, suitable as detector modules in a responsive social interface. Specific attention is given to the communicative functions of vocal feedback like "mhm", "okay" and "yeah, that’s right" as postulated by the theories of grounding, emotion and a survey of laymen's opinions. The potential functions and their prosodic cues are investigated via automatic decoding, data-mining, exploratory visualization and descriptive measurements.
Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. p. xiv, 86
Series
Trita-CSC-A, ISSN 1653-5723 ; 2012:08
HSV category
Identifiers
urn:nbn:se:kth:diva-102335 (URN)
978-91-7501-467-8 (ISBN)
Public defence
2012-09-28, Sal F3, Lindstedtsvägen 26, KTH, Stockholm, 13:00 (English)
Opponent
Supervisor
Note
QC 20120914
2012-09-14, 2012-09-14, 2022-06-24. Bibliographically approved