Semi-supervised methods for exploring the acoustics of simple productive feedback
Neiberg, Daniel. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. (Tal)
Salvi, Giampiero. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. (Tal) ORCID iD: 0000-0002-3323-5311
Gustafson, Joakim. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. (Tal) ORCID iD: 0000-0002-0397-6442
2013 (English) In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 55, no 3, 451-469 p. Article in journal (Refereed) Published
Abstract [en]

This paper proposes methods for exploring acoustic correlates of feedback functions. A sub-language of Swedish, simple productive feedback, is introduced to facilitate investigations of the functional contributions of base tokens, phonological operations and prosody. The function of feedback is to convey the listener's attention, understanding and affective states. To handle the large number of possible affective states, the current study starts with a listening experiment in which humans annotated the functional similarity of feedback tokens with different prosodic realizations. By selecting a set of stimuli at different prosodic distances from a reference token, it was possible to compute a generalised functional distance measure. This measure was shown to correlate with prosodic distance, but the correlations varied as a function of base tokens and phonological operations. In a subsequent listening test, a small representative sample of feedback tokens was rated for understanding, agreement, interest, surprise and certainty. These ratings were found to explain a significant proportion of the generalised functional distance. By combining the acoustic analysis with an explorative visualisation of the prosody, we have established a map between human perception of similarity between feedback tokens, their measured distance in acoustic space, and the link to the perception of the function of feedback tokens with varying realisations.

Place, publisher, year, edition, pages
2013. Vol. 55, no 3, 451-469 p.
Keyword [en]
social signal processing, affective annotation, feedback modeling, grounding
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-102334
DOI: 10.1016/j.specom.2012.12.007
ISI: 000316837000005
Scopus ID: 2-s2.0-84875460872
OAI: oai:DiVA.org:kth-102334
DiVA: diva2:552375
Projects
SAMSYNT, IURO
Funder
Swedish Research Council, 2009-4291
EU, European Research Council, FP7 – 248314
Note

QC 20130508

Available from: 2012-09-14. Created: 2012-09-14. Last updated: 2017-12-07. Bibliographically approved.
In thesis
1. Modelling Paralinguistic Conversational Interaction: Towards social awareness in spoken human-machine dialogue
2012 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Parallel with the orthographic streams of words in conversation are multiple layered epiphenomena, short in duration and with a communicative purpose. These paralinguistic events regulate the interaction flow via gaze, gestures and intonation. This thesis focuses on how to compute, model, discover and analyze prosody and its applications for spoken dialog systems. Specifically, it addresses automatic classification and analysis of conversational cues related to turn-taking, brief feedback, affective expressions, their cross-relationships as well as their cognitive and neurological basis. Techniques are proposed for instantaneous and suprasegmental parameterization of scalar and vector valued representations of fundamental frequency, but also intensity and voice quality. Examples are given for how to engineer supervised learned automata for off-line processing of conversational corpora as well as for incremental on-line processing with low-latency constraints suitable as detector modules in a responsive social interface. Specific attention is given to the communicative functions of vocal feedback like "mhm", "okay" and "yeah, that’s right" as postulated by the theories of grounding, emotion and a survey of laymen's opinions. The potential functions and their prosodic cues are investigated via automatic decoding, data-mining, exploratory visualization and descriptive measurements.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. xiv, 86 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2012:08
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-102335 (URN)
978-91-7501-467-8 (ISBN)
Public defence
2012-09-28, Sal F3, Lindstedtsvägen 26, KTH, Stockholm, 13:00 (English)
Note

QC 20120914

Available from: 2012-09-14. Created: 2012-09-14. Last updated: 2012-09-14. Bibliographically approved.

By author/editor
Neiberg, Daniel; Salvi, Giampiero; Gustafson, Joakim