Prosodic Characterization and Automatic Classification of Conversational Grunts in Swedish
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-0397-6442
2010 (English). In: Working Papers 54: Proceedings from Fonetik 2010, 2010. Conference paper (Other academic)
Abstract [en]

Conversation is the most common use of speech. Any automatic dialog system that aims to mimic a human must be able to detect the typical sounds and meanings of spontaneous conversational speech. Automatic transcription of the function of linguistic units, sometimes referred to as Dialog Acts (DAs), Cue Phrases or Discourse Markers, is an emerging area of research. This can be done on a purely lexical level, by using prosody alone (Laskowski and Shriberg, 2010; Goto et al., 1999), or by a combination thereof (Sridhar et al., 2009; Gravano et al., 2007). However, it is not straightforward to train a language model for non-verbal content (e.g. “mm”, “mhm”, “eh”, “em”), not only because it is questionable whether these sounds are words, but also because of the lack of standardized annotation schemes. Ward (2000) refers to these tokens as conversational grunts, which are also the scope of this study. Feedback tokens are usually sub-divided into yes/no answers, backchannels and acknowledgments. In this study, the focus of interest is the attitude of the response. Thus, the cut is instead made between dis-preference, news receiving and general feedback. These are further subdivided by their turn-taking effect: Other speaker, Same speaker and Simultaneous start. This allows us to verify whether conversational grunts are simply carriers of prosodic information. In this study, we use a supra-segmental prosodic signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC), introduced in Neiberg et al. (2010), for classification and intuitive visualization of feedback and fillers. The contribution of the end of the interlocutor's left context to predicting the turn-taking effect has long been studied (Duncan, 1972) and is also addressed in this study. In addition, we examine the effect of contextual timing features, which have been shown to be useful in DA recognition (Laskowski and Shriberg, 2010).
We use the Swedish DEAL corpus, which has annotated fillers and feedback attitudes. Classification results using linear discriminant analysis are presented. It was found that feedback tokens followed by a clean floor taking lose some of the prosodic cues that signal attitude, compared to clean continuer feedback. Turn-taking effects can be predicted well over chance level, while Simultaneous Start cannot be predicted at all. However, feedback tokens before Simultaneous Starts were found to be more similar to feedback continuers than to turn-initial feedback tokens, which may be explained as inappropriate floor-stealing attempts from the feedback-producing speaker. An analysis based on the prototypical spectrograms closely follows the results for Bad News (Dispreference) vs Good News (News receiving) found in Freese and Maynard (1998), although the definitions differ slightly.
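
As an illustration of what classification with linear discriminant analysis and a comparison against chance level can look like, here is a minimal sketch. It is not the authors' pipeline: the two-dimensional features are synthetic Gaussian samples standing in for prosodic features (not TVCQCC), only two hypothetical attitude classes are used, and the function names (`fit_lda`, `predict`, `sample`) are invented for this example. Chance level is taken here as the accuracy of always guessing the most frequent class.

```python
import random

# Hypothetical two-class subset of the attitude labels, for illustration only.
LABELS = ("general_feedback", "dispreference")


def mean_vector(rows):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def fit_lda(rows_a, rows_b):
    """Two-class LDA on 2-D features, assuming a shared covariance and equal priors."""
    m_a, m_b = mean_vector(rows_a), mean_vector(rows_b)
    # Pooled within-class scatter, later normalised to a 2x2 covariance matrix.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((rows_a, m_a), (rows_b, m_b)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(rows_a) + len(rows_b) - 2
    c = [[s[i][j] / dof for j in range(2)] for i in range(2)]
    # Closed-form inverse of the 2x2 covariance matrix.
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det],
           [-c[1][0] / det, c[0][0] / det]]
    # Discriminant direction w = C^-1 (m_a - m_b); boundary through the midpoint.
    diff = (m_a[0] - m_b[0], m_a[1] - m_b[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = ((m_a[0] + m_b[0]) / 2, (m_a[1] + m_b[1]) / 2)
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias


def predict(model, row):
    """Return the label whose side of the linear boundary the row falls on."""
    w, bias = model
    score = w[0] * row[0] + w[1] * row[1] + bias
    return LABELS[0] if score > 0 else LABELS[1]


def sample(rng, center, n):
    """Draw n synthetic 2-D points around a class center (unit variance)."""
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)]
            for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_a, train_b = sample(rng, (0.0, 0.0), 100), sample(rng, (2.0, 2.0), 100)
test_a, test_b = sample(rng, (0.0, 0.0), 100), sample(rng, (2.0, 2.0), 100)

model = fit_lda(train_a, train_b)
test_set = [(r, LABELS[0]) for r in test_a] + [(r, LABELS[1]) for r in test_b]
accuracy = sum(predict(model, r) == label for r, label in test_set) / len(test_set)

# Chance level: accuracy of always predicting the most frequent class.
chance = max(len(test_a), len(test_b)) / len(test_set)
print(f"accuracy={accuracy:.2f}, chance={chance:.2f}")
```

With more than two classes or unbalanced class sizes, the chance level changes accordingly, so an accuracy figure should always be read against the baseline for that particular label distribution.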

National Category
Computer Science; Language Technology (Computational Linguistics)
URN: urn:nbn:se:kth:diva-52166
OAI: diva2:465461
Fonetik 2010. Lund, Sweden. June 2–4, 2010
tmh_import_11_12_14. QC 20111222
Available from: 2011-12-14 Created: 2011-12-14 Last updated: 2011-12-22 Bibliographically approved

Authors
Neiberg, Daniel; Gustafson, Joakim