Publications (10 of 35)
Laskowski, K. & Hjalmarsson, A. (2015). An information-theoretic framework for automated discovery of prosodic cues to conversational structure. In: ICASSP. Paper presented at ICASSP. IEEE conference proceedings
2015 (English). In: ICASSP, IEEE conference proceedings, 2015. Conference paper, Published paper (Refereed).
Abstract [en]

Interaction timing in conversation exhibits myriad variabilities, yet it is patently not random. However, identifying consistencies is a manually labor-intensive effort, and findings have been limited. We propose a conditional mutual information measure of the influence of prosodic features, which can be computed for any conversation at any instant, requiring only a speech/non-speech segmentation. We evaluate the methodology on two segmental features: energy and speaking rate. Results indicate that energy, the less controversial of the two, is in fact better on average at predicting conversational structure. We also explore the temporal evolution of model 'surprise', which permits identifying instants where each feature's influence is operative. The method corroborates earlier findings, and appears capable of large-scale data-driven discovery in future research.
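The abstract does not spell out how the conditional mutual information measure is estimated. As a rough illustration only, a plug-in (maximum-likelihood) estimator of I(X; Y | Z) over discretised feature sequences might look like the following sketch; the function name and the discrete-symbol representation are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits from three aligned
    discrete sequences, using maximum-likelihood probabilities:

        I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log2( p(z)p(x,y,z) / (p(x,z)p(y,z)) )
    """
    n = len(x)
    c_xyz = Counter(zip(x, y, z))  # joint counts over (x, y, z) triples
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), c in c_xyz.items():
        # the 1/n factors cancel, so raw counts can be used inside the log ratio
        cmi += (c / n) * math.log2(c * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)]))
    return cmi

# If X and Y are identical and uniform within each context Z,
# I(X;Y|Z) equals H(X|Z) = 1 bit here:
print(conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]))  # → 1.0
```

With long recordings, the plug-in estimate is biased upward for sparse symbol alphabets, so in practice some form of bias correction or smoothing would be needed.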

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Series
Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, ISSN 1520-6149
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180401 (URN); 10.1109/ICASSP.2015.7178998 (DOI); 000427402905099 (ISI); 2-s2.0-84946040439 (Scopus ID); 9781467369978 (ISBN)
Conference
ICASSP
Note

QC 20160303

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2022-06-23. Bibliographically approved.
Skantze, G., Hjalmarsson, A. & Oertel, C. (2014). Turn-taking, feedback and joint attention in situated human-robot interaction. Speech Communication, 65, 50-66
2014 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 65, p. 50-66. Article in journal (Refereed). Published.
Abstract [en]

In this paper, we present a study where a robot instructs a human on how to draw a route on a map. The human and robot are seated face-to-face with the map placed on the table between them. The user's and the robot's gaze can thus serve several simultaneous functions: as cues to joint attention, turn-taking, level of understanding and task progression. We have compared this face-to-face setting with a setting where the robot employs a random gaze behaviour, as well as a voice-only setting where the robot is hidden behind a paper board. In addition to this, we have also manipulated turn-taking cues such as completeness and filled pauses in the robot's speech. By analysing the participants' subjective rating, task completion, verbal responses, gaze behaviour, and drawing activity, we show that the users indeed benefit from the robot's gaze when talking about landmarks, and that the robot's verbal and gaze behaviour has a strong effect on the users' turn-taking behaviour. We also present an analysis of the users' gaze and lexical and prosodic realisation of feedback after the robot instructions, and show that these cues reveal whether the user has yet executed the previous instruction, as well as the user's level of uncertainty.

Keywords
Turn-taking, Feedback, Joint attention, Prosody, Gaze, Uncertainty
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-154366 (URN); 10.1016/j.specom.2014.05.005 (DOI); 000341901700005 (ISI); 2-s2.0-84903625192 (Scopus ID)
Funder
Swedish Research Council, 2011-6237, 2011-6152; EU, FP7, Seventh Framework Programme, 288667
Note

QC 20141021

Available from: 2014-10-21. Created: 2014-10-20. Last updated: 2024-03-15. Bibliographically approved.
Skantze, G., Oertel, C. & Hjalmarsson, A. (2014). User Feedback in Human-Robot Dialogue: Task Progression and Uncertainty. In: Proceedings of the HRI Workshop on Timing in Human-Robot Interaction. Paper presented at the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, March 3-6, 2014. Bielefeld, Germany
2014 (English). In: Proceedings of the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, 2014. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Bielefeld, Germany, 2014
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-158183 (URN)
Conference
the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, March 3-6, 2014
Note

QC 20150223

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2024-03-15. Bibliographically approved.
Heldner, M., Hjalmarsson, A. & Edlund, J. (2013). Backchannel relevance spaces. In: Asu, Eva Liina & Lippus, Pärtel (Eds.), Nordic Prosody: Proceedings of the XIth Conference, Tartu 2012. Paper presented at The XIth Nordic Prosody Conference (pp. 137-146). Frankfurt am Main, Germany: Peter Lang Publishing Group
2013 (English). In: Nordic Prosody: Proceedings of the XIth Conference, Tartu 2012 / [ed] Asu, Eva Liina & Lippus, Pärtel, Frankfurt am Main, Germany: Peter Lang Publishing Group, 2013, p. 137-146. Conference paper, Published paper (Refereed).
Abstract [en]

This contribution introduces backchannel relevance spaces – intervals where it is relevant for a listener in a conversation to produce a backchannel. By annotating and comparing actual visual and vocal backchannels with potential backchannels established using a group of subjects acting as third-party listeners, we show (i) that visual-only backchannels represent a substantial proportion of all backchannels; and (ii) that there are more opportunities for backchannels (i.e. potential backchannels or backchannel relevance spaces) than there are actual vocal and visual backchannels. These findings indicate that backchannel relevance spaces enable more accurate acoustic, prosodic, lexical (et cetera) descriptions of backchannel-inviting cues than descriptions based on the context of actual vocal backchannels only.

Place, publisher, year, edition, pages
Frankfurt am Main, Germany: Peter Lang Publishing Group, 2013
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137398 (URN)10.3726/978-3-653-03047-1 (DOI)978-3-653-03047-1 (ISBN)
Conference
The XIth Nordic Prosody Conference
Note

QC 20210804

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2022-06-23. Bibliographically approved.
Skantze, G., Hjalmarsson, A. & Oertel, C. (2013). Exploring the effects of gaze and pauses in situated human-robot interaction. In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue: SIGDIAL 2013. Paper presented at 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue SIGdial 2013; Metz, France, 22-24 August, 2013. ACL
2013 (English). In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue: SIGDIAL 2013, ACL, 2013. Conference paper, Published paper (Refereed).
Abstract [en]

In this paper, we present a user study where a robot instructs a human on how to draw a route on a map, similar to a Map Task. This setup has allowed us to study user reactions to the robot’s conversational behaviour in order to get a better understanding of how to generate utterances in incremental dialogue systems. We have analysed the participants' subjective rating, task completion, verbal responses, gaze behaviour, drawing activity, and cognitive load. The results show that users utilise the robot’s gaze in order to disambiguate referring expressions and manage the flow of the interaction. Furthermore, we show that the user’s behaviour is affected by how pauses are realised in the robot’s speech.

Place, publisher, year, edition, pages
ACL, 2013
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-134932 (URN); 2-s2.0-84987866691 (Scopus ID); 9781627489874 (ISBN)
Conference
14th Annual Meeting of the Special Interest Group on Discourse and Dialogue SIGdial 2013; Metz, France, 22-24 August, 2013
Note

QC 20140610

Available from: 2013-12-02. Created: 2013-12-02. Last updated: 2024-03-15. Bibliographically approved.
Strömbergsson, S., Hjalmarsson, A., Edlund, J. & House, D. (2013). Timing responses to questions in dialogue. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 25 August 2013 through 29 August 2013 (pp. 2583-2587). Lyon, France: International Speech and Communication Association
2013 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France: International Speech and Communication Association, 2013, p. 2583-2587. Conference paper, Published paper (Refereed).
Abstract [en]

Questions and answers play an important role in spoken dialogue systems as well as in human-human interaction. A critical concern when responding to a question is the timing of the response. While human response times depend on a wide set of features, dialogue systems generally respond as soon as they can, that is, when the end of the question has been detected and the response is ready to be deployed. This paper presents an analysis of how different semantic and pragmatic features affect the response times to questions in two different data sets of spontaneous human-human dialogues: the Swedish Spontal Corpus and the US English Switchboard corpus. Our analysis shows that contextual features such as question type, response type, and conversation topic influence human response times. Based on these results, we propose that more sophisticated response timing can be achieved in spoken dialogue systems by using these features to automatically and deliberately target system response timing.

Place, publisher, year, edition, pages
Lyon, France: International Speech and Communication Association, 2013
Keywords
Question intonation, Response times, Speech prosody, Spontaneous speech
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137391 (URN); 000395050001055 (ISI); 2-s2.0-84906241025 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 25 August 2013 through 29 August 2013
Note

QC 20150211

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2022-06-23. Bibliographically approved.
Skantze, G. & Hjalmarsson, A. (2013). Towards incremental speech generation in conversational systems. Computer speech & language (Print), 27(1), 243-262
2013 (English). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 27, no 1, p. 243-262. Article in journal (Refereed). Published.
Abstract [en]

This paper presents a model of incremental speech generation in practical conversational systems. The model allows a conversational system to incrementally interpret spoken input, while simultaneously planning, realising and self-monitoring the system response. If these processes are time consuming and result in a response delay, the system can automatically produce hesitations to retain the floor. While speaking, the system utilises hidden and overt self-corrections to accommodate revisions in the system. The model has been implemented in a general dialogue system framework. Using this framework, we have implemented a conversational game application. A Wizard-of-Oz experiment is presented, where the automatic speech recognizer is replaced by a Wizard who transcribes the spoken input. In this setting, the incremental model allows the system to start speaking while the user's utterance is being transcribed. In comparison to a non-incremental version of the same system, the incremental version has a shorter response time and is perceived as more efficient by the users.

Keywords
Conversational systems, Incremental processing, Speech generation, Wizard-of-Oz
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-107012 (URN); 10.1016/j.csl.2012.05.004 (DOI); 000311524800014 (ISI); 2-s2.0-84867329282 (Scopus ID)
Funder
Swedish Research Council, 2011-6237, 2011-6152; ICT - The Next Generation
Note

QC 20130109

Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2022-06-24. Bibliographically approved.
Skantze, G., Oertel, C. & Hjalmarsson, A. (2013). User feedback in human-robot interaction: Prosody, gaze and timing. In: Proceedings of Interspeech 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013; Lyon; France; 25 August 2013 through 29 August 2013 (pp. 1901-1905).
2013 (English). In: Proceedings of Interspeech 2013, 2013, p. 1901-1905. Conference paper, Published paper (Refereed).
Abstract [en]

This paper investigates forms and functions of user feedback in a map task dialogue between a human and a robot, where the robot is the instruction-giver and the human is the instruction-follower. First, we investigate how user acknowledgements in task-oriented dialogue signal whether an activity is about to be initiated or has been completed. The parameters analysed include the users' lexical and prosodic realisation as well as gaze direction and response timing. Second, we investigate the relation between these parameters and the perception of uncertainty.

Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
Feedback, Prosody, Gaze, Human-robot interaction
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-134933 (URN); 000395050000401 (ISI); 2-s2.0-84906244754 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013; Lyon; France; 25 August 2013 through 29 August 2013
Note

QC 20140610

Available from: 2013-12-02. Created: 2013-12-02. Last updated: 2024-03-15. Bibliographically approved.
Edlund, J., Alexanderson, S., Beskow, J., Gustavsson, L., Heldner, M., Hjalmarsson, A., . . . Marklund, E. (2012). 3rd party observer gaze as a continuous measure of dialogue flow. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012. Paper presented at 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, May 21-27, 2012 (pp. 1354-1358). Istanbul, Turkey: European Language Resources Association
2012 (English). In: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey: European Language Resources Association, 2012, p. 1354-1358. Conference paper, Published paper (Refereed).
Abstract [en]

We present an attempt at using 3rd party observer gaze to get a measure of how appropriate each segment in a dialogue is for a speaker change. The method is a step away from the current dependency of speaker turns or talkspurts towards a more general view of speaker changes. We show that 3rd party observers do indeed largely look at the same thing (the speaker), and how this can be captured and utilized to provide insights into human communication. In addition, the results also suggest that there might be differences in the distribution of 3rd party observer gaze depending on how information-rich an utterance is.

Place, publisher, year, edition, pages
Istanbul, Turkey: European Language Resources Association, 2012
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-109366 (URN); 000323927701073 (ISI); 2-s2.0-84987651578 (Scopus ID)
Conference
8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, May 21-27, 2012
Note

QC 20220627

Part of proceedings: ISBN 978-2-9517408-7-7

Available from: 2013-01-02. Created: 2013-01-02. Last updated: 2022-06-27. Bibliographically approved.
Edlund, J., Heldner, M. & Hjalmarsson, A. (2012). 3rd party observer gaze during backchannels. In: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog. Paper presented at the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog. Skamania Lodge, WA, USA
2012 (English). In: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog, Skamania Lodge, WA, USA, 2012. Conference paper, Published paper (Refereed).
Abstract [en]

This paper describes a study of how the gazes of 3rd party observers of dialogue move when a speaker is taking the turn and producing a backchannel, respectively. The data is collected and basic processing is complete, but the results section for the paper is not yet in place. It will be in time for the workshop, however, and will be presented there, should this paper outline be accepted.

Place, publisher, year, edition, pages
Skamania Lodge, WA, USA, 2012
Keywords
speech synthesis, unit selection, joint costs
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-107010 (URN)
Conference
the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog
Note

QC 20121217

Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2022-06-24. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-3585-8077
