Publications (10 of 35)
Laskowski, K. & Hjalmarsson, A. (2015). An information-theoretic framework for automated discovery of prosodic cues to conversational structure. In: ICASSP. Paper presented at ICASSP. IEEE conference proceedings
2015 (English). In: ICASSP, IEEE conference proceedings, 2015. Conference paper, published paper (peer-reviewed)
Abstract [en]

Interaction timing in conversation exhibits myriad variabilities, yet it is patently not random. However, identifying consistencies is a manually labor-intensive effort, and findings have been limited. We propose a conditional mutual information measure of the influence of prosodic features, which can be computed for any conversation at any instant, with only a speech/non-speech segmentation as its requirement. We evaluate the methodology on two segmental features: energy and speaking rate. Results indicate that energy, the less controversial of the two, is in fact better on average at predicting conversational structure. We also explore the temporal evolution of model 'surprise', which permits identifying instants where each feature's influence is operative. The method corroborates earlier findings, and appears capable of large-scale data-driven discovery in future research.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Series
Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, ISSN 1520-6149
Identifiers
urn:nbn:se:kth:diva-180401 (URN); 10.1109/ICASSP.2015.7178998 (DOI); 2-s2.0-84946040439 (Scopus ID); 978-146736997-8 (ISBN)
Conference
ICASSP
Note
QC 20160303
Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved
Skantze, G., Hjalmarsson, A. & Oertel, C. (2014). Turn-taking, feedback and joint attention in situated human-robot interaction. Speech Communication, 65, 50-66
2014 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 65, pp. 50-66. Article in journal (peer-reviewed). Published
Abstract [en]

In this paper, we present a study where a robot instructs a human on how to draw a route on a map. The human and robot are seated face-to-face with the map placed on the table between them. The user's and the robot's gaze can thus serve several simultaneous functions: as cues to joint attention, turn-taking, level of understanding and task progression. We have compared this face-to-face setting with a setting where the robot employs a random gaze behaviour, as well as a voice-only setting where the robot is hidden behind a paper board. In addition to this, we have also manipulated turn-taking cues such as completeness and filled pauses in the robot's speech. By analysing the participants' subjective rating, task completion, verbal responses, gaze behaviour, and drawing activity, we show that the users indeed benefit from the robot's gaze when talking about landmarks, and that the robot's verbal and gaze behaviour has a strong effect on the users' turn-taking behaviour. We also present an analysis of the users' gaze and lexical and prosodic realisation of feedback after the robot instructions, and show that these cues reveal whether the user has yet executed the previous instruction, as well as the user's level of uncertainty.

Keywords
Turn-taking, Feedback, Joint attention, Prosody, Gaze, Uncertainty
Identifiers
urn:nbn:se:kth:diva-154366 (URN); 10.1016/j.specom.2014.05.005 (DOI); 000341901700005 (); 2-s2.0-84903625192 (Scopus ID)
Funder
Swedish Research Council, 2011-6237 2011-6152; EU, FP7, Seventh Framework Programme, 288667
Note
QC 20141021
Available from: 2014-10-21. Created: 2014-10-20. Last updated: 2018-01-11. Bibliographically approved
Skantze, G., Oertel, C. & Hjalmarsson, A. (2014). User Feedback in Human-Robot Dialogue: Task Progression and Uncertainty. In: Proceedings of the HRI Workshop on Timing in Human-Robot Interaction. Paper presented at the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, March 3-6, 2014. Bielefeld, Germany
2014 (English). In: Proceedings of the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, 2014. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
Bielefeld, Germany, 2014
Identifiers
urn:nbn:se:kth:diva-158183 (URN)
Conference
the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, March 3-6, 2014
Note
QC 20150223
Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2018-01-11. Bibliographically approved
Heldner, M., Hjalmarsson, A. & Edlund, J. (2013). Backchannel relevance spaces. In: Eva Liina / Lippus, Pärtel (Ed.), Prosody: Proceedings of the XIth Conference, Tartu 2012. Paper presented at Prosody: Proceedings of XIth Conference (pp. 137-146). Peter Lang Publishing Group
2013 (English). In: Prosody: Proceedings of the XIth Conference, Tartu 2012 / [ed] Eva Liina / Lippus, Pärtel, Peter Lang Publishing Group, 2013, pp. 137-146. Conference paper, published paper (peer-reviewed)
Place, publisher, year, edition, pages
Peter Lang Publishing Group, 2013
Identifiers
urn:nbn:se:kth:diva-137398 (URN); 978-3631644270 (ISBN)
Conference
Prosody: Proceedings of XIth Conference
Note
tmh_import_13_12_13, tmh_id_3870. QC 20140129
Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved
Skantze, G., Hjalmarsson, A. & Oertel, C. (2013). Exploring the effects of gaze and pauses in situated human-robot interaction. In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue: SIGDIAL 2013. Paper presented at 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue SIGdial 2013; Metz, France, 22-24 August, 2013. ACL
2013 (English). In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue: SIGDIAL 2013, ACL, 2013. Conference paper, published paper (peer-reviewed)
Abstract [en]

In this paper, we present a user study where a robot instructs a human on how to draw a route on a map, similar to a Map Task. This setup has allowed us to study user reactions to the robot’s conversational behaviour in order to get a better understanding of how to generate utterances in incremental dialogue systems. We have analysed the participants' subjective rating, task completion, verbal responses, gaze behaviour, drawing activity, and cognitive load. The results show that users utilise the robot’s gaze in order to disambiguate referring expressions and manage the flow of the interaction. Furthermore, we show that the user’s behaviour is affected by how pauses are realised in the robot’s speech.

Place, publisher, year, edition, pages
ACL, 2013
Identifiers
urn:nbn:se:kth:diva-134932 (URN); 2-s2.0-84987866691 (Scopus ID); 9781627489874 (ISBN)
Conference
14th Annual Meeting of the Special Interest Group on Discourse and Dialogue SIGdial 2013; Metz, France, 22-24 August, 2013
Note
QC 20140610
Available from: 2013-12-02. Created: 2013-12-02. Last updated: 2018-01-11. Bibliographically approved
Strömbergsson, S., Hjalmarsson, A., Edlund, J. & House, D. (2013). Timing responses to questions in dialogue. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 25 August 2013 through 29 August 2013 (pp. 2584-2588). Lyon, France: International Speech and Communication Association
2013 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France: International Speech and Communication Association, 2013, pp. 2584-2588. Conference paper, published paper (peer-reviewed)
Abstract [en]

Questions and answers play an important role in spoken dialogue systems as well as in human-human interaction. A critical concern when responding to a question is the timing of the response. While human response times depend on a wide set of features, dialogue systems generally respond as soon as they can, that is, when the end of the question has been detected and the response is ready to be deployed. This paper presents an analysis of how different semantic and pragmatic features affect the response times to questions in two different data sets of spontaneous human-human dialogues: the Swedish Spontal Corpus and the US English Switchboard corpus. Our analysis shows that contextual features such as question type, response type, and conversation topic influence human response times. Based on these results, we propose that more sophisticated response timing can be achieved in spoken dialogue systems by using these features to automatically and deliberately target system response timing.

Place, publisher, year, edition, pages
Lyon, France: International Speech and Communication Association, 2013
Keywords
Question intonation, Response times, Speech prosody, Spontaneous speech
Identifiers
urn:nbn:se:kth:diva-137391 (URN); 2-s2.0-84906241025 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 25 August 2013 through 29 August 2013
Note
QC 20150211
Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved
Skantze, G. & Hjalmarsson, A. (2013). Towards incremental speech generation in conversational systems. Computer speech & language (Print), 27(1), 243-262
2013 (English). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 27, no. 1, pp. 243-262. Article in journal (peer-reviewed). Published
Abstract [en]

This paper presents a model of incremental speech generation in practical conversational systems. The model allows a conversational system to incrementally interpret spoken input, while simultaneously planning, realising and self-monitoring the system response. If these processes are time consuming and result in a response delay, the system can automatically produce hesitations to retain the floor. While speaking, the system utilises hidden and overt self-corrections to accommodate revisions in the system. The model has been implemented in a general dialogue system framework. Using this framework, we have implemented a conversational game application. A Wizard-of-Oz experiment is presented, where the automatic speech recognizer is replaced by a Wizard who transcribes the spoken input. In this setting, the incremental model allows the system to start speaking while the user's utterance is being transcribed. In comparison to a non-incremental version of the same system, the incremental version has a shorter response time and is perceived as more efficient by the users.

Keywords
Conversational systems, Incremental processing, Speech generation, Wizard-of-Oz
Identifiers
urn:nbn:se:kth:diva-107012 (URN); 10.1016/j.csl.2012.05.004 (DOI); 000311524800014 (); 2-s2.0-84867329282 (Scopus ID)
Funder
Swedish Research Council, 2011-6237 2011-6152; ICT - The Next Generation
Note
QC 20130109
Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2018-01-12. Bibliographically approved
Skantze, G., Oertel, C. & Hjalmarsson, A. (2013). User feedback in human-robot interaction: Prosody, gaze and timing. In: Proceedings of Interspeech 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013; Lyon; France; 25 August 2013 through 29 August 2013 (pp. 1901-1905).
2013 (English). In: Proceedings of Interspeech 2013, 2013, pp. 1901-1905. Conference paper, published paper (peer-reviewed)
Abstract [en]

This paper investigates forms and functions of user feedback in a map task dialogue between a human and a robot, where the robot is the instruction-giver and the human is the instruction-follower. First, we investigate how user acknowledgements in task-oriented dialogue signal whether an activity is about to be initiated or has been completed. The parameters analysed include the users' lexical and prosodic realisation as well as gaze direction and response timing. Second, we investigate the relation between these parameters and the perception of uncertainty.

Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
Feedback, Prosody, Gaze, Human-robot interaction
Identifiers
urn:nbn:se:kth:diva-134933 (URN); 2-s2.0-84906244754 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013; Lyon; France; 25 August 2013 through 29 August 2013
Note
QC 20140610
Available from: 2013-12-02. Created: 2013-12-02. Last updated: 2018-01-11. Bibliographically approved
Edlund, J., Alexanderson, S., Beskow, J., Gustavsson, L., Heldner, M., Hjalmarsson, A., . . . Marklund, E. (2012). 3rd party observer gaze as a continuous measure of dialogue flow. In: LREC 2012 - Eighth International Conference On Language Resources And Evaluation. Paper presented at 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, May 21-27, 2012 (pp. 1354-1358). Istanbul, Turkey: European Language Resources Association
2012 (English). In: LREC 2012 - Eighth International Conference On Language Resources And Evaluation, Istanbul, Turkey: European Language Resources Association, 2012, pp. 1354-1358. Conference paper, published paper (peer-reviewed)
Abstract [en]

We present an attempt at using 3rd party observer gaze to get a measure of how appropriate each segment in a dialogue is for a speaker change. The method is a step away from the current dependency of speaker turns or talkspurts towards a more general view of speaker changes. We show that 3rd party observers do indeed largely look at the same thing (the speaker), and how this can be captured and utilized to provide insights into human communication. In addition, the results also suggest that there might be differences in the distribution of 3rd party observer gaze depending on how information-rich an utterance is.

Place, publisher, year, edition, pages
Istanbul, Turkey: European Language Resources Association, 2012
Identifiers
urn:nbn:se:kth:diva-109366 (URN); 000323927701073 (); 978-2-9517408-7-7 (ISBN)
Conference
8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, May 21-27, 2012
Note
QC 20130523
Available from: 2013-01-02. Created: 2013-01-02. Last updated: 2018-01-11. Bibliographically approved
Edlund, J., Heldner, M. & Hjalmarsson, A. (2012). 3rd party observer gaze during backchannels. In: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog. Paper presented at the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog. Skamania Lodge, WA, USA
2012 (English). In: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog, Skamania Lodge, WA, USA, 2012. Conference paper, published paper (peer-reviewed)
Abstract [en]

This paper describes a study of how the gazes of 3rd party observers of dialogue move when a speaker is taking the turn and producing a back-channel, respectively. The data is collected and basic processing is complete, but the results section for the paper is not yet in place. It will be in time for the workshop, however, and will be presented there, should this paper outline be accepted.

Place, publisher, year, edition, pages
Skamania Lodge, WA, USA, 2012
Keywords
speech synthesis, unit selection, joint costs
Identifiers
urn:nbn:se:kth:diva-107010 (URN)
Conference
the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog
Note
tmh_import_12_12_05, tmh_id_3781, QC 20121217
Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2018-01-12. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3585-8077