Publications (10 of 35)
Laskowski, K. & Hjalmarsson, A. (2015). An information-theoretic framework for automated discovery of prosodic cues to conversational structure. In: ICASSP. Paper presented at ICASSP. IEEE conference proceedings
2015 (English). In: ICASSP, IEEE conference proceedings, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

Interaction timing in conversation exhibits myriad variabilities, yet it is patently not random. However, identifying consistencies is a manually labor-intensive effort, and findings have been limited. We propose a conditional mutual information measure of the influence of prosodic features, which can be computed for any conversation at any instant, with only a speech/non-speech segmentation as its requirement. We evaluate the methodology on two segmental features: energy and speaking rate. Results indicate that energy, the less controversial of the two, is in fact better on average at predicting conversational structure. We also explore the temporal evolution of model 'surprise', which permits identifying instants where each feature's influence is operative. The method corroborates earlier findings, and appears capable of large-scale data-driven discovery in future research.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Series
Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, ISSN 1520-6149
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180401 (URN); 10.1109/ICASSP.2015.7178998 (DOI); 2-s2.0-84946040439 (Scopus ID); 978-146736997-8 (ISBN)
Conference
ICASSP
Note

QC 20160303

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved
Skantze, G., Hjalmarsson, A. & Oertel, C. (2014). Turn-taking, feedback and joint attention in situated human-robot interaction. Speech Communication, 65, 50-66
2014 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 65, p. 50-66. Article in journal (Refereed), Published
Abstract [en]

In this paper, we present a study where a robot instructs a human on how to draw a route on a map. The human and robot are seated face-to-face with the map placed on the table between them. The user's and the robot's gaze can thus serve several simultaneous functions: as cues to joint attention, turn-taking, level of understanding and task progression. We have compared this face-to-face setting with a setting where the robot employs a random gaze behaviour, as well as a voice-only setting where the robot is hidden behind a paper board. In addition to this, we have also manipulated turn-taking cues such as completeness and filled pauses in the robot's speech. By analysing the participants' subjective rating, task completion, verbal responses, gaze behaviour, and drawing activity, we show that the users indeed benefit from the robot's gaze when talking about landmarks, and that the robot's verbal and gaze behaviour has a strong effect on the users' turn-taking behaviour. We also present an analysis of the users' gaze and lexical and prosodic realisation of feedback after the robot instructions, and show that these cues reveal whether the user has yet executed the previous instruction, as well as the user's level of uncertainty.

Keywords
Turn-taking, Feedback, Joint attention, Prosody, Gaze, Uncertainty
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-154366 (URN); 10.1016/j.specom.2014.05.005 (DOI); 000341901700005 (ISI); 2-s2.0-84903625192 (Scopus ID)
Research funder
Vetenskapsrådet (Swedish Research Council), 2011-6237, 2011-6152; EU, FP7, Seventh Framework Programme, 288667
Note

QC 20141021

Available from: 2014-10-21. Created: 2014-10-20. Last updated: 2018-01-11. Bibliographically approved
Skantze, G., Oertel, C. & Hjalmarsson, A. (2014). User Feedback in Human-Robot Dialogue: Task Progression and Uncertainty. In: Proceedings of the HRI Workshop on Timing in Human-Robot Interaction. Paper presented at the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, March 3-6, 2014. Bielefeld, Germany
2014 (English). In: Proceedings of the HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, 2014. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Bielefeld, Germany, 2014
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-158183 (URN)
Conference
HRI Workshop on Timing in Human-Robot Interaction, Bielefeld, Germany, March 3-6, 2014
Note

QC 20150223

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2018-01-11. Bibliographically approved
Heldner, M., Hjalmarsson, A. & Edlund, J. (2013). Backchannel relevance spaces. In: Eva Liina / Lippus, Pärtel (Ed.), Prosody: Proceedings of the XIth Conference, Tartu 2012. Paper presented at Prosody: Proceedings of XIth Conference (pp. 137-146). Peter Lang Publishing Group
2013 (English). In: Prosody: Proceedings of the XIth Conference, Tartu 2012 / [ed] Eva Liina / Lippus, Pärtel, Peter Lang Publishing Group, 2013, p. 137-146. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Peter Lang Publishing Group, 2013
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137398 (URN); 978-3631644270 (ISBN)
Conference
Prosody: Proceedings of XIth Conference
Note

tmh_import_13_12_13, tmh_id_3870. QC 20140129

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved
Skantze, G., Hjalmarsson, A. & Oertel, C. (2013). Exploring the effects of gaze and pauses in situated human-robot interaction. In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue: SIGDIAL 2013. Paper presented at 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue SIGdial 2013; Metz, France, 22-24 August, 2013. ACL
2013 (English). In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue: SIGDIAL 2013, ACL, 2013. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present a user study where a robot instructs a human on how to draw a route on a map, similar to a Map Task. This setup has allowed us to study user reactions to the robot’s conversational behaviour in order to get a better understanding of how to generate utterances in incremental dialogue systems. We have analysed the participants' subjective rating, task completion, verbal responses, gaze behaviour, drawing activity, and cognitive load. The results show that users utilise the robot’s gaze in order to disambiguate referring expressions and manage the flow of the interaction. Furthermore, we show that the user’s behaviour is affected by how pauses are realised in the robot’s speech.

Place, publisher, year, edition, pages
ACL, 2013
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-134932 (URN); 2-s2.0-84987866691 (Scopus ID); 9781627489874 (ISBN)
Conference
14th Annual Meeting of the Special Interest Group on Discourse and Dialogue SIGdial 2013; Metz, France, 22-24 August, 2013
Note

QC 20140610

Available from: 2013-12-02. Created: 2013-12-02. Last updated: 2018-01-11. Bibliographically approved
Strömbergsson, S., Hjalmarsson, A., Edlund, J. & House, D. (2013). Timing responses to questions in dialogue. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 25 August 2013 through 29 August 2013 (pp. 2584-2588). Lyon, France: International Speech and Communication Association
2013 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France: International Speech and Communication Association, 2013, p. 2584-2588. Conference paper, Published paper (Refereed)
Abstract [en]

Questions and answers play an important role in spoken dialogue systems as well as in human-human interaction. A critical concern when responding to a question is the timing of the response. While human response times depend on a wide set of features, dialogue systems generally respond as soon as they can, that is, when the end of the question has been detected and the response is ready to be deployed. This paper presents an analysis of how different semantic and pragmatic features affect the response times to questions in two different data sets of spontaneous human-human dialogues: the Swedish Spontal Corpus and the US English Switchboard corpus. Our analysis shows that contextual features such as question type, response type, and conversation topic influence human response times. Based on these results, we propose that more sophisticated response timing can be achieved in spoken dialogue systems by using these features to automatically and deliberately target system response timing.

Place, publisher, year, edition, pages
Lyon, France: International Speech and Communication Association, 2013
Keywords
Question intonation, Response times, Speech prosody, Spontaneous speech
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137391 (URN); 2-s2.0-84906241025 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, 25 August 2013 through 29 August 2013
Note

QC 20150211

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved
Skantze, G. & Hjalmarsson, A. (2013). Towards incremental speech generation in conversational systems. Computer speech & language (Print), 27(1), 243-262
2013 (English). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 27, no. 1, p. 243-262. Article in journal (Refereed), Published
Abstract [en]

This paper presents a model of incremental speech generation in practical conversational systems. The model allows a conversational system to incrementally interpret spoken input, while simultaneously planning, realising and self-monitoring the system response. If these processes are time consuming and result in a response delay, the system can automatically produce hesitations to retain the floor. While speaking, the system utilises hidden and overt self-corrections to accommodate revisions in the system. The model has been implemented in a general dialogue system framework. Using this framework, we have implemented a conversational game application. A Wizard-of-Oz experiment is presented, where the automatic speech recognizer is replaced by a Wizard who transcribes the spoken input. In this setting, the incremental model allows the system to start speaking while the user's utterance is being transcribed. In comparison to a non-incremental version of the same system, the incremental version has a shorter response time and is perceived as more efficient by the users.

Keywords
Conversational systems, Incremental processing, Speech generation, Wizard-of-Oz
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-107012 (URN); 10.1016/j.csl.2012.05.004 (DOI); 000311524800014 (ISI); 2-s2.0-84867329282 (Scopus ID)
Research funder
Vetenskapsrådet (Swedish Research Council), 2011-6237, 2011-6152; ICT - The Next Generation
Note

QC 20130109

Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2018-01-12. Bibliographically approved
Skantze, G., Oertel, C. & Hjalmarsson, A. (2013). User feedback in human-robot interaction: Prosody, gaze and timing. In: Proceedings of Interspeech 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013; Lyon; France; 25 August 2013 through 29 August 2013 (pp. 1901-1905).
2013 (English). In: Proceedings of Interspeech 2013, 2013, p. 1901-1905. Conference paper, Published paper (Refereed)
Abstract [en]

This paper investigates forms and functions of user feedback in a map task dialogue between a human and a robot, where the robot is the instruction-giver and the human is the instruction-follower. First, we investigate how user acknowledgements in task-oriented dialogue signal whether an activity is about to be initiated or has been completed. The parameters analysed include the users' lexical and prosodic realisation as well as gaze direction and response timing. Second, we investigate the relation between these parameters and the perception of uncertainty.

Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
Feedback, Prosody, Gaze, Human-robot interaction
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-134933 (URN); 2-s2.0-84906244754 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013; Lyon; France; 25 August 2013 through 29 August 2013
Note

QC 20140610

Available from: 2013-12-02. Created: 2013-12-02. Last updated: 2018-01-11. Bibliographically approved
Edlund, J., Alexanderson, S., Beskow, J., Gustavsson, L., Heldner, M., Hjalmarsson, A., . . . Marklund, E. (2012). 3rd party observer gaze as a continuous measure of dialogue flow. In: LREC 2012 - Eighth International Conference On Language Resources And Evaluation. Paper presented at 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, May 21-27, 2012 (pp. 1354-1358). Istanbul, Turkey: European Language Resources Association
2012 (English). In: LREC 2012 - Eighth International Conference On Language Resources And Evaluation, Istanbul, Turkey: European Language Resources Association, 2012, p. 1354-1358. Conference paper, Published paper (Refereed)
Abstract [en]

We present an attempt at using 3rd party observer gaze to get a measure of how appropriate each segment in a dialogue is for a speaker change. The method is a step away from the current dependency of speaker turns or talkspurts towards a more general view of speaker changes. We show that 3rd party observers do indeed largely look at the same thing (the speaker), and how this can be captured and utilized to provide insights into human communication. In addition, the results also suggest that there might be differences in the distribution of 3rd party observer gaze depending on how information-rich an utterance is.

Place, publisher, year, edition, pages
Istanbul, Turkey: European Language Resources Association, 2012
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-109366 (URN); 000323927701073 (ISI); 978-2-9517408-7-7 (ISBN)
Conference
8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, May 21-27, 2012
Note

QC 20130523

Available from: 2013-01-02. Created: 2013-01-02. Last updated: 2018-01-11. Bibliographically approved
Edlund, J., Heldner, M. & Hjalmarsson, A. (2012). 3rd party observer gaze during backchannels. In: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog. Paper presented at the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog. Skamania Lodge, WA, USA
2012 (English). In: Proc. of the Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog, Skamania Lodge, WA, USA, 2012. Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a study of how the gazes of 3rd party observers of dialogue move when a speaker is taking the turn and producing a back-channel, respectively. The data is collected and basic processing is complete, but the results section for the paper is not yet in place. It will be in time for the workshop, however, and will be presented there, should this paper outline be accepted.

Place, publisher, year, edition, pages
Skamania Lodge, WA, USA, 2012
Keywords
speech synthesis, unit selection, joint costs
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-107010 (URN)
Conference
Interspeech 2012 Interdisciplinary Workshop on Feedback Behaviors in Dialog
Note

tmh_import_12_12_05, tmh_id_3781, QC 20121217

Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2018-01-12. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3585-8077
