Publications (10 of 83)
Jonell, P., Lopes, J., Fallgren, P., Wennberg, U., Doğan, F. I. & Skantze, G. (2019). Crowdsourcing a self-evolving dialog graph. In: CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces: . Paper presented at 1st International Conference on Conversational User Interfaces, CUI 2019; Dublin; Ireland; 22 August 2019 through 23 August 2019. Association for Computing Machinery (ACM), Article ID 14.
Crowdsourcing a self-evolving dialog graph
2019 (English). In: CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces, Association for Computing Machinery (ACM), 2019, article id 14. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a crowdsourcing-based approach for collecting dialog data for a social chat dialog system, which gradually builds a dialog graph from actual user responses and crowd-sourced system answers, conditioned by a given persona and other instructions. This approach was tested during the second instalment of the Amazon Alexa Prize 2018 (AP2018), both for the data collection and to feed a simple dialog system which would use the graph to provide answers. As users interacted with the system, a graph which maintained the structure of the dialogs was built, identifying parts where more coverage was needed. In an offline evaluation, we have compared the corpus collected during the competition with other potential corpora for training chatbots, including movie subtitles, online chat forums and conversational data. The results show that the proposed methodology creates data that is more representative of actual user utterances, and leads to more coherent and engaging answers from the agent. An implementation of the proposed method is available as open-source code.
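
The abstract above describes a dialog graph that grows as unseen user responses are collected and crowd workers author system answers for them. As a rough illustration of that idea (not the authors' open-source implementation; the class, the string-similarity matching and the threshold are assumptions), a minimal Python sketch might look like this:

# Minimal sketch of a self-evolving dialog graph (illustrative only; not the
# released implementation). Node layout and matching logic are assumptions.
from difflib import SequenceMatcher

class DialogGraph:
    def __init__(self, opening_prompt):
        # Each node holds a system utterance and the user responses seen after it,
        # mapped to follow-up system nodes (or None if no answer exists yet).
        self.nodes = {0: {"system": opening_prompt, "responses": {}}}
        self.next_id = 1

    def step(self, node_id, user_utterance, threshold=0.8):
        responses = self.nodes[node_id]["responses"]
        # Try to match the user utterance against responses already in the graph.
        best, score = None, 0.0
        for known in responses:
            s = SequenceMatcher(None, known.lower(), user_utterance.lower()).ratio()
            if s > score:
                best, score = known, s
        if best is not None and score >= threshold and responses[best] is not None:
            return responses[best]            # follow an existing branch
        # Unseen response: record it so a crowd worker can author a system answer,
        # which gradually extends the graph where coverage is missing.
        responses.setdefault(user_utterance, None)
        return None

    def add_crowd_answer(self, node_id, user_utterance, system_answer):
        new_id = self.next_id
        self.next_id += 1
        self.nodes[new_id] = {"system": system_answer, "responses": {}}
        self.nodes[node_id]["responses"][user_utterance] = new_id
        return new_id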

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Series
ACM International Conference Proceeding Series
Keywords
Crowdsourcing, Datasets, Dialog systems, Human-computer interaction
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-266061 (URN), 10.1145/3342775.3342790 (DOI), 2-s2.0-85075882531 (Scopus ID), 9781450371872 (ISBN)
Conference
1st International Conference on Conversational User Interfaces, CUI 2019; Dublin; Ireland; 22 August 2019 through 23 August 2019
Note

QC 20200114

Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-14. Bibliographically approved
Ibrahim, O., Skantze, G., Stoll, S. & Dellwo, V. (2019). Fundamental frequency accommodation in multi-party human-robot game interactions: The effect of winning or losing. In: Proceedings Interspeech 2019: . Paper presented at Interspeech 2019, 15-19 September 2019, Graz, Austria (pp. 3980-3984). International Speech Communication Association
Fundamental frequency accommodation in multi-party human-robot game interactions: The effect of winning or losing
2019 (English). In: Proceedings Interspeech 2019, International Speech Communication Association, 2019, p. 3980-3984. Conference paper, Published paper (Refereed)
Abstract [en]

In human-human interactions, the situational context plays a large role in the degree of speakers' accommodation. In this paper, we investigate whether the degree of accommodation in a human-robot computer game is affected by (a) the duration of the interaction and (b) the success of the players in the game. 30 teams of two players played two card games with a conversational robot in which they had to find a correct order of five cards. After game 1, the players received the result of the game on a success scale from 1 (lowest success) to 5 (highest). Speakers' f0 accommodation was measured as the Euclidean distance between the human speakers and between each human and the robot. Results revealed that (a) the duration of the game had no influence on the degree of f0 accommodation and (b) the result of Game 1 correlated with the degree of f0 accommodation in Game 2 (higher success equals lower Euclidean distance). We argue that game success is most likely taken as a sign of the success of the players' cooperation during the discussion, which leads to higher accommodation behavior in speech.
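
As a worked illustration of the measure described above (a sketch only; the exact f0 features used in the paper are not given here, so per-speaker mean and standard deviation are assumptions), f0 accommodation as a Euclidean distance could be computed like this in Python:

# Minimal sketch of an f0 accommodation measure as a Euclidean distance.
# The feature choice (mean and standard deviation per game) is an assumption.
import math

def f0_features(f0_values_hz):
    """Summarise a speaker's f0 track as (mean, standard deviation)."""
    mean = sum(f0_values_hz) / len(f0_values_hz)
    var = sum((v - mean) ** 2 for v in f0_values_hz) / len(f0_values_hz)
    return (mean, math.sqrt(var))

def accommodation_distance(f0_speaker_a, f0_speaker_b):
    """Euclidean distance between two speakers' f0 feature vectors;
    a smaller distance indicates stronger accommodation."""
    fa, fb = f0_features(f0_speaker_a), f0_features(f0_speaker_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

# Example: compare a human player's f0 values (Hz) to the robot's in Game 2.
human = [190.0, 205.0, 210.0, 198.0]
robot = [160.0, 158.0, 162.0, 165.0]
print(accommodation_distance(human, robot))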

Place, publisher, year, edition, pages
International Speech Communication Association, 2019
Series
Interspeech, ISSN 1990-9772
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-267215 (URN), 10.21437/Interspeech.2019-2496 (DOI), 2-s2.0-85074731228 (Scopus ID)
Conference
Interspeech 2019, 15-19 September 2019, Graz, Austria
Note

QC 20200214

Available from: 2020-02-04 Created: 2020-02-04 Last updated: 2020-03-23. Bibliographically approved
Shore, T., Androulakaki, T. & Skantze, G. (2019). KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation: . Paper presented at 11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center Miyazaki, Japan, 7 May 2018 through 12 May 2018 (pp. 768-775). Tokyo
KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
2019 (English). In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, Tokyo, 2019, p. 768-775. Conference paper, Published paper (Refereed)
Abstract [en]

There is a growing body of research focused on task-oriented instructor-manipulator dialogue, whereby one dialogue participant initiates a reference to an entity in a common environment while the other participant must resolve this reference in order to manipulate said entity. Many of these works are based on disparate, if nevertheless similar, datasets. This paper describes an English corpus of referring expressions in relatively free, unrestricted dialogue with physical features generated in a simulation, which facilitates analysis of dialogic linguistic phenomena regarding alignment in the formation of referring expressions, known as conceptual pacts.

Place, publisher, year, edition, pages
Tokyo: , 2019
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-232957 (URN), 2-s2.0-85059895102 (Scopus ID), 9791095546009 (ISBN)
Conference
11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center Miyazaki, Japan, 7 May 2018 through 12 May 2018
Note

QC 20180809

Available from: 2018-08-06 Created: 2018-08-06 Last updated: 2019-02-18. Bibliographically approved
Axelsson, N. & Skantze, G. (2019). Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees. In: Satoshi Nakamura (Ed.), 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue: Proceedings of the Conference. Paper presented at SIGDIAL 2019 (pp. 345-352). Stroudsburg, PA
Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees
2019 (English). In: 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue: Proceedings of the Conference / [ed] Satoshi Nakamura, Stroudsburg, PA, 2019, p. 345-352. Conference paper, Published paper (Refereed)
Abstract [en]

In dialogue, speakers continuously adapt their speech to accommodate the listener, based on the feedback they receive. In this paper, we explore the modelling of such behaviours in the context of a robot presenting a painting. A Behaviour Tree is used to organise the behaviour on different levels and allows the robot to adapt its behaviour in real time; the tree organises engagement, joint attention, turn-taking, feedback and incremental speech processing. An initial implementation of the model is presented, and the system is evaluated in a user study, where the adaptive robot presenter is compared to a non-adaptive version. The adaptive version is found to be more engaging by the users, although no effects are found on the retention of the presented material.
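
To make the behaviour-tree idea concrete, here is a minimal Python sketch (illustrative only, not the paper's system; the node types, the attention condition and the presentation segments are assumptions) in which the robot only continues presenting while the listener is attending:

# Minimal behaviour-tree sketch for an adaptive presenter (illustrative only).
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Sequence:
    """Ticks children in order; returns early if one fails or keeps running."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != SUCCESS:
                return status
        return SUCCESS

class Condition:
    def __init__(self, predicate):
        self.predicate = predicate
    def tick(self, ctx):
        return SUCCESS if self.predicate(ctx) else FAILURE

class Action:
    def __init__(self, fn):
        self.fn = fn
    def tick(self, ctx):
        return self.fn(ctx)

def present_next_segment(ctx):
    if not ctx["segments"]:
        return SUCCESS
    print("Robot says:", ctx["segments"].pop(0))
    return RUNNING if ctx["segments"] else SUCCESS

# The presenter only continues while the listener is attending; otherwise the
# tick fails, and a higher-level tree could try to re-engage the user instead.
presenter = Sequence(
    Condition(lambda ctx: ctx["listener_attending"]),
    Action(present_next_segment),
)

ctx = {"listener_attending": True,
       "segments": ["This painting is from 1642.", "Notice the use of light."]}
while presenter.tick(ctx) == RUNNING:
    pass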

Place, publisher, year, edition, pages
Stroudsburg, PA: , 2019
Keywords
human-robot interaction, presentation, acceptance, understanding, comprehension, hearing, attention, robot, Furhat, presenter, adaptive, non-adaptive, retention, memory, recall, engagement
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Speech and Music Communication; Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-267218 (URN), 978-1-950737-61-1 (ISBN)
Conference
SIGDIAL 2019
Projects
Co-adaptive Human-Robot Interactive Systems
Funder
Swedish Foundation for Strategic Research, RIT15-0133
Note

QC 20200205

Available from: 2020-02-04 Created: 2020-02-04 Last updated: 2020-02-05. Bibliographically approved
Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T. & Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci): . Paper presented at 41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019.
The Effects of Embodiment and Social Eye-Gaze in Conversational Agents
2019 (English). In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of conversational agents is growing at a rapid pace. Agents, however, are not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this work, we explore the effects of simulating anthropomorphism and social eye-gaze in three conversational agents. We tested whether subjects' visual attention towards the agents would be similar across different forms of embodiment and social eye-gaze. In a within-subject situated interaction study (N=30), we asked subjects to engage in task-oriented dialogue with a smart speaker and two variations of a social robot. We observed shifting of interactive behaviour by human users, as shown in differences in behavioural and objective measures. With a trade-off in task performance, social facilitation is higher with more anthropomorphic social agents when performing the same task.

National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-255126 (URN)
Conference
41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019
Note

QC 20190722

Available from: 2019-07-21 Created: 2019-07-21 Last updated: 2019-07-22. Bibliographically approved
Kontogiorgos, D., Avramova, V., Alexanderson, S., Jonell, P., Oertel, C., Beskow, J., . . . Gustafson, J. (2018). A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018): . Paper presented at International Conference on Language Resources and Evaluation (LREC 2018) (pp. 119-127). Paris
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
2018 (English). In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, p. 119-127. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a corpus of multiparty situated interaction where participants collaborated on moving virtual objects on a large touch screen. A moderator facilitated the discussion and directed the interaction. The corpus contains recordings of a variety of multimodal data, in that we captured speech, eye gaze and gesture data using a multisensory setup (wearable eye trackers, motion capture and audio/video). Furthermore, in the description of the multimodal corpus, we investigate four different types of social gaze: referential gaze, joint attention, mutual gaze and gaze aversion, from both the speaker's and the listener's perspective. We annotated the groups' object references during object manipulation tasks and analysed the groups' proportional referential eye-gaze with regard to the referent object. When investigating the distributions of gaze during and before referring expressions, we could corroborate the differences in time between speakers' and listeners' eye gaze found in earlier studies. This corpus is of particular interest to researchers who are interested in social eye-gaze patterns in turn-taking and referring language in situated multi-party interaction.

Place, publisher, year, edition, pages
Paris: , 2018
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-230238 (URN), 2-s2.0-85059891166 (Scopus ID), 979-10-95546-00-9 (ISBN)
Conference
International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180614

Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2019-02-19. Bibliographically approved
Li, C., Androulakaki, T., Gao, A. Y., Yang, F., Saikia, H., Peters, C. & Skantze, G. (2018). Effects of Posture and Embodiment on Social Distance in Human-Agent Interaction in Mixed Reality. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents: . Paper presented at 18th International Conference on Intelligent Virtual Agents (pp. 191-196). ACM Digital Library
Effects of Posture and Embodiment on Social Distance in Human-Agent Interaction in Mixed Reality
2018 (English). In: Proceedings of the 18th International Conference on Intelligent Virtual Agents, ACM Digital Library, 2018, p. 191-196. Conference paper, Published paper (Refereed)
Abstract [en]

Mixed reality offers new potentials for social interaction experiences with virtual agents. In addition, it can be used to experiment with the design of physical robots. However, while previous studies have investigated comfortable social distances between humans and artificial agents in real and virtual environments, there is little data with regard to mixed reality environments. In this paper, we conducted an experiment in which participants were asked to walk up to an agent to ask a question, in order to investigate the social distances maintained, as well as the subjects' experience of the interaction. We manipulated both the embodiment of the agent (robot vs. human and virtual vs. physical) as well as closed vs. open posture of the agent. The virtual agent was displayed using a mixed reality headset. Our experiment involved 35 participants in a within-subject design. We show that, in the context of social interactions, mixed reality fares well against physical environments, and robots fare well against humans, barring a few technical challenges.

Place, publisher, year, edition, pages
ACM Digital Library, 2018
National Category
Language Technology (Computational Linguistics); Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-241288 (URN), 10.1145/3267851.3267870 (DOI), 2-s2.0-85058440240 (Scopus ID)
Conference
18th International Conference on Intelligent Virtual Agents
Note

QC 20190122

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-04-09. Bibliographically approved
Peters, C., Li, C., Yang, F., Avramova, V. & Skantze, G. (2018). Investigating Social Distances between Humans, Virtual Humans and Virtual Robots in Mixed Reality. In: Proceedings of 17th International Conference on Autonomous Agents and MultiAgent Systems: . Paper presented at the 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, July 10-15, 2018 (pp. 2247-2249).
Investigating Social Distances between Humans, Virtual Humans and Virtual Robots in Mixed Reality
2018 (English). In: Proceedings of 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018, p. 2247-2249. Conference paper, Published paper (Refereed)
Abstract [en]

Mixed reality environments offer new potentials for the design of compelling social interaction experiences with virtual characters. In this paper, we summarise initial experiments we are conducting in which we measure comfortable social distances between humans, virtual humans and virtual robots in mixed reality environments. We consider a scenario in which participants walk within a comfortable distance of a virtual character that has its appearance varied between a male and female human, and a standard- and human-height virtual Pepper robot. Our studies in mixed reality thus far indicate that humans adopt social zones with artificial agents that are similar to those in human-human social interactions and in interactions in virtual reality.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-241285 (URN), 000468231300383 (ISI), 2-s2.0-85054717128 (Scopus ID)
Conference
The 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, July 10-15, 2018
Note

QC 20190214

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2020-03-11. Bibliographically approved
Roddy, M., Skantze, G. & Harte, N. (2018). Investigating speech features for continuous turn-taking prediction using LSTMs. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH: . Paper presented at 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018; Hyderabad International Convention Centre (HICC), Hyderabad; India; 2 September 2018 through 6 September 2018 (pp. 586-590). International Speech Communication Association
Investigating speech features for continuous turn-taking prediction using LSTMs
2018 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association, 2018, p. 586-590. Conference paper, Published paper (Refereed)
Abstract [en]

For spoken dialog systems to conduct fluid conversational interactions with users, the systems must be sensitive to turn-taking cues produced by a user. Models should be designed so that effective decisions can be made as to when it is appropriate, or not, for the system to speak. Traditional end-of-turn models, where decisions are made at utterance end-points, are limited in their ability to model fast turn-switches and overlap. A more flexible approach is to model turn-taking in a continuous manner using RNNs, where the system predicts speech probability scores for discrete frames within a future window. The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection. In this paper, we investigate optimal speech-related feature sets for making predictions at pauses and overlaps in conversation. We find that while traditional acoustic features perform well, part-of-speech features generally perform worse than word features. We show that our current models outperform previously reported baselines.
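
As a rough sketch of the continuous prediction setup described above (not the authors' model; the feature dimension, hidden size and 20-frame future window are assumptions), an LSTM that outputs speech probability scores for a window of future frames at every input frame could be written in PyTorch as:

# Minimal sketch of a continuous turn-taking predictor (illustrative only).
import torch
import torch.nn as nn

class ContinuousTurnTakingLSTM(nn.Module):
    def __init__(self, n_features=64, hidden_size=128, future_frames=20):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        # For every input frame, predict speech probabilities for a window of
        # future frames, rather than a single end-of-turn decision.
        self.head = nn.Linear(hidden_size, future_frames)

    def forward(self, x):
        # x: (batch, time, n_features) frame-level acoustic/linguistic features
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out))    # (batch, time, future_frames)

model = ContinuousTurnTakingLSTM()
features = torch.randn(2, 100, 64)               # 2 dialogs, 100 frames each
speech_probs = model(features)
print(speech_probs.shape)                        # torch.Size([2, 100, 20])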

Place, publisher, year, edition, pages
International Speech Communication Association, 2018
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X; 2018
Keywords
Spoken dialog systems, Turn-taking
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-246548 (URN), 10.21437/Interspeech.2018-2124 (DOI), 000465363900124 (ISI), 2-s2.0-85054989715 (Scopus ID)
Conference
19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018; Hyderabad International Convention Centre (HICC), Hyderabad; India; 2 September 2018 through 6 September 2018
Note

QC 20190320

Available from: 2019-03-20 Created: 2019-03-20 Last updated: 2019-10-18. Bibliographically approved
Roddy, M., Skantze, G. & Harte, N. (2018). Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs. In: ICMI 2018 - Proceedings of the 2018 International Conference on Multimodal Interaction: . Paper presented at 20th ACM International Conference on Multimodal Interaction, ICMI 2018, University of Colorado Boulder, Boulder, United States, 16 October 2018 through 20 October 2018 (pp. 186-190).
Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs
2018 (English). In: ICMI 2018 - Proceedings of the 2018 International Conference on Multimodal Interaction, 2018, p. 186-190. Conference paper, Published paper (Refereed)
Abstract [en]

In human conversational interactions, turn-taking exchanges can be coordinated using cues from multiple modalities. To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models. We propose that there is an appropriate temporal granularity at which modalities should be modeled. We design a multiscale RNN architecture to model modalities at separate timescales in a continuous manner. Our results show that modeling linguistic and acoustic features at separate temporal rates can be beneficial for turn-taking modeling. We also show that our approach can be used to incorporate gaze features into turn-taking models.
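
A minimal PyTorch sketch of the multiscale idea follows (illustrative only, not the paper's architecture; the frame-rate ratio, feature dimensions and fusion by upsampling and concatenation are assumptions): acoustic features are modelled at a fast frame rate, linguistic features at a slower rate, and the slower stream is aligned to the fast one before a master RNN predicts per-frame speech probabilities.

# Minimal sketch of a multiscale RNN for multimodal turn-taking (illustrative).
import torch
import torch.nn as nn

class MultiscaleTurnTaking(nn.Module):
    def __init__(self, acoustic_dim=40, linguistic_dim=300, hidden=64, ratio=5):
        super().__init__()
        self.ratio = ratio                                   # slow:fast frame ratio
        self.fast_rnn = nn.LSTM(acoustic_dim, hidden, batch_first=True)
        self.slow_rnn = nn.LSTM(linguistic_dim, hidden, batch_first=True)
        self.master = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, acoustic, linguistic):
        # acoustic: (batch, T, acoustic_dim) at the fast frame rate
        # linguistic: (batch, T // ratio, linguistic_dim) at a slower, word-like rate
        fast, _ = self.fast_rnn(acoustic)
        slow, _ = self.slow_rnn(linguistic)
        slow_up = slow.repeat_interleave(self.ratio, dim=1)  # align to the fast rate
        fused, _ = self.master(torch.cat([fast, slow_up], dim=-1))
        return torch.sigmoid(self.out(fused)).squeeze(-1)    # speech prob. per frame

model = MultiscaleTurnTaking()
probs = model(torch.randn(2, 100, 40), torch.randn(2, 20, 300))
print(probs.shape)                                           # torch.Size([2, 100])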

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-241286 (URN), 10.1145/3242969.3242997 (DOI), 000457913100027 (ISI), 2-s2.0-85056659938 (Scopus ID), 9781450356923 (ISBN)
Conference
20th ACM International Conference on Multimodal Interaction, ICMI 2018, University of Colorado Boulder, Boulder, United States, 16 October 2018 through 20 October 2018
Note

QC 20190215

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-02-22. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-8579-1790
