Publications (10 of 80)
Shore, T., Androulakaki, T. & Skantze, G. (2019). KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation: . Paper presented at 11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center, Miyazaki, Japan, 7 May 2018 through 12 May 2018 (pp. 768-775). Tokyo
KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
2019 (English) In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, Tokyo, 2019, p. 768-775. Conference paper, Published paper (Refereed)
Abstract [en]

There is a growing body of research focused on task-oriented instructor-manipulator dialogue, whereby one dialogue participant initiates a reference to an entity in a common environment while the other participant must resolve this reference in order to manipulate said entity. Many of these works are based on disparate, if nevertheless similar, datasets. This paper describes an English corpus of referring expressions in relatively free, unrestricted dialogue with physical features generated in a simulation, which facilitates analysis of dialogic linguistic phenomena regarding alignment in the formation of referring expressions, known as conceptual pacts.

Place, publisher, year, edition, pages
Tokyo, 2019
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-232957 (URN), 2-s2.0-85059895102 (Scopus ID), 9791095546009 (ISBN)
Conference
11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center, Miyazaki, Japan, 7 May 2018 through 12 May 2018
Note

QC 20180809

Available from: 2018-08-06 Created: 2018-08-06 Last updated: 2019-02-18. Bibliographically approved
Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T. & Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci): . Paper presented at the 41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019.
The Effects of Embodiment and Social Eye-Gaze in Conversational Agents
2019 (English) In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of conversational agents is growing at a rapid pace. Agents, however, are not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this work, we explore the effects of simulating anthropomorphism and social eye-gaze in three conversational agents. We tested whether subjects' visual attention would be similar across agents with different forms of embodiment and social eye-gaze. In a within-subject situated interaction study (N=30), we asked subjects to engage in task-oriented dialogue with a smart speaker and two variations of a social robot. We observed shifts in users' interactive behaviour, as shown by differences in behavioural and objective measures. At a trade-off in task performance, social facilitation was higher with the more anthropomorphic social agents performing the same task.

National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-255126 (URN)
Conference
41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019
Note

QC 20190722

Available from: 2019-07-21 Created: 2019-07-21 Last updated: 2019-07-22. Bibliographically approved
Kontogiorgos, D., Avramova, V., Alexanderson, S., Jonell, P., Oertel, C., Beskow, J., . . . Gustafson, J. (2018). A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018): . Paper presented at International Conference on Language Resources and Evaluation (LREC 2018) (pp. 119-127). Paris
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
2018 (English) In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, p. 119-127. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a corpus of multiparty situated interaction where participants collaborated on moving virtual objects on a large touch screen. A moderator facilitated the discussion and directed the interaction. The corpus contains a variety of multimodal recordings: speech, eye gaze and gesture data captured using a multisensory setup (wearable eye trackers, motion capture and audio/video). Furthermore, in the description of the multimodal corpus, we investigate four different types of social gaze: referential gaze, joint attention, mutual gaze and gaze aversion, from the perspectives of both speaker and listener. We annotated the groups' object references during object manipulation tasks and analysed the groups' proportional referential eye-gaze with regard to the referent object. When investigating the distributions of gaze during and before referring expressions, we could corroborate the timing differences between speakers' and listeners' eye gaze found in earlier studies. This corpus is of particular interest to researchers interested in social eye-gaze patterns in turn-taking and referring language in situated multi-party interaction.

Place, publisher, year, edition, pages
Paris, 2018
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-230238 (URN), 2-s2.0-85059891166 (Scopus ID), 979-10-95546-00-9 (ISBN)
Conference
International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180614

Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2019-02-19. Bibliographically approved
Li, C., Androulakaki, T., Gao, A. Y., Yang, F., Saikia, H., Peters, C. & Skantze, G. (2018). Effects of Posture and Embodiment on Social Distance in Human-Agent Interaction in Mixed Reality. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents: . Paper presented at 18th International Conference on Intelligent Virtual Agents (pp. 191-196). ACM Digital Library
Effects of Posture and Embodiment on Social Distance in Human-Agent Interaction in Mixed Reality
2018 (English) In: Proceedings of the 18th International Conference on Intelligent Virtual Agents, ACM Digital Library, 2018, p. 191-196. Conference paper, Published paper (Refereed)
Abstract [en]

Mixed reality offers new potential for social interaction experiences with virtual agents. In addition, it can be used to experiment with the design of physical robots. However, while previous studies have investigated comfortable social distances between humans and artificial agents in real and virtual environments, there is little data with regard to mixed reality environments. In this paper, we conducted an experiment in which participants were asked to walk up to an agent to ask a question, in order to investigate the social distances maintained, as well as the subjects' experience of the interaction. We manipulated both the embodiment of the agent (robot vs. human and virtual vs. physical) and the closed vs. open posture of the agent. The virtual agent was displayed using a mixed reality headset. Our experiment involved 35 participants in a within-subject design. We show that, in the context of social interactions, mixed reality fares well against physical environments, and robots fare well against humans, barring a few technical challenges.

Place, publisher, year, edition, pages
ACM Digital Library, 2018
National Category
Language Technology (Computational Linguistics); Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-241288 (URN), 10.1145/3267851.3267870 (DOI), 2-s2.0-85058440240 (Scopus ID)
Conference
18th International Conference on Intelligent Virtual Agents
Note

QC 20190122

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-04-09. Bibliographically approved
Peters, C., Li, C., Yang, F., Avramova, V. & Skantze, G. (2018). Investigating Social Distances between Humans, Virtual Humans and Virtual Robots in Mixed Reality. In: Proceedings of 17th International Conference on Autonomous Agents and MultiAgent Systems: . Paper presented at the 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, July 10-15, 2018 (pp. 2247-2249).
Investigating Social Distances between Humans, Virtual Humans and Virtual Robots in Mixed Reality
2018 (English) In: Proceedings of 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018, p. 2247-2249. Conference paper, Published paper (Refereed)
Abstract [en]

Mixed reality environments offer new potential for the design of compelling social interaction experiences with virtual characters. In this paper, we summarise initial experiments we are conducting in which we measure comfortable social distances between humans, virtual humans and virtual robots in mixed reality environments. We consider a scenario in which participants walk to within a comfortable distance of a virtual character whose appearance is varied between a male and a female human, and a standard-height and human-height virtual Pepper robot. Our studies in mixed reality thus far indicate that humans adopt social zones with artificial agents that are similar to those in human-human social interactions and in interactions in virtual reality.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-241285 (URN), 2-s2.0-85054717128 (Scopus ID)
Conference
The 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, July 10-15, 2018
Note

QC 20190214

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-03-18. Bibliographically approved
Roddy, M., Skantze, G. & Harte, N. (2018). Investigating speech features for continuous turn-taking prediction using LSTMs. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH: . Paper presented at the 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, Hyderabad International Convention Centre (HICC), Hyderabad, India, 2 September 2018 through 6 September 2018 (pp. 586-590). International Speech Communication Association
Investigating speech features for continuous turn-taking prediction using LSTMs
2018 (English) In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association, 2018, p. 586-590. Conference paper, Published paper (Refereed)
Abstract [en]

For spoken dialog systems to conduct fluid conversational interactions with users, the systems must be sensitive to turn-taking cues produced by a user. Models should be designed so that effective decisions can be made as to when it is appropriate, or not, for the system to speak. Traditional end-of-turn models, where decisions are made at utterance end-points, are limited in their ability to model fast turn-switches and overlap. A more flexible approach is to model turn-taking in a continuous manner using RNNs, where the system predicts speech probability scores for discrete frames within a future window. The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection. In this paper, we investigate optimal speech-related feature sets for making predictions at pauses and overlaps in conversation. We find that while traditional acoustic features perform well, part-of-speech features generally perform worse than word features. We show that our current models outperform previously reported baselines.
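To make the continuous prediction setup concrete, below is a minimal, hypothetical PyTorch sketch of the general approach described in this abstract, not the authors' implementation: an LSTM consumes per-frame feature vectors and, at every frame, outputs speech-activity probabilities for each frame of a future window. The feature dimension, frame count, window length and decision threshold are illustrative assumptions.

```python
# Hypothetical sketch of continuous turn-taking prediction with an LSTM.
# Illustrative only: frame rate, feature set, window length and layer sizes
# are assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn

class ContinuousTurnTakingLSTM(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128, future_frames=60):
        super().__init__()
        # One step per feature frame (e.g. acoustic + word features).
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # At every frame, predict a speech-activity probability for each
        # of the next `future_frames` frames.
        self.head = nn.Linear(hidden_dim, future_frames)

    def forward(self, features):
        # features: (batch, time, feat_dim)
        hidden, _ = self.lstm(features)
        # (batch, time, future_frames), values in [0, 1]
        return torch.sigmoid(self.head(hidden))

model = ContinuousTurnTakingLSTM()
frames = torch.randn(2, 200, 64)     # two dialogues, 200 feature frames each
pred = model(frames)                 # per-frame predictions over the future window
# One possible decision rule at a pause: let the system speak if the predicted
# probability of the user resuming speech within the window is low.
speak_now = pred[:, -1, :].mean(dim=-1) < 0.3
```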

Place, publisher, year, edition, pages
International Speech Communication Association, 2018
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X ; 2018
Keywords
Spoken dialog systems, Turn-taking
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-246548 (URN), 10.21437/Interspeech.2018-2124 (DOI), 000465363900124 (), 2-s2.0-85054989715 (Scopus ID)
Conference
19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, Hyderabad International Convention Centre (HICC), Hyderabad, India, 2 September 2018 through 6 September 2018
Note

QC 20190320

Available from: 2019-03-20 Created: 2019-03-20 Last updated: 2019-10-18. Bibliographically approved
Roddy, M., Skantze, G. & Harte, N. (2018). Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs. In: ICMI 2018 - Proceedings of the 2018 International Conference on Multimodal Interaction: . Paper presented at the 20th ACM International Conference on Multimodal Interaction, ICMI 2018, University of Colorado Boulder, Boulder, United States, 16 October 2018 through 20 October 2018 (pp. 186-190).
Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs
2018 (English) In: ICMI 2018 - Proceedings of the 2018 International Conference on Multimodal Interaction, 2018, p. 186-190. Conference paper, Published paper (Refereed)
Abstract [en]

In human conversational interactions, turn-taking exchanges can be coordinated using cues from multiple modalities. To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models. We propose that there is an appropriate temporal granularity at which modalities should be modeled. We design a multiscale RNN architecture to model modalities at separate timescales in a continuous manner. Our results show that modeling linguistic and acoustic features at separate temporal rates can be beneficial for turn-taking modeling. We also show that our approach can be used to incorporate gaze features into turn-taking models.
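As a rough illustration of the multiscale idea in this abstract, the hypothetical sketch below runs one recurrent network over fast acoustic frames and another over slower word-level features, then upsamples the word-level states to the frame rate and fuses them before prediction. It is not the authors' architecture; the layer sizes, upsampling scheme and fusion strategy are assumptions.

```python
# Hypothetical sketch of a multiscale RNN for multimodal turn-taking prediction.
# Illustrative only: layer sizes, the upsampling scheme and the fusion strategy
# are assumptions, not the architecture from the paper.
import torch
import torch.nn as nn

class MultiscaleTurnTaking(nn.Module):
    def __init__(self, acoustic_dim=40, word_dim=100, hidden=64, future_frames=60):
        super().__init__()
        self.fast = nn.LSTM(acoustic_dim, hidden, batch_first=True)  # frame-rate acoustic features
        self.slow = nn.LSTM(word_dim, hidden, batch_first=True)      # word-rate linguistic features
        self.head = nn.Linear(2 * hidden, future_frames)

    def forward(self, acoustic, words, word_to_frame):
        # acoustic: (B, T_frames, acoustic_dim); words: (B, T_words, word_dim)
        # word_to_frame: (B, T_frames) index of the most recent word at each frame
        fast_h, _ = self.fast(acoustic)
        slow_h, _ = self.slow(words)
        # Upsample the slow (word-level) states to the frame rate before fusing.
        idx = word_to_frame.unsqueeze(-1).expand(-1, -1, slow_h.size(-1))
        slow_at_frames = torch.gather(slow_h, 1, idx)
        fused = torch.cat([fast_h, slow_at_frames], dim=-1)
        return torch.sigmoid(self.head(fused))

model = MultiscaleTurnTaking()
acoustic = torch.randn(1, 200, 40)                               # 200 acoustic frames
words = torch.randn(1, 30, 100)                                  # 30 word embeddings
word_to_frame = torch.clamp(torch.arange(200) // 7, max=29).unsqueeze(0)
probs = model(acoustic, words, word_to_frame)                    # (1, 200, 60)
```

In this kind of setup, gaze features could in principle be added as a third stream at its own rate and fused the same way.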

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-241286 (URN), 10.1145/3242969.3242997 (DOI), 000457913100027 (), 2-s2.0-85056659938 (Scopus ID), 9781450356923 (ISBN)
Conference
20th ACM International Conference on Multimodal Interaction, ICMI 2018, University of Colorado Boulder, Boulder, United States, 16 October 2018 through 20 October 2018
Note

QC 20190215

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-02-22. Bibliographically approved
Kontogiorgos, D., Sibirtseva, E., Pereira, A., Skantze, G. & Gustafson, J. (2018). Multimodal reference resolution in collaborative assembly tasks. In: Multimodal reference resolution in collaborative assembly tasks: . Paper presented at Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. ACM Digital Library
Multimodal reference resolution in collaborative assembly tasks
2018 (English) In: Multimodal reference resolution in collaborative assembly tasks, ACM Digital Library, 2018. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ACM Digital Library, 2018
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-235547 (URN), 10.1145/3279972.3279976 (DOI), 000458138700007 (), 2-s2.0-85058174973 (Scopus ID)
Conference
Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction
Note

QC 20181009

Available from: 2018-09-29 Created: 2018-09-29 Last updated: 2019-03-20. Bibliographically approved
Shore, T. & Skantze, G. (2018). Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution. In: Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP): . Paper presented at the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), October 31 - November 4, Brussels, Belgium (pp. 2288-2297).
Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution
2018 (English) In: Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018, p. 2288-2297. Conference paper, Published paper (Refereed)
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-241287 (URN)
Conference
2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), October 31 - November 4, Brussels, Belgium
Note

QC 20190212

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-02-13. Bibliographically approved
Johansson, M., Hori, T., Skantze, G., Hothker, A. & Gustafson, J. (2016). Making Turn-Taking Decisions for an Active Listening Robot for Memory Training. In: SOCIAL ROBOTICS, (ICSR 2016): . Paper presented at 8th International Conference on Social Robotics (ICSR), NOV 01-03, 2016, Kansas City, MO (pp. 940-949). Springer
Making Turn-Taking Decisions for an Active Listening Robot for Memory Training
2016 (English) In: SOCIAL ROBOTICS, (ICSR 2016), Springer, 2016, p. 940-949. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a dialogue system and response model that allows a robot to act as an active listener, encouraging users to tell the robot about their travel memories. The response model makes a combined decision about when to respond and what type of response to give, in order to elicit more elaborate descriptions from the user and avoid non-sequitur responses. The model was trained on human-robot dialogue data collected in a Wizard-of-Oz setting, and evaluated in a fully autonomous version of the same dialogue system. Compared to a baseline system, users perceived the dialogue system with the trained model to be a significantly better listener. The trained model also resulted in dialogues with significantly fewer mistakes, a larger proportion of user speech and fewer interruptions.
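The combined when-to-respond and what-to-respond decision can be pictured with a small hypothetical sketch (not the authors' model): a single classifier over features extracted at each detected pause, where one class means "stay silent" and the remaining classes are response types, so timing and response type are decided jointly. The feature names, response categories and training data below are invented for illustration.

```python
# Hypothetical sketch of a combined turn-taking / response-type decision.
# Illustrative only: features, labels and training data are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Classes: 0 = do not respond yet, 1 = backchannel ("mm-hm"),
#          2 = follow-up question, 3 = statement/assessment.
LABELS = ["wait", "backchannel", "question", "statement"]

# Each row: features extracted at a detected pause, e.g.
# [pause length (s), final pitch slope, speech rate, words since last response]
X_train = np.array([
    [0.2, -0.1, 4.5, 3],
    [0.9, -0.8, 3.2, 12],
    [1.4, -0.5, 2.8, 25],
    [0.3,  0.4, 5.0, 5],
])
y_train = np.array([0, 1, 2, 0])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# At run time, the same decision is made at every detected pause:
pause_features = np.array([[1.1, -0.6, 3.0, 18]])
decision = LABELS[clf.predict(pause_features)[0]]
print(decision)  # e.g. "question" -> try to elicit a more elaborate description
```

Treating "wait" as just another class is what lets a single model decide when to respond and what kind of response to give, rather than chaining separate end-of-turn and response-type components.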

Place, publisher, year, edition, pages
Springer, 2016
Series
Lecture Notes in Artificial Intelligence, ISSN 0302-9743 ; 9979
Keywords
Turn-taking, Active listening, Social robotics, Memory training
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-200064 (URN), 10.1007/978-3-319-47437-3_92 (DOI), 000389816500092 (), 2-s2.0-84992499074 (Scopus ID), 978-3-319-47437-3 (ISBN), 978-3-319-47436-6 (ISBN)
Conference
8th International Conference on Social Robotics (ICSR), NOV 01-03, 2016, Kansas City, MO
Note

QC 20170125

Available from: 2017-01-25 Created: 2017-01-20 Last updated: 2018-01-13. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-8579-1790
