Publications (10 of 107)
Skantze, G., Gustafson, J. & Beskow, J. (2019). Multimodal Conversational Interaction with Robots. In: Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger (Eds.), The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. ACM Press
2019 (English). In: The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions / [ed] Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger. ACM Press, 2019. Chapter in book, part of anthology (Refereed)
Place, publisher, year, edition, pages
ACM Press, 2019
HSV category
Identifiers
urn:nbn:se:kth:diva-254650 (URN), 9781970001723 (ISBN)
Note

QC 20190821

Available from: 2019-07-02. Created: 2019-07-02. Last updated: 2019-08-21. Bibliographically approved.
Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T. & Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci). Paper presented at the 41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019.
2019 (English). In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of conversational agents is growing at a rapid pace. Agents, however, are not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this work, we explore the effects of simulating anthropomorphism and social eye-gaze in three conversational agents. We tested whether subjects' visual attention would be similar across agents with different forms of embodiment and social eye-gaze. In a within-subject situated interaction study (N=30), we asked subjects to engage in task-oriented dialogue with a smart speaker and two variations of a social robot. We observed shifts in the interactive behaviour of human users, as shown in differences in behavioural and objective measures. With a trade-off in task performance, social facilitation is higher with more anthropomorphic social agents when performing the same task.

HSV category
Identifiers
urn:nbn:se:kth:diva-255126 (URN)
Conference
41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019
Note

QC 20190722

Available from: 2019-07-21. Created: 2019-07-21. Last updated: 2019-07-22. Bibliographically approved.
Kontogiorgos, D., Abelho Pereira, A. T. & Gustafson, J. (2019). The Trade-off between Interaction Time and Social Facilitation with Collaborative Social Robots. In: The Challenges of Working on Social Robots that Collaborate with People. Paper presented at the ACM CHI Conference on Human Factors in Computing Systems, May 4-9, Glasgow, UK.
2019 (English). In: The Challenges of Working on Social Robots that Collaborate with People, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of social robots and conversational agents is growing at a rapid pace. These agents, however, are still not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this paper, we discuss the effects of simulating anthropomorphism and non-verbal social behaviour in social robots and their implications for human-robot collaborative guided tasks. Our results indicate that it is not always favourable for agents to be anthropomorphised or to communicate with non-verbal behaviour. We found a clear trade-off between interaction time and social facilitation when controlling for anthropomorphism and social behaviour.

Keywords
social robots, smart speakers
HSV category
Research subject
Human-Computer Interaction
Identifiers
urn:nbn:se:kth:diva-251651 (URN)
Conference
The ACM CHI Conference on Human Factors in Computing Systems, May 4-9, Glasgow, UK
Available from: 2019-05-16. Created: 2019-05-16. Last updated: 2019-05-17. Bibliographically approved.
Sibirtseva, E., Kontogiorgos, D., Nykvist, O., Karaoguz, H., Leite, I., Gustafson, J. & Kragic, D. (2018). A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction. In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). Paper presented at RO-MAN 2018.
2018 (English). In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2018. Conference paper, Published paper (Refereed)
Abstract [en]

Picking up objects requested by a human user is a common task in human-robot interaction. When multiple objects match the user's verbal description, the robot needs to clarify which object the user is referring to before executing the action. Previous research has focused on perceiving the user's multimodal behaviour to complement verbal commands, or on minimising the number of follow-up questions to reduce task time. In this paper, we propose a system for reference disambiguation based on visualisation and compare three methods of disambiguating natural language instructions. In a controlled experiment with a YuMi robot, we investigated real-time augmentations of the workspace in three conditions (head-mounted display, projector, and a monitor as the baseline) using objective measures such as time and accuracy, and subjective measures such as engagement, immersion, and display interference. Significant differences were found in accuracy and engagement between the conditions, but not in task time. Despite the higher error rates in the head-mounted display condition, participants found that modality more engaging than the other two, but overall preferred the projector condition over the monitor and head-mounted display conditions.

HSV category
Identifiers
urn:nbn:se:kth:diva-235548 (URN), 10.1109/ROMAN.2018.8525554 (DOI), 978-1-5386-7981-4 (ISBN)
Conference
RO-MAN 2018
Note

QC 20181207

Available from: 2018-09-29. Created: 2018-09-29. Last updated: 2018-12-07. Bibliographically approved.
Kontogiorgos, D., Avramova, V., Alexanderson, S., Jonell, P., Oertel, C., Beskow, J., . . . Gustafson, J. (2018). A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Paper presented at the International Conference on Language Resources and Evaluation (LREC 2018) (pp. 119-127). Paris
2018 (English). In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, pp. 119-127. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a corpus of multiparty situated interaction where participants collaborated on moving virtual objects on a large touch screen. A moderator facilitated the discussion and directed the interaction. The corpus contains recordings of a variety of multimodal data: we captured speech, eye gaze and gesture data using a multisensory setup (wearable eye trackers, motion capture and audio/video). Furthermore, in the description of the multimodal corpus, we investigate four different types of social gaze: referential gaze, joint attention, mutual gaze and gaze aversion, from the perspectives of both speaker and listener. We annotated the groups' object references during object manipulation tasks and analysed the groups' proportional referential eye-gaze with regard to the referent object. When investigating the distributions of gaze during and before referring expressions, we could corroborate the differences in time between speakers' and listeners' eye gaze found in earlier studies. This corpus is of particular interest to researchers who study social eye-gaze patterns in turn-taking and referring language in situated multi-party interaction.
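As a side note on the analysis described above, the "proportional referential eye-gaze" measure can be pictured with a small sketch. The code below is illustrative only; the sample layout and field names are assumptions for the example, not the corpus's actual format or toolchain.

```python
# Hypothetical sketch of the proportional referential eye-gaze measure:
# the fraction of gaze samples that land on the referent object within
# a referring expression's time window. Data layout is an assumption.
from dataclasses import dataclass
from typing import List

@dataclass
class GazeSample:
    t: float       # timestamp in seconds
    target: str    # object the gaze ray hits, or "none"

def proportional_referential_gaze(samples: List[GazeSample],
                                  t_start: float, t_end: float,
                                  referent: str) -> float:
    """Proportion of gaze samples on `referent` during [t_start, t_end]."""
    window = [s for s in samples if t_start <= s.t <= t_end]
    if not window:
        return 0.0
    return sum(s.target == referent for s in window) / len(window)

# Example: gaze while a speaker says "the red cube" between 1.2 s and 2.0 s.
samples = [GazeSample(1.3, "red_cube"), GazeSample(1.5, "moderator"),
           GazeSample(1.8, "red_cube")]
print(proportional_referential_gaze(samples, 1.2, 2.0, "red_cube"))  # ~0.67
```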

Place, publisher, year, edition, pages
Paris, 2018
HSV category
Identifiers
urn:nbn:se:kth:diva-230238 (URN), 2-s2.0-85059891166 (Scopus ID), 979-10-95546-00-9 (ISBN)
Conference
International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180614

Available from: 2018-06-13. Created: 2018-06-13. Last updated: 2019-02-19. Bibliographically approved.
Jonell, P., Oertel, C., Kontogiorgos, D., Beskow, J. & Gustafson, J. (2018). Crowdsourced Multimodal Corpora Collection Tool. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Paper presented at the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 728-734). Paris
2018 (English). In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, pp. 728-734. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, more and more multimodal corpora have been created. To our knowledge, there is no publicly available tool which allows for acquiring controlled multimodal data of people in a rapid and scalable fashion. We therefore propose (1) a novel tool which enables researchers to rapidly gather large amounts of multimodal data spanning a wide demographic range, and (2) an example of how we used this tool to collect our "Attentive listener" multimodal corpus. The code is released under an Apache License 2.0 and available as an open-source repository, which can be found at https://github.com/kth-social-robotics/multimodal-crowdsourcing-tool. This tool will allow researchers to set up their own multimodal data collection system quickly and create their own multimodal corpora. Finally, this paper provides a discussion of the advantages and disadvantages of a crowd-sourced data collection tool, especially in comparison to lab-recorded corpora.

Place, publisher, year, edition, pages
Paris, 2018
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-230236 (URN), 979-10-95546-00-9 (ISBN)
Conference
The Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180618

Available from: 2018-06-13. Created: 2018-06-13. Last updated: 2018-11-13. Bibliographically approved.
Kragic, D., Gustafson, J., Karaoǧuz, H., Jensfelt, P. & Krug, R. (2018). Interactive, collaborative robots: Challenges and opportunities. In: IJCAI International Joint Conference on Artificial Intelligence. Paper presented at the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13-19 July 2018 (pp. 18-25). International Joint Conferences on Artificial Intelligence
2018 (English). In: IJCAI International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, 2018, pp. 18-25. Conference paper, Published paper (Refereed)
Abstract [en]

Robotic technology has transformed the manufacturing industry ever since the first industrial robot was put into use in the early 1960s. The challenge of developing flexible solutions, where production lines can be quickly re-planned, adapted and structured for new or slightly changed products, is still an important open problem. Industrial robots today are still largely preprogrammed for their tasks and are not able to detect errors in their own performance or to robustly interact with a complex environment and a human worker. The challenges are even more serious when it comes to various types of service robots. Full robot autonomy, including natural interaction, learning from and with humans, and safe and flexible performance of challenging tasks in unstructured environments, will remain out of reach for the foreseeable future. In the envisioned future factory setups and home and office environments, humans and robots will share the same workspace and perform different object manipulation tasks in a collaborative manner. We discuss some of the major challenges of developing such systems and provide examples of the current state of the art.

Place, publisher, year, edition, pages
International Joint Conferences on Artificial Intelligence, 2018
Keywords
Artificial intelligence, Industrial robots, Collaborative robots, Complex environments, Manufacturing industries, Natural interactions, Object manipulation, Office environments, Robotic technologies, Unstructured environments, Human robot interaction
HSV category
Identifiers
urn:nbn:se:kth:diva-247239 (URN), 2-s2.0-85055718956 (Scopus ID), 9780999241127 (ISBN)
Conference
27th International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13-19 July 2018
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation
Note

QC 20190402

Available from: 2019-04-02. Created: 2019-04-02. Last updated: 2019-05-22. Bibliographically approved.
Kontogiorgos, D., Sibirtseva, E., Pereira, A., Skantze, G. & Gustafson, J. (2018). Multimodal reference resolution in collaborative assembly tasks. In: Multimodal reference resolution in collaborative assembly tasks. Paper presented at the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. ACM Digital Library
2018 (English). In: Multimodal reference resolution in collaborative assembly tasks, ACM Digital Library, 2018. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ACM Digital Library, 2018
HSV category
Identifiers
urn:nbn:se:kth:diva-235547 (URN), 10.1145/3279972.3279976 (DOI), 000458138700007 (ISI), 2-s2.0-85058174973 (Scopus ID)
Conference
Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction
Note

QC 20181009

Available from: 2018-09-29. Created: 2018-09-29. Last updated: 2019-03-20. Bibliographically approved.
Malisz, Z., Berthelsen, H., Beskow, J. & Gustafson, J. (2017). Controlling prominence realisation in parametric DNN-based speech synthesis. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. Paper presented at the 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017 (pp. 1079-1083). International Speech Communication Association, 2017
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association, 2017, Vol. 2017, pp. 1079-1083. Conference paper, Published paper (Refereed)
Abstract [en]

This work aims to improve text-to-speech synthesis for Wikipedia by advancing and implementing models of prosodic prominence. We propose a new system architecture with explicit prominence modeling and test the first component of the architecture. We automatically extract a phonetic feature related to prominence from the speech signal in the ARCTIC corpus. We then modify the label files and train an experimental TTS system based on the feature using Merlin, a statistical parametric DNN-based engine. Test sentences with contrastive prominence on the word level are synthesised, and separate listening tests evaluating a) the level of prominence control in the generated speech and b) naturalness are conducted. Our results show that the prominence-feature-enhanced system successfully places prominence on the appropriate words and increases perceived naturalness relative to the baseline.
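The label-modification step lends itself to a short illustration. The sketch below is not the paper's code and does not use Merlin's actual API; the file layout, the per-word prominence values, and the "/PROM:" tag format are assumptions made for the example, which simply appends a prominence feature to word-level label lines before training.

```python
# Hypothetical sketch of the label-augmentation step: append a per-word
# prominence value (extracted earlier from the speech signal) to each
# line of a word-level label file. The three-column layout and the
# "/PROM:x" tag are assumptions for illustration, not Merlin's format.
from pathlib import Path
from typing import Dict

def augment_label_file(path: Path, prominence: Dict[str, float]) -> None:
    """Rewrite one label file, tagging each word with its prominence value."""
    out_lines = []
    for line in path.read_text().splitlines():
        start, end, word = line.split()        # assumed "start end word" layout
        value = prominence.get(word, 0.0)      # default: non-prominent
        out_lines.append(f"{start} {end} {word}/PROM:{value:.2f}")
    path.write_text("\n".join(out_lines) + "\n")

# Example: tag every label file in a directory with per-word values
# produced by some acoustic prominence extractor (not shown here).
if __name__ == "__main__":
    per_word_prominence = {"blue": 0.92, "ball": 0.15}  # made-up values
    for lab in sorted(Path("labels").glob("*.lab")):
        augment_label_file(lab, per_word_prominence)
```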

Place, publisher, year, edition, pages
International Speech Communication Association, 2017
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
Deep neural networks, Prosodic prominence, Speech synthesis
HSV category
Identifiers
urn:nbn:se:kth:diva-222092 (URN), 10.21437/Interspeech.2017-1355 (DOI), 2-s2.0-85039164235 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017
Note

QC 20180131

Available from: 2018-01-31. Created: 2018-01-31. Last updated: 2018-01-31. Bibliographically approved.
Szekely, E., Mendelson, J. & Gustafson, J. (2017). Synthesising uncertainty: The interplay of vocal effort and hesitation disfluencies. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Paper presented at the 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017 (pp. 804-808). International Speech Communication Association, 2017
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association, 2017, Vol. 2017, pp. 804-808. Conference paper (Refereed)
Abstract [en]

As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, the question of how different modifications to the synthetic speech interact with each other, and how their specific combinations influence perception, needs to be examined. This work investigates how the vocal effort of the synthetic speech, together with added disfluencies, affects listeners' perception of the degree of uncertainty in an utterance. We introduce a DNN voice built entirely from spontaneous conversational speech data and capable of producing a continuum of vocal efforts, prolongations and filled pauses with a corpus-based method. Results of a listener evaluation indicate that decreased vocal effort, filled pauses and prolongation of function words increase the degree of perceived uncertainty of conversational utterances expressing the speaker's beliefs. We demonstrate that the effects of these three cues are not merely additive, but that interaction effects, in particular between the two types of disfluencies and between vocal effort and prolongations, need to be considered when aiming to communicate a specific level of uncertainty. These findings are relevant for adaptive and incremental conversational systems using expressive speech synthesis and aspiring to communicate the attitude of uncertainty.
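One way to make "not merely additive" concrete (our notation, not the paper's model) is a regression of perceived uncertainty U on the three cues with pairwise interaction terms:

U = \beta_0 + \beta_E E + \beta_F F + \beta_P P + \beta_{EF} EF + \beta_{FP} FP + \beta_{EP} EP

Here E denotes reduced vocal effort, F filled pauses and P prolongations. The reported interactions correspond to non-zero \beta_{FP} (between the two disfluency types) and \beta_{EP} (between vocal effort and prolongations), so the uncertainty conveyed by a combination of cues cannot be predicted from the main effects alone.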

Place, publisher, year, edition, pages
International Speech Communication Association, 2017
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
Conversational Systems, Disfluencies, Speech Synthesis, Uncertainty, Vocal Effort
HSV category
Identifiers
urn:nbn:se:kth:diva-220749 (URN), 10.21437/Interspeech.2017-1507 (DOI), 2-s2.0-85039172286 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017
Note

QC 20180105

Available from: 2018-01-05. Created: 2018-01-05. Last updated: 2018-01-05. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-0397-6442