Publications (10 of 26)
Jonell, P. (2022). Scalable Methods for Developing Interlocutor-aware Embodied Conversational Agents: Data Collection, Behavior Modeling, and Evaluation Methods. (Doctoral dissertation). KTH Royal Institute of Technology
Scalable Methods for Developing Interlocutor-aware Embodied Conversational Agents: Data Collection, Behavior Modeling, and Evaluation Methods
2022 (English) Doctoral thesis, comprising papers (Other academic)
Abstract [en]

This work presents several methods, tools, and experiments that contribute to the development of interlocutor-aware Embodied Conversational Agents (ECAs). Interlocutor-aware ECAs take the interlocutor's behavior into consideration when generating their own non-verbal behaviors. This thesis targets the development of such adaptive ECAs by identifying and contributing to three important and related topics:

1) Data collection methods are presented, both for large-scale crowdsourced data collection and for in-lab data collection with a large number of sensors in a clinical setting. Experiments show that experts judged dialog data collected with a crowdsourcing method to be better suited for dialog generation than dialog data from other commonly used sources. 2) Methods for behavior modeling are presented, in which machine learning models generate facial gestures for ECAs. Methods for both single-speaker and interlocutor-aware generation are included. 3) Evaluation methods are explored, covering both third-party evaluation of generated gestures and interaction experiments with interlocutor-aware gesture generation. For example, an experiment investigates the social influence of a mimicking social robot. Furthermore, a method for more efficient perceptual experiments is presented. This method is validated by replicating a previously conducted perceptual experiment on virtual agents, and the results show that it provides similar (in fact, more) insights into the data while requiring less time from the evaluators participating in the experiment. A second study compared subjective evaluations of generated gestures performed in the lab with evaluations performed through crowdsourcing, and found no difference between the two settings. A special focus in this thesis is on scalable methods, which make it possible to efficiently and rapidly collect interaction data from a broad range of people and to efficiently evaluate the results produced by the machine learning methods. This in turn allows for fast iteration when developing interlocutor-aware ECA behaviors.

Abstract [sv]

This work presents a number of methods, tools, and experiments that all contribute to the development of interlocutor-aware embodied conversational agents, i.e. agents that communicate through language, have a bodily representation (an avatar or a robot), and take the interlocutor's behavior into account when generating their own non-verbal behaviors. This thesis aims to contribute to the development of such agents by identifying and contributing to three important areas:

1) Data collection methods, both for large-scale data collection using so-called "crowdworkers" (a large number of people on the internet engaged to solve a problem) and in a laboratory environment with a large number of sensors. Experiments are presented showing that, for example, dialog data collected with the help of crowdworkers was judged by a group of experts to be better from a dialog-generation perspective than other commonly used datasets for dialog generation. 2) Methods for behavior modeling, where machine learning models are used to generate facial gestures. Methods for generating facial gestures both for a single agent and for interlocutor-aware agents are presented, together with experiments that validate their functionality. Furthermore, an experiment is presented that investigates an agent's social influence on its interlocutor when it imitates the interlocutor's facial gestures during conversation. 3) Evaluation methods are explored, and a method for more efficient perceptual experiments is presented. The method is evaluated by recreating a previously conducted experiment with virtual agents, and the results obtained with this new method provide similar insights (in fact, more insights) while being more efficient in terms of how much time the evaluators needed to spend. A second study examines the difference between performing subjective evaluations of generated gestures in a laboratory environment and using crowdworkers, and found that no difference could be measured. A special focus is on using scalable methods, since this enables efficient and rapid collection of multifaceted interaction data from many different people as well as evaluation of the behaviors generated by the machine learning models, which in turn enables fast iteration during development.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2022. p. 77
Series
TRITA-EECS-AVL ; 2022:15
Keywords
non-verbal behavior generation, interlocutor-aware, data collection, behavior modeling, evaluation methods
HSV category
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-309467 (URN) 978-91-8040-151-7 (ISBN)
Public defence
2022-03-25, U1, https://kth-se.zoom.us/j/62813774919, Brinellvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20220307

Available from: 2022-03-07 Created: 2022-03-03 Last updated: 2022-06-25 Bibliographically approved
Kucherenko, T., Jonell, P., Yoon, Y., Wolfert, P. & Henter, G. E. (2021). A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA Challenge 2020. In: Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces: . Paper presented at IUI '21: 26th International Conference on Intelligent User Interfaces, College Station, TX, USA, April 13-17, 2021 (pp. 11-21). Association for Computing Machinery (ACM)
A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA Challenge 2020
2021 (English) In: Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces, Association for Computing Machinery (ACM), 2021, pp. 11-21. Conference paper, Published paper (Refereed)
Abstract [en]

Co-speech gestures, gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual research efforts in the field are difficult to compare: there are no established benchmarks, and each study tends to use its own dataset, motion visualisation, and evaluation methodology. To address this situation, we launched the GENEA Challenge, a gesture-generation challenge wherein participating teams built automatic gesture-generation systems on a common dataset, and the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline. Since differences in evaluation outcomes between systems are now solely attributable to differences between the motion-generation methods, this enables benchmarking recent approaches against one another to get a better impression of the state of the art in the field. This paper reports on the purpose, design, results, and implications of our challenge.
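
Since all systems are rated by the same crowdsourced pool on the same stimuli, each system ends up with a distribution of subjective scores that can be summarized and compared directly. Purely as an illustration of that kind of analysis (not the challenge's actual evaluation code; the ratings and system names below are made up), one can aggregate per-system scores and attach bootstrap confidence intervals:

```python
import numpy as np

def bootstrap_ci(ratings, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean rating."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    means = rng.choice(ratings, size=(n_boot, len(ratings)), replace=True).mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return ratings.mean(), lo, hi

# Hypothetical ratings (1-100) collected for three gesture-generation conditions.
ratings_per_system = {
    "natural_motion": [72, 80, 65, 90, 77, 85, 69],
    "system_A":       [55, 60, 48, 70, 62, 58, 51],
    "system_B":       [40, 52, 47, 39, 58, 44, 50],
}

for name, ratings in ratings_per_system.items():
    mean, lo, hi = bootstrap_ci(ratings)
    print(f"{name}: mean={mean:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```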

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
gesture generation, conversational agents, evaluation paradigms
HSV category
Research subject
Human-Computer Interaction
Identifiers
urn:nbn:se:kth:diva-296490 (URN) 10.1145/3397481.3450692 (DOI) 000747690200006 () 2-s2.0-85102546745 (Scopus ID)
Conference
IUI '21: 26th International Conference on Intelligent User Interfaces, College Station, TX, USA, April 13-17, 2021
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Part of Proceedings: ISBN 978-145038017-1

QC 20220303

Available from: 2021-06-05 Created: 2021-06-05 Last updated: 2022-06-25 Bibliographically approved
Kucherenko, T., Jonell, P., Yoon, Y., Wolfert, P., Yumak, Z. & Henter, G. E. (2021). GENEA Workshop 2021: The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. In: Proceedings of ICMI '21: International Conference on Multimodal Interaction: . Paper presented at ICMI '21: International Conference on Multimodal Interaction, Montréal, QC, Canada, October 18-22, 2021 (pp. 872-873). Association for Computing Machinery (ACM)
GENEA Workshop 2021: The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents
2021 (English) In: Proceedings of ICMI '21: International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2021, pp. 872-873. Conference paper, Published paper (Refereed)
Abstract [en]

Embodied agents benefit from using non-verbal behavior when communicating with humans. Despite several decades of non-verbal behavior-generation research, there is currently no well-developed benchmarking culture in the field. For example, most researchers do not compare their outcomes with previous work, and if they do, they often do so in their own way, which is frequently incompatible with other work. With the GENEA Workshop 2021, we aim to bring the community together to discuss key challenges and solutions, and to find the most appropriate ways to move the field forward.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
behavior synthesis, datasets, evaluation, gesture generation, Behavior generation, Dataset, Embodied agent, Non-verbal behaviours, Behavioral research
HSV category
Identifiers
urn:nbn:se:kth:diva-313185 (URN) 10.1145/3462244.3480983 (DOI) 2-s2.0-85118969127 (Scopus ID)
Conference
ICMI '21: International Conference on Multimodal Interaction, Montréal, QC, Canada, October 18-22, 2021
Note

Part of proceedings ISBN 9781450384810

QC 20220602

Available from: 2022-06-02 Created: 2022-06-02 Last updated: 2022-06-25 Bibliographically approved
Jonell, P., Yoon, Y., Wolfert, P., Kucherenko, T. & Henter, G. E. (2021). HEMVIP: Human Evaluation of Multiple Videos in Parallel. In: ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction: . Paper presented at International Conference on Multimodal Interaction Montreal, Canada. October 18-22nd, 2021 (pp. 707-711). New York, NY, United States: Association for Computing Machinery (ACM)
HEMVIP: Human Evaluation of Multiple Videos in Parallel
2021 (English) In: ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, New York, NY, United States: Association for Computing Machinery (ACM), 2021, pp. 707-711. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In many research areas, for example motion and gesture generation, objective measures alone do not provide an accurate impression of key stimulus traits such as perceived quality or appropriateness. The gold standard is instead to evaluate these aspects through user studies, especially subjective evaluations of video stimuli. Common evaluation paradigms either present individual stimuli to be scored on Likert-type scales, or ask users to compare and rate videos in a pairwise fashion. However, the time and resources required for such evaluations scale poorly as the number of conditions to be compared increases. Building on standards used for evaluating the quality of multimedia codecs, this paper instead introduces a framework for granular rating of multiple comparable videos in parallel. This methodology essentially analyses all condition pairs at once. Our contributions are 1) a proposed framework, called HEMVIP, for parallel and granular evaluation of multiple video stimuli and 2) a validation study confirming that results obtained using the tool are in close agreement with results of prior studies using conventional multiple pairwise comparisons.
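
Because every condition on a screen receives a slider rating at the same time, a single screen implicitly yields all pairwise comparisons between the displayed conditions. The sketch below shows that reduction on invented data; the condition names, slider values, and data layout are assumptions for illustration, not the tool's actual data format:

```python
from itertools import combinations
from collections import Counter

# Hypothetical parallel ratings: each "screen" shows several conditions of the
# same utterance side by side, each rated on a 0-100 slider (names are made up).
screens = [
    {"cond_A": 74, "cond_B": 52, "cond_C": 61},
    {"cond_A": 68, "cond_B": 70, "cond_C": 55},
    {"cond_A": 80, "cond_B": 49, "cond_C": 66},
]

# Because all conditions are rated on the same screen, every pair of conditions
# is implicitly compared at once; tally which condition "wins" each pairing.
wins = Counter()
for ratings in screens:
    for a, b in combinations(sorted(ratings), 2):
        if ratings[a] != ratings[b]:
            wins[(a, b) if ratings[a] > ratings[b] else (b, a)] += 1

for (winner, loser), n in sorted(wins.items()):
    print(f"{winner} preferred over {loser} in {n} screen(s)")
```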

Place, publisher, year, edition, pages
New York, NY, United States: Association for Computing Machinery (ACM), 2021
Keywords
evaluation paradigms, video evaluation, conversational agents, gesture generation
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309462 (URN) 10.1145/3462244.3479957 (DOI) 2-s2.0-85113672097 (Scopus ID)
Conference
International Conference on Multimodal Interaction, Montreal, Canada, October 18-22, 2021
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Part of proceedings: ISBN 978-1-4503-8481-0

QC 20220309

Available from: 2022-03-03 Created: 2022-03-03 Last updated: 2023-01-18 Bibliographically approved
Jonell, P., Deichler, A., Torre, I., Leite, I. & Beskow, J. (2021). Mechanical Chameleons: Evaluating the effects of a social robot's non-verbal behavior on social influence. In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021: . Paper presented at Trust, Acceptance and Social Cues in Human-Robot Interaction - SCRITA, 12 August, 2021.
Mechanical Chameleons: Evaluating the effects of a social robot's non-verbal behavior on social influence
2021 (English) In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a pilot study investigating how non-verbal behavior affects social influence in social robots. We also present a modular system capable of controlling the robot's non-verbal behavior based on the interlocutor's facial gestures (head movements and facial expressions) in real time, and a study investigating whether three different facial-gesture strategies ("still"; "natural movement", i.e. movements recorded from another conversation; and "copy", i.e. mimicking the user with a four-second delay) have any effect on social influence and decision making in a "survival task". Our preliminary results show no significant difference between the three conditions, but this might be due to, among other things, the low number of study participants (12).
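
The "copy" condition described above amounts to replaying the interlocutor's facial gestures back to them after a fixed four-second delay. The following sketch illustrates one simple way such a delay could be implemented with a time-stamped buffer; it is not the paper's actual control system, and the parameter names are invented:

```python
import time
from collections import deque

class DelayedMimicry:
    """Replay the interlocutor's facial-gesture parameters after a fixed delay.

    Illustrative sketch of the "copy" strategy described in the abstract:
    incoming gesture frames are buffered and emitted ~4 seconds later.
    """

    def __init__(self, delay_s=4.0):
        self.delay_s = delay_s
        self.buffer = deque()  # (timestamp, gesture_params) pairs

    def push(self, gesture_params, now=None):
        self.buffer.append((now if now is not None else time.monotonic(), gesture_params))

    def pop_due(self, now=None):
        """Return all gesture frames whose delay has elapsed, oldest first."""
        now = now if now is not None else time.monotonic()
        due = []
        while self.buffer and now - self.buffer[0][0] >= self.delay_s:
            due.append(self.buffer.popleft()[1])
        return due

# Usage: feed tracked frames in, send released frames to the robot's face.
mimic = DelayedMimicry(delay_s=4.0)
mimic.push({"head_pitch": 0.1, "smile": 0.7}, now=0.0)
print(mimic.pop_due(now=3.9))  # [] - not yet due
print(mimic.pop_due(now=4.1))  # [{'head_pitch': 0.1, 'smile': 0.7}]
```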

HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309464 (URN)
Conference
Trust, Acceptance and Social Cues in Human-Robot Interaction - SCRITA, 12 August, 2021
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Swedish Research Council, 2018-05409
Note

QC 20220308

Available from: 2022-03-03 Created: 2022-03-03 Last updated: 2022-06-25 Bibliographically approved
Jonell, P., Moell, B., Håkansson, K., Henter, G. E., Kucherenko, T., Mikheeva, O., . . . Beskow, J. (2021). Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results. Frontiers in Computer Science, 3, Article ID 642633.
Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results
2021 (English) In: Frontiers in Computer Science, E-ISSN 2624-9898, Vol. 3, article id 642633. Article in journal (Refereed) Published
Abstract [en]

Non-invasive automatic screening for Alzheimer's disease has the potential to improve diagnostic accuracy while lowering healthcare costs. Previous research has shown that patterns in speech, language, gaze, and drawing can help detect early signs of cognitive decline. In this paper, we describe a highly multimodal system for unobtrusively capturing data during real clinical interviews conducted as part of cognitive assessments for Alzheimer's disease. The system uses nine different sensor devices (smartphones, a tablet, an eye tracker, a microphone array, and a wristband) to record interaction data during a specialist's first clinical interview with a patient, and is currently in use at Karolinska University Hospital in Stockholm, Sweden. Furthermore, complementary information in the form of brain imaging, psychological tests, speech therapist assessment, and clinical meta-data is also available for each patient. We detail our data-collection and analysis procedure and present preliminary findings that relate measures extracted from the multimodal recordings to clinical assessments and established biomarkers, based on data from 25 patients gathered thus far. Our findings demonstrate feasibility for our proposed methodology and indicate that the collected data can be used to improve clinical assessments of early dementia.
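
With nine recording devices running at different sampling rates, a basic step before joint analysis is resampling the streams onto a common time base. The snippet below is only a minimal illustration of that idea using invented gaze and speech features; it does not reproduce the study's actual synchronization pipeline:

```python
import numpy as np

def resample_to_common_clock(t_src, x_src, t_common):
    """Linearly interpolate one sensor stream onto a shared time base (seconds)."""
    return np.interp(t_common, t_src, x_src)

# Hypothetical streams with different sampling rates, all on the same clock.
t_gaze = np.arange(0, 10, 1 / 60)              # eye tracker at 60 Hz
pupil = 3.0 + 0.1 * np.sin(t_gaze)             # pupil diameter (mm)
t_audio_feat = np.arange(0, 10, 0.01)          # speech features at 100 Hz
f0 = 120 + 10 * np.sin(0.5 * t_audio_feat)     # fundamental frequency (Hz)

# Common 25 Hz analysis clock for joint modelling.
t_common = np.arange(0, 10, 0.04)
aligned = np.column_stack([
    resample_to_common_clock(t_gaze, pupil, t_common),
    resample_to_common_clock(t_audio_feat, f0, t_common),
])
print(aligned.shape)  # (250, 2)
```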

Place, publisher, year, edition, pages
Frontiers Media SA, 2021
Keywords
Alzheimer, mild cognitive impairment, multimodal prediction, speech, gaze, pupil dilation, thermal camera, pen motion
HSV category
Identifiers
urn:nbn:se:kth:diva-303883 (URN) 10.3389/fcomp.2021.642633 (DOI) 000705498300001 () 2-s2.0-85115692731 (Scopus ID)
Note

QC 20211022

Available from: 2021-10-22 Created: 2021-10-22 Last updated: 2025-02-07 Bibliographically approved
Kucherenko, T., Nagy, R., Jonell, P., Neff, M., Kjellström, H. & Henter, G. E. (2021). Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech. In: IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents. Paper presented at 21st ACM International Conference on Intelligent Virtual Agents, IVA 2021, Virtual, Online, 14 September 2021 through 17 September 2021, University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan (pp. 145-147). Association for Computing Machinery (ACM)
Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech
2021 (English) In: IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2021, pp. 145-147. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, followed by a prediction of the gesture properties. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output. This empowers the approach to generate gestures that are both diverse and representational. Follow-ups and more information can be found on the project page: https://svito-zar.github.io/speech2properties2gestures
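
The framework is a cascade: first decide whether the current speech should be accompanied by a gesture, then predict properties of that gesture, and finally condition a probabilistic motion generator on those properties. The sketch below only mirrors that control flow with stand-in functions; the feature names, property set, and models are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_gesture_presence(speech_features):
    """Stage 1 (stand-in): binary decision - should this frame carry a gesture?"""
    # Placeholder heuristic: gesture when speech energy is high.
    return speech_features["energy"] > 0.5

def predict_gesture_properties(speech_features):
    """Stage 2 (stand-in): predict gesture properties, e.g. category probabilities."""
    return {"deictic": 0.2, "iconic": 0.6, "beat": 0.2}

def generate_motion(speech_features, properties):
    """Stage 3 (stand-in): probabilistic generator conditioned on the properties."""
    scale = properties["iconic"]  # e.g. more expressive motion for iconic gestures
    return rng.normal(0.0, scale, size=(1, 45))  # one frame of 45 joint parameters

frame = {"energy": 0.8, "pitch": 180.0}
if predict_gesture_presence(frame):
    props = predict_gesture_properties(frame)
    motion = generate_motion(frame, props)
    print(motion.shape)
```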

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
gesture generation, virtual agents, representational gestures
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-302667 (URN) 10.1145/3472306.3478333 (DOI) 000728149900023 () 2-s2.0-85113524837 (Scopus ID)
Conference
21st ACM International Conference on Intelligent Virtual Agents, IVA 2021, Virtual, Online, 14 September 2021 through 17 September 2021, University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20211102

Part of Proceedings: ISBN 9781450386197

Available from: 2021-09-28 Created: 2021-09-28 Last updated: 2022-06-25 Bibliographically approved
Oertel, C., Jonell, P., Kontogiorgos, D., Mora, K. F., Odobez, J.-M. & Gustafsson, J. (2021). Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions. Frontiers in Robotics and AI, 8, Article ID 555913.
Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions
2021 (English) In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 8, article id 555913. Article in journal (Refereed) Published
Abstract [en]

Listening to one another is essential to human-human interaction. In fact, we humans spend a substantial part of our day listening to other people, in private as well as in work settings. Attentive listening serves the function of gathering information for oneself, but at the same time it also signals to the speaker that he/she is being heard. To deduce whether our interlocutor is listening to us, we rely on reading his/her non-verbal cues, very much like how we also use non-verbal cues to signal our attention. Such signaling becomes more complex when we move from dyadic to multi-party interactions. Understanding how humans use non-verbal cues in a multi-party listening context not only increases our understanding of human-human communication but also aids the development of successful human-robot interactions. This paper brings together previous analyses of listener behavior in human-human multi-party interaction and provides novel insights into gaze patterns between the listeners in particular. We investigate whether the gaze patterns and feedback behavior observed in human-human dialogue are also beneficial for the perception of a robot in multi-party human-robot interaction. To answer this question, we implement an attentive listening system that generates multi-modal listening behavior based on our human-human analysis. We compare our system to a baseline system that does not differentiate between different listener types in its behavior generation, and evaluate it in terms of the participants' perception of the robot, their behavior, and the perception of third-party observers.
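
The system under evaluation differs from the baseline in that it selects listening behavior depending on the listener's role in the multi-party interaction, whereas the baseline does not. As a purely hypothetical illustration of that distinction (the listener types, gaze targets, and probabilities below are invented and not taken from the paper), a type-dependent behavior selector might look like this:

```python
import random

# Invented gaze-target distributions per listener type; the baseline ignores type.
GAZE_POLICY = {
    "addressed_listener": {"speaker": 0.7, "other_listener": 0.2, "away": 0.1},
    "side_participant":   {"speaker": 0.4, "other_listener": 0.4, "away": 0.2},
}
BASELINE_POLICY = {"speaker": 0.5, "other_listener": 0.3, "away": 0.2}

def choose_gaze_target(listener_type=None, use_listener_types=True):
    """Pick where the artificial listener looks next, with or without type awareness."""
    policy = GAZE_POLICY[listener_type] if use_listener_types else BASELINE_POLICY
    targets, weights = zip(*policy.items())
    return random.choices(targets, weights=weights, k=1)[0]

print(choose_gaze_target("side_participant"))
print(choose_gaze_target(use_listener_types=False))  # baseline behavior
```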

Place, publisher, year, edition, pages
Frontiers Media SA, 2021
Keywords
multi-party interactions, non-verbal behaviors, eye-gaze patterns, head gestures, human-robot interaction, artificial listener, social signal processing
HSV category
Identifiers
urn:nbn:se:kth:diva-299298 (URN) 10.3389/frobt.2021.555913 (DOI) 000673604300001 () 34277714 (PubMedID) 2-s2.0-85110106028 (Scopus ID)
Note

QC 20220301

Available from: 2021-08-18 Created: 2021-08-18 Last updated: 2022-06-25 Bibliographically approved
Jonell, P., Kucherenko, T., Torre, I. & Beskow, J. (2020). Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents. In: IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents: . Paper presented at IVA '20: ACM International Conference on Intelligent Virtual Agents, Virtual Event, Scotland, UK, October 20-22, 2020. Association for Computing Machinery (ACM)
Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents
2020 (English) In: IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Conducting user studies is a crucial component in many scientific fields. While some studies require participants to be physically present, other studies can be conducted both physically (e.g. in-lab) and online (e.g. via crowdsourcing). Inviting participants to the lab can be a time-consuming and logistically difficult endeavor, not to mention that sometimes research groups might not be able to run in-lab experiments because of, for example, a pandemic. Crowdsourcing platforms such as Amazon Mechanical Turk (AMT) or Prolific can therefore be a suitable alternative for running certain experiments, such as evaluating virtual agents. Although previous studies have investigated the use of crowdsourcing platforms for running experiments, there is still uncertainty as to whether the results are reliable for perceptual studies. Here we replicate a previous experiment in which participants evaluated a gesture-generation model for virtual agents. The experiment is conducted across three participant pools (in-lab, Prolific, and AMT), with similar demographics across the in-lab participants and the Prolific platform. Our results show no difference between the three participant pools with regard to their evaluations of the gesture-generation models and their reliability scores. The results indicate that online platforms can successfully be used for perceptual evaluations of this kind.
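
The comparison across the three participant pools boils down to testing whether their rating distributions differ. One standard choice for comparing several independent groups of ordinal ratings is the Kruskal-Wallis H-test, shown here with made-up scores purely as an illustration; this is not necessarily the statistical analysis used in the paper:

```python
from scipy import stats

# Made-up preference scores from three participant pools (not the study's data).
in_lab   = [62, 70, 55, 68, 73, 60, 66]
prolific = [64, 69, 58, 71, 67, 61, 65]
amt      = [59, 72, 57, 66, 70, 63, 68]

# Kruskal-Wallis H-test: do the three pools' rating distributions differ?
h_stat, p_value = stats.kruskal(in_lab, prolific, amt)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
# A large p-value would be consistent with "no difference between pools".
```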

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
user studies, online participants, attentiveness
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-290562 (URN) 10.1145/3383652.3423860 (DOI) 000728153600002 () 2-s2.0-85096979963 (Scopus ID)
Conference
IVA '20: ACM International Conference on Intelligent Virtual Agents, Virtual Event, Scotland, UK, October 20-22, 2020
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Wallenberg AI, Autonomous Systems and Software Program (WASP), CorSA
Note

QC 20211109

Part of Proceedings: ISBN 978-145037586-3

Taras Kucherenko and Patrik Jonell contributed equally to this research.

Available from: 2021-02-18 Created: 2021-02-18 Last updated: 2022-06-25 Bibliographically approved
Cohn, M., Jonell, P., Kim, T., Beskow, J. & Zellou, G. (2020). Embodiment and gender interact in alignment to TTS voices. In: Proceedings for the 42nd Annual Meeting of the Cognitive Science Society: Developing a Mind: Learning in Humans, Animals, and Machines, CogSci 2020. Paper presented at 42nd Annual Meeting of the Cognitive Science Society: Developing a Mind: Learning in Humans, Animals, and Machines, CogSci 2020, 29 July 2020 through 1 August 2020 (pp. 220-226). The Cognitive Science Society
Embodiment and gender interact in alignment to TTS voices
2020 (English) In: Proceedings for the 42nd Annual Meeting of the Cognitive Science Society: Developing a Mind: Learning in Humans, Animals, and Machines, CogSci 2020, The Cognitive Science Society, 2020, pp. 220-226. Conference paper, Published paper (Refereed)
Abstract [en]

The current study tests subjects' vocal alignment toward female and male text-to-speech (TTS) voices presented via three systems: Amazon Echo, Nao, and Furhat. These systems vary in their physical form, ranging from a cylindrical speaker (Echo), to a small robot (Nao), to a human-like robot bust (Furhat). We test whether this cline of personification (cylinder < mini robot < human-like robot bust) predicts patterns of gender-mediated vocal alignment. In addition to comparing multiple systems, this study addresses a confound in many prior vocal alignment studies by using identical voices across the systems. Results show evidence for a cline of personification toward female TTS voices by female shadowers (Echo < Nao < Furhat) and a more categorical effect of device personification for male TTS voices by male shadowers (Echo < Nao, Furhat). These findings are discussed in terms of their implications for models of device-human interaction and theories of computer personification. 

Place, publisher, year, edition, pages
The Cognitive Science Society, 2020
Keywords
embodiment, gender, human-device interaction, text-to-speech, vocal alignment, Human computer interaction, Human robot interaction, 'current, Human like robots, Minirobots, Robot Nao, Small robots, Text to speech, Alignment
HSV category
Identifiers
urn:nbn:se:kth:diva-323807 (URN) 2-s2.0-85129583974 (Scopus ID)
Conference
42nd Annual Meeting of the Cognitive Science Society: Developing a Mind: Learning in Humans, Animals, and Machines, CogSci 2020, 29 July 2020 through 1 August 2020
Note

QC 20230213

Available from: 2023-02-13 Created: 2023-02-13 Last updated: 2025-02-07 Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0003-3687-6189