Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-3687-6189
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-9838-8848
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-8601-1370
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
2020 (English). In: IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Conducting user studies is a crucial component of many scientific fields. While some studies require participants to be physically present, others can be conducted both physically (e.g. in-lab) and online (e.g. via crowdsourcing). Inviting participants to the lab can be a time-consuming and logistically difficult endeavor, and sometimes research groups cannot run in-lab experiments at all, for example because of a pandemic. Crowdsourcing platforms such as Amazon Mechanical Turk (AMT) or Prolific can therefore be a suitable alternative for certain experiments, such as evaluating virtual agents. Although previous studies have investigated the use of crowdsourcing platforms for running experiments, there is still uncertainty as to whether their results are reliable for perceptual studies. Here we replicate a previous experiment in which participants evaluated a gesture-generation model for virtual agents. The experiment is conducted across three participant pools – in-lab, Prolific, and AMT – with similar demographics across the in-lab participants and the Prolific platform. Our results show no difference between the three participant pools with regard to their evaluations of the gesture-generation models or their reliability scores. This indicates that online platforms can successfully be used for perceptual evaluations of this kind.
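A comparison like the one the abstract describes can be sketched as a simple significance test on the pools' ratings. The following is a minimal illustration with fabricated 1–5 ratings (not the paper's data or its actual analysis), using a stdlib-only permutation test on the difference in mean ratings between an in-lab pool and an online pool:

```python
# Hedged sketch: permutation test comparing mean ratings from an
# in-lab pool vs. an online pool. All data below are fabricated
# for illustration; the paper's own statistics may differ.
import random

random.seed(42)
in_lab = [random.randint(3, 5) for _ in range(30)]  # hypothetical ratings
online = [random.randint(3, 5) for _ in range(30)]  # hypothetical ratings

def mean(xs):
    return sum(xs) / len(xs)

# Observed absolute difference in mean rating between the two pools.
observed = abs(mean(in_lab) - mean(online))

# Under the null hypothesis the pool labels are exchangeable, so we
# repeatedly shuffle the pooled ratings and re-split them.
combined = in_lab + online
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(combined)
    a, b = combined[:30], combined[30:]
    if abs(mean(a) - mean(b)) >= observed:
        count += 1

p_value = count / n_perm
print(f"observed diff = {observed:.3f}, p = {p_value:.3f}")
```

A large p-value here would mirror the paper's finding of no measurable difference between pools; a real analysis would also cover all three pools and the reliability scores.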

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020.
Keywords [en]
user studies, online participants, attentiveness
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-290562
DOI: 10.1145/3383652.3423860
ISI: 000728153600002
Scopus ID: 2-s2.0-85096979963
OAI: oai:DiVA.org:kth-290562
DiVA, id: diva2:1529574
Conference
IVA '20: ACM International Conference on Intelligent Virtual Agents, Virtual Event, Scotland, UK, October 20-22, 2020
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Wallenberg AI, Autonomous Systems and Software Program (WASP), CorSA
Note

QC 20211109

Part of Proceedings: ISBN 978-145037586-3

Taras Kucherenko and Patrik Jonell contributed equally to this research.

Available from: 2021-02-18 Created: 2021-02-18 Last updated: 2022-06-25 Bibliographically approved
In thesis
1. Developing and evaluating co-speech gesture-synthesis models for embodied conversational agents
2021 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large part of our communication is non-verbal: humans use non-verbal behaviors to express various aspects of their state or intent. Embodied artificial agents, such as virtual avatars or robots, should also use non-verbal behavior for efficient and pleasant interaction. A core part of non-verbal communication is gesticulation: gestures communicate a large share of non-verbal content. For example, around 90% of spoken utterances in descriptive discourse are accompanied by gestures. Since gestures are important, generating co-speech gestures has been an essential task in the Human-Agent Interaction (HAI) and Computer Graphics communities for several decades. Evaluating gesture-generation methods has been an equally important and equally challenging part of the field's development. Consequently, this thesis contributes to both the development and the evaluation of gesture-generation models.

This thesis proposes three deep-learning-based gesture-generation models. The first model is deterministic, uses only audio, and generates only beat gestures. The second model is deterministic and uses both audio and text, aiming to generate meaningful gestures. The third model also uses both audio and text, but is probabilistic, in order to capture the stochastic character of human gesticulation. The methods have applications to both virtual agents and social robots. Individual research efforts in the field of gesture generation are difficult to compare, as there are no established benchmarks. To address this situation, my colleagues and I launched the first-ever gesture-generation challenge, which we called the GENEA Challenge. We also investigated whether online participants are as attentive as offline participants, and found that the two groups are equally attentive, provided that they are well paid. Finally, we developed a system that integrates co-speech gesture-generation models into a real-time interactive embodied conversational agent. This system is intended to facilitate the evaluation of modern gesture-generation models in interaction.
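To make the audio-driven idea behind the first model concrete, here is a deliberately toy illustration (not the thesis's deep-learning model, and every function and threshold below is an assumption for the sketch): a deterministic mapping in which beat-gesture strokes track the speech energy envelope.

```python
# Toy sketch of a deterministic audio-to-beat-gesture mapping, in the
# spirit of an audio-only model: gesture strokes fire where the speech
# energy envelope is high. This is NOT the thesis's model; the frame
# length and threshold are illustrative assumptions.

def energy_envelope(samples, frame_len=160):
    """Mean absolute amplitude per frame (a crude energy proxy)."""
    return [
        sum(abs(s) for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples), frame_len)
    ]

def beat_gestures(envelope, threshold=0.5):
    """Emit a beat stroke (normalized amplitude) where energy exceeds the threshold."""
    peak = max(envelope) or 1.0  # avoid division by zero on silence
    return [e / peak if e / peak > threshold else 0.0 for e in envelope]

# Fabricated 'audio': a loud burst between two stretches of silence.
audio = [0.0] * 320 + [0.9, -0.8] * 160 + [0.0] * 320
strokes = beat_gestures(energy_envelope(audio))
print(sum(1 for s in strokes if s > 0), "beat frames")  # → 2 beat frames
```

A learned model replaces this hand-written mapping with a network trained on motion-capture data, but the input/output shape (audio features in, per-frame gesture parameters out) is the same.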

To further advance the development of capable gesture-generation methods, we need to advance their evaluation, and the research in this thesis supports the interpretation that evaluation is the main bottleneck limiting the field. The field currently lacks comprehensive co-speech gesture datasets, which would need to be large, high-quality, and diverse. In addition, no strong objective metrics are yet available. Creating speech-gesture datasets and developing objective metrics are therefore highlighted as essential next steps for the field's further development.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2021. p. 47
Series
TRITA-EECS-AVL ; 2021:75
Keywords
Human-agent interaction, gesture generation, social robotics, conversational agents, non-verbal behavior, deep learning, machine learning
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304618 (URN)
978-91-8040-058-9 (ISBN)
Public defence
2021-12-07, Sal Kollegiesalen, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
Swedish Foundation for Strategic Research , RIT15-0107
Note

QC 20211109

Available from: 2021-11-10 Created: 2021-11-08 Last updated: 2022-06-25 Bibliographically approved
2. Scalable Methods for Developing Interlocutor-aware Embodied Conversational Agents: Data Collection, Behavior Modeling, and Evaluation Methods
2022 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This work presents several methods, tools, and experiments that contribute to the development of interlocutor-aware Embodied Conversational Agents (ECAs). Interlocutor-aware ECAs take the interlocutor's behavior into consideration when generating their own non-verbal behaviors. This thesis targets the development of such adaptive ECAs by identifying and contributing to three important and related topics:

1) Data collection methods are presented, both for large-scale crowdsourced data collection and for in-lab data collection with a large number of sensors in a clinical setting. Experiments show that experts deemed dialog data collected using a crowdsourcing method better for dialog-generation purposes than dialog data from other commonly used sources. 2) Methods for behavior modeling are presented, in which machine learning models are used to generate facial gestures for ECAs, covering both single-speaker and interlocutor-aware generation. 3) Evaluation methods are explored, and both third-party evaluation of generated gestures and interaction experiments on interlocutor-aware gesture generation are discussed. For example, an experiment is carried out investigating the social influence of a mimicking social robot. Furthermore, a method for more efficient perceptual experiments is presented. This method is validated by replicating a previously conducted perceptual experiment on virtual agents, and the results show that the new method provides similar (in fact, more) insights into the data while requiring less of the evaluators' time. A second study compared subjective evaluations of generated gestures performed in the lab with evaluations performed via crowdsourcing, and found no difference between the two settings. A special focus in this thesis is on scalable methods, which make it possible to efficiently and rapidly collect interaction data from a broad range of people and to efficiently evaluate the results produced by the machine learning methods. This in turn allows fast iteration when developing interlocutor-aware ECA behaviors.

Abstract [sv, translated]

This work presents a number of methods, tools, and experiments that all contribute to the development of interlocutor-aware embodied conversational agents, i.e. agents that communicate through language, have a bodily representation (avatar or robot), and take the interlocutor's behaviors into account when generating their own non-verbal behaviors. This thesis aims to contribute to the development of such agents by identifying and contributing to three important areas:

1) Data collection methods, both for large-scale data collection with the help of so-called crowdworkers (a large number of people on the internet engaged to solve a problem) and in a laboratory environment with a large number of sensors. Experiments are presented showing, for example, that dialog data collected with the help of crowdworkers were judged by a group of experts to be better from a dialog-generation perspective than other commonly used dialog-generation datasets. 2) Methods for behavior modeling, where machine learning models are used to generate facial gestures. Methods for generating facial gestures both for a single agent and for interlocutor-aware agents are presented, together with experiments validating their functionality. Furthermore, an experiment is presented that investigates an agent's social influence on its interlocutor when it imitates the interlocutor's facial gestures during conversation. 3) Evaluation methods are explored, and a method for more efficient perceptual experiments is presented. The method is evaluated by recreating a previously conducted experiment with virtual agents, and shows that the results obtained with this new method provide similar insights (in fact, more insights), while being more efficient in terms of how much time the evaluators needed to spend. A second study examines the difference between performing subjective evaluations of generated gestures in a laboratory environment and using crowdworkers, and no difference could be measured. A special focus is on using scalable methods, since this enables efficient and rapid collection of multifaceted interaction data from many different people, as well as evaluation of the behaviors generated by the machine learning models, which in turn enables fast iteration in development.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2022. p. 77
Series
TRITA-EECS-AVL ; 2022:15
Keywords
non-verbal behavior generation, interlocutor-aware, data collection, behavior modeling, evaluation methods
National Category
Computer Systems
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-309467 (URN)
978-91-8040-151-7 (ISBN)
Public defence
2022-03-25, U1, https://kth-se.zoom.us/j/62813774919, Brinellvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20220307

Available from: 2022-03-07 Created: 2022-03-03 Last updated: 2022-06-25 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Jonell, Patrik; Kucherenko, Taras; Torre, Ilaria; Beskow, Jonas
