Scalable Methods for Developing Interlocutor-aware Embodied Conversational Agents: Data Collection, Behavior Modeling, and Evaluation Methods
Jonell, Patrik. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing (TMH). ORCID iD: 0000-0003-3687-6189
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This work presents several methods, tools, and experiments that contribute to the development of interlocutor-aware Embodied Conversational Agents (ECAs). Interlocutor-aware ECAs take the interlocutor's behavior into consideration when generating their own non-verbal behaviors. This thesis targets the development of such adaptive ECAs by identifying and contributing to three important and related topics:

1) Data collection methods are presented, both for large-scale crowdsourced data collection and for in-lab data collection with a large number of sensors in a clinical setting. Experiments show that experts deemed dialog data collected using a crowdsourcing method to be better suited for dialog generation than dialog data from other commonly used sources. 2) Methods for behavior modeling are presented, in which machine learning models are used to generate facial gestures for ECAs. Methods for both single-speaker and interlocutor-aware generation are presented. 3) Evaluation methods are explored, covering both third-party evaluation of generated gestures and interaction experiments with interlocutor-aware gesture generation. For example, an experiment is carried out investigating the social influence of a mimicking social robot. Furthermore, a method for more efficient perceptual experiments is presented. This method is validated by replicating a previously conducted perceptual experiment on virtual agents, and the results show that the new method provides similar (in fact, more) insights into the data while requiring less of the evaluators' time. A second study compared subjective evaluations of generated gestures performed in the lab with evaluations performed via crowdsourcing, and found no difference between the two settings. A special focus in this thesis is on scalable methods, which make it possible to efficiently and rapidly collect interaction data from a broad range of people and to efficiently evaluate the results produced by the machine learning methods. This in turn allows for fast iteration when developing the behaviors of interlocutor-aware ECAs.

Abstract [sv]

This work presents a number of methods, tools, and experiments that all contribute to the development of interlocutor-aware embodied conversational agents, i.e., agents that communicate using language, have a bodily representation (an avatar or a robot), and take the interlocutor's behavior into account when generating their own non-verbal behaviors. This thesis aims to contribute to the development of such agents by identifying and contributing to three important areas:

1) Data collection methods, both for large-scale data collection with the help of so-called crowdworkers (a large number of people on the internet engaged to solve a task) and in a laboratory environment with a large number of sensors. Experiments are presented which show, for example, that dialog data collected with the help of crowdworkers was judged by a group of experts to be better for dialog generation purposes than other commonly used dialog-generation datasets. 2) Methods for behavior modeling, where machine learning models are used to generate facial gestures. Methods for generating facial gestures both for a single agent and for interlocutor-aware agents are presented, together with experiments validating their functionality. Furthermore, an experiment is presented that investigates an agent's social influence on its interlocutor when it imitates the interlocutor's facial gestures during conversation. 3) Evaluation methods are explored, and a method for more efficient perceptual experiments is presented. The method is validated by recreating a previously conducted experiment with virtual agents, and shows that the results obtained with this new method provide similar insights (in fact, more insights), while being more efficient in terms of the time evaluators needed to spend. A second study examines the difference between performing subjective evaluations of generated gestures in a laboratory environment and using crowdworkers, and showed that no difference could be measured. A special focus is on using scalable methods, as this enables efficient and rapid collection of multifaceted interaction data from many different people, as well as evaluation of the behaviors generated by the machine learning models, which in turn enables fast iteration during development.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2022. 77 pp.
Series
TRITA-EECS-AVL ; 2022:15
Keywords [en]
non-verbal behavior generation, interlocutor-aware, data collection, behavior modeling, evaluation methods
National subject category
Computer Systems
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-309467
ISBN: 978-91-8040-151-7 (print)
OAI: oai:DiVA.org:kth-309467
DiVA, id: diva2:1642118
Public defence
2022-03-25, U1, https://kth-se.zoom.us/j/62813774919, Brinellvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisor
Note

QC 20220307

Available from: 2022-03-07 Created: 2022-03-03 Last updated: 2022-06-25 Bibliographically approved
List of papers
1. Crowdsourced Multimodal Corpora Collection Tool
2018 (English) In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, pp. 728-734. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, more and more multimodal corpora have been created. To our knowledge there is no publicly available tool which allows for acquiring controlled multimodal data of people in a rapid and scalable fashion. We therefore propose (1) a novel tool which enables researchers to rapidly gather large amounts of multimodal data spanning a wide demographic range, and (2) an example of how we used this tool for the collection of our "Attentive listener" multimodal corpus. The code is released under an Apache License 2.0 and available as an open-source repository, which can be found at https://github.com/kth-social-robotics/multimodal-crowdsourcing-tool. This tool will allow researchers to set up their own multimodal data collection system quickly and create their own multimodal corpora. Finally, this paper provides a discussion of the advantages and disadvantages of a crowd-sourced data collection tool, especially in comparison to lab-recorded corpora.

Place, publisher, year, edition, pages
Paris, 2018
National subject category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-230236 (URN)
000725545000117 (ISI)
2-s2.0-85059908776 (Scopus ID)
Conference
The Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Note

Part of proceedings ISBN 979-10-95546-00-9

QC 20180618

Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2022-11-09 Bibliographically approved
2. Crowdsourcing a self-evolving dialog graph
2019 (English) In: CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces, Association for Computing Machinery (ACM), 2019, article id 14. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a crowdsourcing-based approach for collecting dialog data for a social chat dialog system, which gradually builds a dialog graph from actual user responses and crowd-sourced system answers, conditioned by a given persona and other instructions. This approach was tested during the second instalment of the Amazon Alexa Prize 2018 (AP2018), both for the data collection and to feed a simple dialog system which would use the graph to provide answers. As users interacted with the system, a graph which maintained the structure of the dialogs was built, identifying parts where more coverage was needed. In an offline evaluation, we compared the corpus collected during the competition with other potential corpora for training chatbots, including movie subtitles, online chat forums and conversational data. The results show that the proposed methodology creates data that is more representative of actual user utterances, and leads to more coherent and engaging answers from the agent. An implementation of the proposed method is available as open-source code.
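The kind of self-evolving dialog graph described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the class, its matching scheme (exact match on normalized text), and all names are hypothetical simplifications.

```python
from collections import defaultdict

class DialogGraph:
    """Hypothetical sketch: nodes hold system utterances, edges are keyed
    by normalized user replies, and unmatched replies are logged as
    coverage gaps for crowdworkers to later fill with new answers."""

    def __init__(self, root_utterance):
        self.nodes = {0: root_utterance}   # node id -> system utterance
        self.edges = defaultdict(dict)     # node id -> {user reply: next node id}
        self.gaps = []                     # (node id, user reply) needing coverage
        self._next_id = 1

    @staticmethod
    def _normalize(text):
        # Collapse whitespace and case so near-identical replies match.
        return " ".join(text.lower().split())

    def respond(self, node_id, user_reply):
        """Follow an existing edge, or record a coverage gap."""
        key = self._normalize(user_reply)
        if key in self.edges[node_id]:
            nxt = self.edges[node_id][key]
            return nxt, self.nodes[nxt]
        self.gaps.append((node_id, key))
        return None, None

    def add_answer(self, node_id, user_reply, system_answer):
        """Attach a crowdsourced answer, growing the graph."""
        key = self._normalize(user_reply)
        new_id = self._next_id
        self._next_id += 1
        self.nodes[new_id] = system_answer
        self.edges[node_id][key] = new_id
        return new_id

g = DialogGraph("Hi! Seen any good movies lately?")
nxt, ans = g.respond(0, "Yes, I loved Dune")   # unseen reply -> gap recorded
g.add_answer(0, "Yes, I loved Dune", "Nice! What did you like about it?")
nxt, ans = g.respond(0, "yes, i loved  dune")  # now follows the new edge
```

The point of the sketch is the feedback loop: real user replies that find no edge become crowdsourcing tasks, so the graph grows exactly where coverage was missing.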

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Series
ACM International Conference Proceeding Series
Keywords
Crowdsourcing, Datasets, Dialog systems, Human-computer interaction
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-266061 (URN)
10.1145/3342775.3342790 (DOI)
000525446900014 (ISI)
2-s2.0-85075882531 (Scopus ID)
Conference
1st International Conference on Conversational User Interfaces, CUI 2019, Dublin, Ireland, 22-23 August 2019
Note

QC 20200114

Part of ISBN 9781450371872

Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2024-10-18 Bibliographically approved
3. Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results
2021 (English) In: Frontiers in Computer Science, E-ISSN 2624-9898, Vol. 3, article id 642633. Journal article (Refereed) Published
Abstract [en]

Non-invasive automatic screening for Alzheimer's disease has the potential to improve diagnostic accuracy while lowering healthcare costs. Previous research has shown that patterns in speech, language, gaze, and drawing can help detect early signs of cognitive decline. In this paper, we describe a highly multimodal system for unobtrusively capturing data during real clinical interviews conducted as part of cognitive assessments for Alzheimer's disease. The system uses nine different sensor devices (smartphones, a tablet, an eye tracker, a microphone array, and a wristband) to record interaction data during a specialist's first clinical interview with a patient, and is currently in use at Karolinska University Hospital in Stockholm, Sweden. Furthermore, complementary information in the form of brain imaging, psychological tests, speech therapist assessment, and clinical meta-data is also available for each patient. We detail our data-collection and analysis procedure and present preliminary findings that relate measures extracted from the multimodal recordings to clinical assessments and established biomarkers, based on data from 25 patients gathered thus far. Our findings demonstrate feasibility for our proposed methodology and indicate that the collected data can be used to improve clinical assessments of early dementia.

Place, publisher, year, edition, pages
Frontiers Media SA, 2021
Keywords
Alzheimer, mild cognitive impairment, multimodal prediction, speech, gaze, pupil dilation, thermal camera, pen motion
National subject category
Natural Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-303883 (URN)
10.3389/fcomp.2021.642633 (DOI)
000705498300001 (ISI)
2-s2.0-85115692731 (Scopus ID)
Note

QC 20211022

Available from: 2021-10-22 Created: 2021-10-22 Last updated: 2025-02-07 Bibliographically approved
4. Learning Non-verbal Behavior for a Social Robot from YouTube Videos
2019 (English) Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Non-verbal behavior is crucial for positive perception of humanoid robots. If modeled well, it can improve the interaction and leave the user with a positive experience; if modeled poorly, it may impede the interaction and become a source of distraction. Most existing work on modeling non-verbal behavior shows limited variability, because the models employed are deterministic and the generated motion can be perceived as repetitive and predictable. In this paper, we present a novel method for generating a limited set of facial expressions and head movements, based on a probabilistic generative deep learning architecture called Glow. We have implemented a workflow which takes videos directly from YouTube, extracts relevant features, and trains a model that generates gestures that can be realized in a robot without any post-processing. A user study illustrated the importance of having some kind of non-verbal behavior; most differences between the ground truth, the proposed method, and a random control were not significant (however, the differences that were significant favored the proposed method).

Keywords
Facial expressions, non-verbal behavior, generative models, neural network, head movement, social robotics
National subject category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-261242 (URN)
Conference
ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019
Funder
Swedish Foundation for Strategic Research (SSF), RIT15-0107
Note

QC 20191007

Available from: 2019-10-03 Created: 2019-10-03 Last updated: 2024-03-18 Bibliographically approved
5. Let’s face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings
2020 (English) In: IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is generation of appropriate non-verbal behavior for the agent, for example, facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synthesizing non-verbal behavior. Those that do, typically use deterministic methods that risk producing repetitive and non-vivid motions. In this paper, we introduce a probabilistic method to synthesize interlocutor-aware facial gestures – represented by highly expressive FLAME parameters – in dyadic conversations. Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently output interlocutor-aware facial gestures; and c) a subjective evaluation assessing the use and relative importance of the different modalities in the synthesized output. The results show that the model successfully leverages the input from the interlocutor to generate more appropriate behavior. Videos, data, and code are available at: https://jonepatr.github.io/lets_face_it/
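The interlocutor-aware conditioning described above can be sketched as assembling, per frame, the agent's recent motion together with speech and motion features from both parties. All feature dimensions, the window length, and the function name below are illustrative assumptions, not the paper's actual MoGlow extension:

```python
import numpy as np

def build_conditioning(agent_motion, agent_speech,
                       partner_motion, partner_speech,
                       t, history=3):
    """Hypothetical sketch: conditioning vector for frame t, built from
    a short window of the agent's past motion plus current speech and
    motion features from both the agent and the interlocutor."""
    past = agent_motion[t - history:t].reshape(-1)  # agent's recent gestures
    return np.concatenate([
        past,
        agent_speech[t],     # agent's own speech features at frame t
        partner_motion[t],   # interlocutor facial-gesture features
        partner_speech[t],   # interlocutor speech features
    ])

# Toy dimensions: 10 frames, 4 motion dims, 2 speech dims per party.
T, D_m, D_s = 10, 4, 2
rng = np.random.default_rng(0)
cond = build_conditioning(rng.normal(size=(T, D_m)), rng.normal(size=(T, D_s)),
                          rng.normal(size=(T, D_m)), rng.normal(size=(T, D_s)),
                          t=5)
```

In a flow-based model such as MoGlow, a vector like `cond` would condition the distribution from which the agent's next motion frame is sampled, which is what lets the output depend on the interlocutor.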

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
non-verbal behavior, machine learning, facial expressions, adaptive agents
National subject category
Human-Computer Interaction (Interaction Design)
Identifiers
urn:nbn:se:kth:diva-290561 (URN)
10.1145/3383652.3423911 (DOI)
000728153600051 (ISI)
2-s2.0-85096990068 (Scopus ID)
Conference
IVA '20: ACM International Conference on Intelligent Virtual Agents, Virtual Event, Scotland, UK, October 20-22, 2020
Funder
Swedish Foundation for Strategic Research (SSF), RIT15-0107
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210222

Available from: 2021-02-18 Created: 2021-02-18 Last updated: 2022-09-23 Bibliographically approved
6. Mechanical Chameleons: Evaluating the effects of a social robot's non-verbal behavior on social influence
2021 (English) In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a pilot study which investigates how non-verbal behavior affects social influence in social robots. We also present a modular system capable of controlling the non-verbal behavior based on the interlocutor's facial gestures (head movements and facial expressions) in real time, and a study investigating whether three different strategies for facial gestures ("still", "natural movement", i.e. movements recorded from another conversation, and "copy", i.e. mimicking the user with a four-second delay) have any effect on social influence and decision making in a "survival task". Our preliminary results show no significant difference between the three conditions, but this might be due to, among other things, the low number of study participants (12).
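The "copy" strategy above (replaying the interlocutor's facial gestures with a fixed delay) can be illustrated with a simple time-stamped buffer. The gesture representation and the timing API here are hypothetical, not the study's actual system:

```python
from collections import deque

class DelayedMimic:
    """Hypothetical sketch of the "copy" strategy: replay the
    interlocutor's gesture frames with a fixed delay (4 s in the study)."""

    def __init__(self, delay=4.0):
        self.delay = delay
        self.buffer = deque()  # (timestamp, gesture frame), oldest first

    def observe(self, timestamp, frame):
        """Record an interlocutor gesture frame as it arrives."""
        self.buffer.append((timestamp, frame))

    def act(self, now):
        """Return the most recent frame observed at least `delay` ago,
        or None if nothing is old enough yet."""
        frame = None
        while self.buffer and self.buffer[0][0] <= now - self.delay:
            _, frame = self.buffer.popleft()
        return frame

m = DelayedMimic(delay=4.0)
m.observe(0.0, "smile")
m.observe(2.0, "nod")
early = m.act(3.0)   # nothing is 4 s old yet
later = m.act(4.5)   # the frame from t=0.0 is now 4.5 s old
```

Delaying the mimicry, rather than copying instantly, is what keeps the imitation from being perceived as an obvious mirror of the user.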

National subject category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309464 (URN)
Conference
Trust, Acceptance and Social Cues in Human-Robot Interaction (SCRITA), 12 August 2021
Funder
Swedish Foundation for Strategic Research (SSF), RIT15-0107
Swedish Research Council, 2018-05409
Note

QC 20220308

Available from: 2022-03-03 Created: 2022-03-03 Last updated: 2022-06-25 Bibliographically approved
7. HEMVIP: Human Evaluation of Multiple Videos in Parallel
2021 (English) In: ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, New York, NY, United States: Association for Computing Machinery (ACM), 2021, pp. 707-711. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In many research areas, for example motion and gesture generation, objective measures alone do not provide an accurate impression of key stimulus traits such as perceived quality or appropriateness. The gold standard is instead to evaluate these aspects through user studies, especially subjective evaluations of video stimuli. Common evaluation paradigms either present individual stimuli to be scored on Likert-type scales, or ask users to compare and rate videos in a pairwise fashion. However, the time and resources required for such evaluations scale poorly as the number of conditions to be compared increases. Building on standards used for evaluating the quality of multimedia codecs, this paper instead introduces a framework for granular rating of multiple comparable videos in parallel. This methodology essentially analyses all condition pairs at once. Our contributions are 1) a proposed framework, called HEMVIP, for parallel and granular evaluation of multiple video stimuli and 2) a validation study confirming that results obtained using the tool are in close agreement with results of prior studies using conventional multiple pairwise comparisons.
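The claim that rating multiple videos in parallel "analyses all condition pairs at once" can be made concrete with a small sketch: every trial's slider scores imply a win, loss, or tie for each pair of conditions. The scoring scale, data layout, and function name are assumptions for illustration, not the actual HEMVIP implementation:

```python
from itertools import combinations

def pairwise_from_parallel(ratings):
    """Hypothetical sketch: each trial rates several conditions on one
    slider scale; every pair of conditions then yields an implicit
    pairwise comparison. Returns {(a, b): (a_wins, b_wins, ties)}."""
    wins = {}
    for trial in ratings:  # trial: {condition name: slider score}
        for a, b in combinations(sorted(trial), 2):
            w, l, t = wins.get((a, b), (0, 0, 0))
            if trial[a] > trial[b]:
                wins[(a, b)] = (w + 1, l, t)
            elif trial[a] < trial[b]:
                wins[(a, b)] = (w, l + 1, t)
            else:
                wins[(a, b)] = (w, l, t + 1)
    return wins

# Two trials, three conditions each: one parallel trial replaces
# three separate pairwise-comparison trials.
trials = [
    {"ground_truth": 80, "proposed": 70, "baseline": 30},
    {"ground_truth": 60, "proposed": 65, "baseline": 40},
]
result = pairwise_from_parallel(trials)
```

With k conditions, one parallel trial yields k·(k−1)/2 pairwise judgments, which is why evaluator time scales so much better than with explicit pairwise designs.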

Place, publisher, year, edition, pages
New York, NY, United States: Association for Computing Machinery (ACM), 2021
Keywords
evaluation paradigms, video evaluation, conversational agents, gesture generation
National subject category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309462 (URN)
10.1145/3462244.3479957 (DOI)
2-s2.0-85113672097 (Scopus ID)
Conference
International Conference on Multimodal Interaction, Montreal, Canada, October 18-22, 2021
Funder
Swedish Foundation for Strategic Research (SSF), RIT15-0107
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Part of proceedings: ISBN 978-1-4503-8481-0

QC 20220309

Available from: 2022-03-03 Created: 2022-03-03 Last updated: 2023-01-18 Bibliographically approved
8. Can we trust online crowdworkers? : Comparing online and offline participants in a preference test of virtual agents.
2020 (English) In: IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Conducting user studies is a crucial component in many scientific fields. While some studies require participants to be physically present, other studies can be conducted both physically (e.g. in-lab) and online (e.g. via crowdsourcing). Inviting participants to the lab can be a time-consuming and logistically difficult endeavor, not to mention that sometimes research groups might not be able to run in-lab experiments because of, for example, a pandemic. Crowdsourcing platforms such as Amazon Mechanical Turk (AMT) or Prolific can therefore be a suitable alternative for running certain experiments, such as evaluating virtual agents. Although previous studies have investigated the use of crowdsourcing platforms for running experiments, there is still uncertainty as to whether the results are reliable for perceptual studies. Here we replicate a previous experiment where participants evaluated a gesture generation model for virtual agents. The experiment is conducted across three participant pools – in-lab, Prolific, and AMT – with similar demographics across the in-lab participants and the Prolific platform. Our results show no difference between the three participant pools with regard to their evaluations of the gesture generation models and their reliability scores. The results indicate that online platforms can successfully be used for perceptual evaluations of this kind.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
user studies, online participants, attentiveness
National subject category
Human-Computer Interaction (Interaction Design)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-290562 (URN)
10.1145/3383652.3423860 (DOI)
000728153600002 (ISI)
2-s2.0-85096979963 (Scopus ID)
Conference
IVA '20: ACM International Conference on Intelligent Virtual Agents, Virtual Event, Scotland, UK, October 20-22, 2020
Funder
Swedish Foundation for Strategic Research (SSF), RIT15-0107
Wallenberg AI, Autonomous Systems and Software Program (WASP), CorSA
Note

QC 20211109

Part of proceedings: ISBN 978-145037586-3

Taras Kucherenko and Patrik Jonell contributed equally to this research.

Available from: 2021-02-18 Created: 2021-02-18 Last updated: 2022-06-25 Bibliographically approved

Open Access in DiVA

fulltext (10344 kB, application/pdf)
