Robots Beyond Borders: The Role of Social Robots in Spoken Second Language Practice
Cumbal, Ronald. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-4472-4732
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Robotar bortom gränser : Sociala robotars roll i talat andraspråk (Swedish)
Abstract [en]

This thesis investigates how social robots can support adult second language (L2) learners in improving conversational skills. It recognizes the challenges inherent in adult L2 learning, including increased cognitive demands and the unique motivations driving adult education. While social robots hold potential for natural interactions and language education, research into conversational skill practice with adult learners remains underexplored. Thus, the thesis contributes to understanding these conversational dynamics, enhancing speaking practice, and examining cultural perspectives in this context.

To begin, this thesis investigates robot-led conversations with L2 learners, examining how learners respond to moments of uncertainty. The research reveals that when faced with uncertainty, learners frequently seek clarification, yet many remain unresponsive. As a result, effective strategies are required from robot conversational partners to address this challenge. These interactions are then used to evaluate the performance of off-the-shelf Automatic Speech Recognition (ASR) systems. The assessment highlights that speech recognition for L2 speakers is not as effective as for L1 speakers, with performance deteriorating for both groups during social conversations. Addressing these challenges is imperative for the successful integration of robots in conversational practice with L2 learners.

The thesis then explores the potential advantages of employing social robots in collaborative learning environments with multi-party interactions. It delves into strategies for improving speaking practice, including the use of non-verbal behaviors to encourage learners to speak. For instance, a robot's adaptive gaze behavior is used to balance speaking contributions within pairs of L1 and L2 participants. Moreover, an adaptive use of encouraging backchannels significantly increases the speaking time of L2 learners.

Finally, the thesis highlights the importance of further research on cultural aspects of human-robot interaction. One study reveals distinct responses among socio-cultural groups in interactions between L1 and L2 participants. For example, factors such as gender, age, extroversion, and familiarity with robots influence the conversational engagement of L2 speakers. Additionally, another study investigates preconceptions related to the appearance and accents of nationality-encoded (virtual and physical) social robots. The results indicate that initial perceptions may reflect negative preconceptions, but that these diminish after actual interactions.

Despite technical limitations, social robots provide distinct benefits in supporting educational endeavors. This thesis emphasizes the potential of social robots as effective facilitators of spoken language practice for adult learners, advocating for continued exploration at the intersection of language education, human-robot interaction, and technology.

Abstract [sv] (English translation)

This thesis investigates how social robots can support adult second language learners in improving their conversational skills in Swedish. Second language learning for adults, particularly in a migration context, is more complex than for children, partly because the conditions for language learning deteriorate with age and because the motivations are often different. Social robots have great potential in language education for practicing natural conversation, but little research has yet been conducted on how robots can practice conversation with adult learners. The thesis therefore contributes to understanding conversations between second language learners and robots, improving such conversation practice, and examining how cultural factors influence the interaction.

To begin with, the thesis examines how second language learners react when they become puzzled or uncertain in robot-led conversation exercises. The results show that learners often try to get the robot to provide clarifications when they are uncertain, but that they sometimes simply do not respond at all, which means that the robot needs to be able to handle such situations. The conversations between second language learners and a robot have also been used to examine how well leading speech recognition systems can interpret what second language speakers say. The systems have considerably greater difficulty recognizing second language speakers than speakers with a Swedish background, and they struggle to interpret both native Swedish speakers and second language learners in freer social conversations, which must be addressed when robots are to be used in conversation practice with second language learners.

The thesis then examines strategies for encouraging second language learners to speak more and for distributing the floor more evenly in three-party exercises where two people converse with the robot. The strategies involve modifying how the robot looks at the two participants or provides non-verbal feedback (short acknowledging sounds) to signal understanding of and interest in what the learners say.

Finally, the thesis highlights the importance of further research on cultural aspects of human-robot interaction. One study shows that factors such as gender, age, prior experience with robots and how extroverted the learner is affect both how much different people speak and how they respond to the robot's attempts to encourage them, through non-verbal signals, to speak more.

A second study investigates whether and how preconceptions related to appearance and pronunciation affect how people perceive (virtual and physical) social robots that have been given attributes (voice and face) associated with different national backgrounds. The results show that people's first impressions of a culturally marked robot reflect preconceptions, but that these impressions carry far less weight once people have actually interacted with the robot in a realistic setting.

A main conclusion of the thesis is that social robots, despite remaining technical limitations, offer clear advantages that can be exploited in education. Specifically, the thesis emphasizes the potential of social robots to lead conversation practice for adult second language learners and advocates continued research at the intersection of language education, human-robot interaction and technology.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. 91
Series
TRITA-EECS-AVL ; 2024:23
Keywords [en]
Conversations, gaze, backchannels, multi-party, accent, culture
Keywords [sv]
Samtal, blick, återkoppling, gruppdynamik, brytning, kultur
National Category
Robotics and automation; Natural Language Processing
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-343863; ISBN: 978-91-8040-858-5 (print); OAI: oai:DiVA.org:kth-343863; DiVA id: diva2:1840732
Public defence
2024-03-22, https://kth-se.zoom.us/j/65591848998, F3, Lindstedtsvägen 26, Stockholm, 10:00 (English)
Note

QC 20240226

Available from: 2024-02-26. Created: 2024-02-26. Last updated: 2025-02-05. Bibliographically approved.
List of papers
1. Uncertainty in robot assisted second language conversation practice
2020 (English). In: HRI '20: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery (ACM), 2020, p. 171-173. Conference paper, Published paper (Refereed)
Abstract [en]

Moments of uncertainty are common for learners practicing a second language. The appropriate management of these events could help avoid the development of frustration and benefit the learner's experience; their detection is therefore crucial in language practice conversations. In this study, an experimental conversation between an adult second language learner and a social robot is employed to visually characterize the learners' uncertainty. The robot's output is manipulated at the prosodic and lexical levels to provoke uncertainty during the conversation. The learners' reactions are then processed to obtain Facial Action Units (AUs) and gaze features. Preliminary results show distinctive behavioral patterns of uncertainty among the participants. Based on these results, a new annotation scheme is proposed, which will expand the data used to train sequential models to detect uncertainty. As future steps, the robotic conversational partner will use this information to adapt its behavior in dialogue generation and language complexity.
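As a rough illustration of the feature extraction step described above, the sketch below summarizes per-frame Action Unit intensities and gaze angles over one learner response. The column names follow an OpenFace-style export and the data are invented; this is an assumed layout, not the pipeline used in the paper.

```python
# Illustrative feature summary per learner response (column names assumed to
# follow an OpenFace-style export; not the paper's actual pipeline).
import pandas as pd

AU_COLUMNS = ["AU04_r", "AU07_r", "AU12_r"]          # assumed subset of AU intensities
GAZE_COLUMNS = ["gaze_angle_x", "gaze_angle_y"]      # assumed gaze angle columns

def summarize_segment(frames: pd.DataFrame) -> dict:
    """Mean AU intensity and gaze-angle variance over the frames of one response."""
    summary = {c: frames[c].mean() for c in AU_COLUMNS}
    summary.update({c + "_var": frames[c].var() for c in GAZE_COLUMNS})
    return summary

# Hypothetical per-frame data for a short response segment:
frames = pd.DataFrame({
    "AU04_r": [0.2, 1.1, 1.4], "AU07_r": [0.0, 0.3, 0.5], "AU12_r": [0.1, 0.0, 0.0],
    "gaze_angle_x": [0.05, -0.20, 0.30], "gaze_angle_y": [0.00, 0.10, -0.05],
})
print(summarize_segment(frames))
```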

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
Affective states, Human-robot interaction, Robot assisted language learning, Uncertainty, Agricultural robots, Man machine systems, Annotation scheme, Behavioral patterns, Dialogue generations, Language complexity, Learner's experience, Second language, Second language learners, Sequential model, Social robots
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-277271 (URN), 10.1145/3371382.3378306 (DOI), 000643728500052 (), 2-s2.0-85083245832 (Scopus ID)
Conference
15th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2020, Cambridge, UK, March 23-26, 2020
Note

QC 20230208

Available from: 2020-06-25. Created: 2020-06-25. Last updated: 2024-02-26. Bibliographically approved.
2. Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice
2020 (English). In: Proceedings ICMI '20: International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Uncertainty is a frequently occurring affective state that learners experience during the acquisition of a second language. This state can constitute both a learning opportunity and a source of learner frustration. An appropriate detection could therefore benefit the learning process by reducing cognitive instability. In this study, we use a dyadic practice conversation between an adult second-language learner and a social robot to elicit events of uncertainty through the manipulation of the robot's spoken utterances (increased lexical complexity or prosody modifications). The characteristics of these events are then used to analyze multi-party practice conversations between a robot and two learners. Classification models are trained with multimodal features from annotated events of listener (un)certainty. We report the performance of our models on different settings, (sub)turn segments and multimodal inputs.
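A minimal sketch of how such a segment-level classifier could be trained is shown below. The feature layout, labels and model choice are invented for illustration and are not the models reported in the paper.

```python
# Toy sketch: binary (un)certainty classifier over multimodal segment features.
# Feature layout and data are invented for illustration only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))   # e.g., AU, gaze and prosody statistics per (sub)turn
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```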

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
Robot assisted language learning, conversation, social robotics
National Category
Computer graphics and computer vision; Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-282657 (URN), 10.1145/3382507.3418873 (DOI), 2-s2.0-85096717756 (Scopus ID)
Conference
ICMI '20: International Conference on Multimodal Interaction, Virtual Event, The Netherlands, October 25-29, 2020
Projects
Collaborative Robot Assisted Language Learning
Funder
Swedish Research Council, 2016-03698
Note

QC 20200930

Available from: 2020-09-30. Created: 2020-09-30. Last updated: 2025-02-01. Bibliographically approved.
3. “You don’t understand me!”: Comparing ASR Results for L1 and L2 Speakers of Swedish
2021 (English). In: Proceedings Interspeech 2021, International Speech Communication Association, 2021, p. 96-100. Conference paper, Published paper (Refereed)
Abstract [en]

The performance of Automatic Speech Recognition (ASR) systems has constantly increased in state-of-the-art development. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multiple-speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers or people with speech disorders), which signifies that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger students or language learners. In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.
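For reference, Word Error Rate is the word-level edit distance between the ASR hypothesis and a reference transcript, divided by the reference length. The sketch below is a minimal, self-contained implementation of that standard metric; the example transcripts are invented and the paper's actual scoring setup may differ.

```python
# Minimal Word Error Rate (WER) sketch: word-level Levenshtein distance.
# Illustrative only; the paper's evaluation pipeline and data differ.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical reference vs. ASR hypotheses for one utterance:
print(word_error_rate("jag bor i stockholm", "jag bor i stockholm"))  # 0.0
print(word_error_rate("jag bor i stockholm", "ja bord i stockholm"))  # 0.5
```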

Place, publisher, year, edition, pages
International Speech Communication Association, 2021
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
automatic speech recognition, non-native speech, language learning
National Category
Other Engineering and Technologies
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-313355 (URN), 10.21437/Interspeech.2021-2140 (DOI), 000841879504109 (), 2-s2.0-85119499427 (Scopus ID)
Conference
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, 30 August through 3 September 2021
Projects
Collaborative Robot Assisted Language Learning
Note

QC 20221108

Part of proceedings: ISBN 978-171383690-2

Available from: 2022-06-02. Created: 2022-06-02. Last updated: 2025-02-18. Bibliographically approved.
4. Robot Gaze Can Mediate Participation Imbalance in Groups with Different Skill Levels
2021 (English). In: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery, 2021, p. 303-311. Conference paper, Published paper (Refereed)
Abstract [en]

Many small group activities, like working teams or study groups, depend heavily on the skill of each group member. Differences in skill level among participants can affect not only the performance of a team but also the social interaction of its members. In these circumstances, an active member could balance individual participation without exerting direct pressure on specific members by using indirect means of communication, such as gaze behaviors. Similarly, in this study, we evaluate whether a social robot can balance the level of participation in a language skill-dependent game, played by a native speaker and a second language learner. In a between-subjects study (N = 72), we compared an adaptive robot gaze behavior, targeted to increase the level of contribution of the least active player, with a non-adaptive gaze behavior. Our results imply that, while overall levels of speech participation were influenced predominantly by personal traits of the participants, the robot's adaptive gaze behavior could shape the interaction among participants, leading to more even participation during the game.
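In simplified form, the adaptive condition can be pictured as a policy that tracks accumulated speaking time and biases the robot's gaze toward the participant who has spoken least. The class, method names and bias parameter below are invented for illustration; this is not the study's actual implementation.

```python
# Simplified sketch of a participation-balancing gaze policy (illustrative only).
import random

class AdaptiveGazePolicy:
    def __init__(self, participants, bias=0.7):
        # bias: assumed probability of gazing at the least active speaker
        self.speaking_time = {p: 0.0 for p in participants}
        self.bias = bias

    def update(self, speaker, seconds):
        """Accumulate speaking time whenever a participant holds the floor."""
        self.speaking_time[speaker] += seconds

    def next_gaze_target(self):
        """Gaze mostly at the least active participant, occasionally elsewhere."""
        least_active = min(self.speaking_time, key=self.speaking_time.get)
        if random.random() < self.bias:
            return least_active
        others = [p for p in self.speaking_time if p != least_active]
        return random.choice(others)

policy = AdaptiveGazePolicy(["L1_speaker", "L2_learner"])
policy.update("L1_speaker", 42.0)
policy.update("L2_learner", 11.5)
print(policy.next_gaze_target())  # most often "L2_learner"
```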

Place, publisher, year, edition, pages
Association for Computing Machinery, 2021
Series
HRI ’21
Keywords
language learning, gaze, multiparty interaction, group dynamics
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-292043 (URN), 10.1145/3434073.3444670 (DOI), 001051690500035 (), 2-s2.0-85102757966 (Scopus ID)
Conference
ACM/IEEE International Conference on Human-Robot Interaction, March 9-11, 2021
Funder
Swedish Foundation for Strategic Research, FFL18-0199; Swedish Research Council, 2017-05189; Swedish Research Council, 2016-03698
Note

QC 20210710

Available from: 2021-03-24. Created: 2021-03-24. Last updated: 2025-02-18. Bibliographically approved.
5. Shaping unbalanced multi-party interactions through adaptive robot backchannels
2022 (English). In: IVA 2022 - Proceedings of the 22nd ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery, Inc, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Non-verbal cues used in human communication have been shown to be effective in shaping spoken interactions. When applied to virtual agents or social robots, results imply that a similar effect can be expected in dyadic settings. In this study, we explore how encouraging backchannels, vocal and non-vocal, can stimulate speaking participation in a game-based multi-party interaction where unbalanced contribution is expected. We design the study around a social robot taking part in a language game with native speakers and language learners, to evaluate how an adaptive generation of backchannels, targeting the least speaking participant to encourage more speaking contribution, affects the group's and individual participants' behavior. We report results from experiments with 30 subjects divided into pairs assigned to the adaptive (encouraging) and the neutral (control) condition. Our results show that the speaking participation of the least active speaker increases significantly when the robot uses an adaptive backchanneling strategy. At the same time, the participation of the more active speaker slightly decreases, which causes the combined speaking time of both participants to be similar between the Control and Experimental conditions. The adaptive strategy further leads to a 50% decrease in the difference in speaker shares between the two participants (indicating a more balanced participation) compared to the Control condition. However, this distribution of speaker ratios is not significantly different from the Control condition.
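A minimal sketch of the adaptive idea, encouraging backchannels targeted at the least active speaker, might look as follows; the trigger rule, pause threshold and backchannel inventory are assumptions for illustration, not the system described in the paper.

```python
# Illustrative trigger rule for encouraging backchannels (not the paper's system).
import random

ENCOURAGING_VOCAL = ["mm-hm", "yeah", "okay"]   # assumed inventory
ENCOURAGING_NONVOCAL = ["nod", "smile"]

def maybe_backchannel(speaking_time, current_speaker, pause_ms, min_pause_ms=400):
    """Emit an encouraging backchannel when the least active participant
    is speaking and leaves a short pause."""
    least_active = min(speaking_time, key=speaking_time.get)
    if current_speaker == least_active and pause_ms >= min_pause_ms:
        return random.choice(ENCOURAGING_VOCAL + ENCOURAGING_NONVOCAL)
    return None  # stay neutral toward the more active speaker

print(maybe_backchannel({"A": 95.0, "B": 30.0}, current_speaker="B", pause_ms=600))
```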

Place, publisher, year, edition, pages
Association for Computing Machinery, Inc, 2022
Keywords
encouraging behavior, human-robot interaction, non-verbal behavior, second language learning, social robot, Social robots, Back channels, Condition, Humans-robot interactions, Language learning, Multiparty interaction, Non-verbal behaviours, Second language, Machine design
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-327284 (URN), 10.1145/3514197.3549680 (DOI), 2-s2.0-85138738227 (Scopus ID)
Conference
IVA 2022 - 22nd ACM International Conference on Intelligent Virtual Agents
Note

QC 20230524

Available from: 2023-05-24. Created: 2023-05-24. Last updated: 2024-02-26. Bibliographically approved.
6. Socio-cultural perception of robot backchannels
2023 (English). In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 10. Article in journal (Refereed). Published
Abstract [en]

Introduction: Backchannels, i.e., short interjections by an interlocutor to indicate attention, understanding or agreement regarding utterances by another conversation participant, are fundamental in human-human interaction. A lack of backchannels, or backchannels with unexpected timing or formulation, may influence the conversation negatively, as misinterpretations regarding attention, understanding or agreement may occur. However, several studies over the years have shown that there may be cultural differences in how backchannels are provided and perceived and that these differences may affect intercultural conversations. Culturally aware robots must hence be endowed with the capability to detect and adapt to the way these conversational markers are used across different cultures. Traditionally, culture has been defined in terms of nationality, but this is increasingly considered a stereotypic simplification. We therefore investigate several socio-cultural factors, such as the participants' gender, age, first language, extroversion and familiarity with robots, that may be relevant for the perception of backchannels.

Methods: We first cover existing research on cultural influence on backchannel formulation and perception in human-human interaction and on backchannel implementation in Human-Robot Interaction. We then present an experiment on second language spoken practice, in which we investigate how backchannels from the social robot Furhat influence the interaction (analyzed through speaking time ratios, ethnomethodology and multimodal conversation analysis) and the impression of the robot (measured by post-session ratings). The experiment, conducted in a triad word game setting, focuses on whether activity-adaptive robot backchannels may redistribute the participants' speaking time ratio, and/or whether the participants' assessment of the robot is influenced by the backchannel strategy. The goal is to explore how robot backchannels should be adapted to different language learners to encourage their participation while being perceived as socio-culturally appropriate.

Results: We find that a strategy that displays more backchannels towards a less active speaker may substantially decrease the difference in speaking time between the two speakers, that different socio-cultural groups respond differently to the robot’s backchannel strategy and that they also perceive the robot differently after the session.

Discussion: We conclude that the robot may need different backchanneling strategies towards speakers from different socio-cultural groups in order to encourage them to speak and have a positive perception of the robot.
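To make the speaking-time measure from the Methods concrete, the worked example below computes per-speaker shares and their difference for a control and an adaptive session; the numbers are invented purely for illustration.

```python
# Worked example of speaking-time shares (figures are invented for illustration).
def speaking_shares(seconds_by_speaker):
    total = sum(seconds_by_speaker.values())
    return {s: t / total for s, t in seconds_by_speaker.items()}

control = speaking_shares({"L1": 300.0, "L2": 100.0})   # L2 share = 0.25
adaptive = speaking_shares({"L1": 260.0, "L2": 170.0})  # L2 share ~ 0.40

print(control["L1"] - control["L2"])    # 0.50 difference in shares
print(adaptive["L1"] - adaptive["L2"])  # ~0.21 difference in shares
```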


Place, publisher, year, edition, pages
Frontiers Media S.A., 2023
Keywords
backchannels, multiparty HRI, robot-assisted language learning, spoken practice, cultural effects
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-323334 (URN), 10.3389/frobt.2023.988042 (DOI), 000935012900001 (), 36777379 (PubMedID), 2-s2.0-85147686864 (Scopus ID)
Funder
Marcus and Amalia Wallenberg Foundation, 2020.0052
Note

QC 20230130

Available from: 2023-01-26. Created: 2023-01-26. Last updated: 2024-02-26. Bibliographically approved.
7. Stereotypical nationality representations in HRI: perspectives from international young adults
2023 (English). In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 10, article id 1264614. Article in journal (Refereed). Published
Abstract [en]

People often form immediate expectations about other people, or groups of people, based on visual appearance and characteristics of their voice and speech. These stereotypes, often inaccurate or overgeneralized, may translate to robots that carry human-like qualities. This study aims to explore whether nationality-based preconceptions regarding appearance and accents can be found in people's perception of a virtual and a physical social robot. In an online survey with 80 subjects evaluating different first-language-influenced accents of English and nationality-influenced human-like faces for a virtual robot, we find that accents, in particular, lead to preconceptions on perceived competence and likeability that correspond to previous findings in social science research. In a physical interaction study with 74 participants, we then studied whether the perception of competence and likeability is similar after interacting with a robot portraying one of four different nationality representations from the online survey. We find that preconceptions on national stereotypes that appeared in the online survey vanish or are overshadowed by factors related to general interaction quality. We do, however, find some effects of the robot's stereotypical alignment with the subject group, with Swedish subjects (the majority group in this study) rating the Swedish-accented robot as less competent than the international group does, but, on the other hand, recalling more facts from the Swedish robot's presentation than the international group does. In an extension in which the physical robot was replaced by a virtual robot interacting in the same scenario online, we further found the same result, namely that preconceptions are of less importance after actual interactions, demonstrating that the differences in the ratings of the robot between the online survey and the interaction are not due to the interaction medium. We hence conclude that attitudes towards stereotypical national representations in HRI have a weak effect, at least for the user group included in this study (primarily educated young students in an international setting).

Place, publisher, year, edition, pages
Frontiers Media SA, 2023
Keywords
accent, appearance, social robot, nationality, stereotype, impression, competence, likeability
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-341526 (URN), 10.3389/frobt.2023.1264614 (DOI), 001115613500001 (), 38077460 (PubMedID), 2-s2.0-85178920101 (Scopus ID)
Note

QC 20231222

Available from: 2023-12-22. Created: 2023-12-22. Last updated: 2024-02-26. Bibliographically approved.
8. Speaking Transparently: Social Robots in Educational Settings
2024 (English). In: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24 Companion), March 11-14, 2024, Boulder, CO, USA, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

The recent surge in popularity of Large Language Models, known for their inherent opacity, has increased the interest in fostering transparency in technology designed for human interaction. This concern is equally prevalent in the development of Social Robots, particularly when these are designed to engage in critical areas of our society, such as education or healthcare. In this paper we propose an experiment to investigate how users can be made aware of the automated decision processes when interacting in a discussion with a social robot. Our main objective is to assess the effectiveness of verbal expressions in fostering transparency within groups of individuals as they engage with a robot. We describe the proposed interactive settings, system design, and our approach to enhance the transparency in a robot's decision-making process for multi-party interactions.
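As a sketch of the kind of verbal transparency the proposal describes, the snippet below pairs an automated turn-allocation decision with a short spoken rationale. The decision rule, names, values and phrasing are invented for illustration and do not reflect the system design in the paper.

```python
# Illustrative only: pairing a speaker-selection decision with a spoken rationale.
def choose_next_speaker(speaking_time):
    """Pick the participant with the least accumulated speaking time."""
    return min(speaking_time, key=speaking_time.get)

def explain_decision(next_speaker, speaking_time):
    """Template-based verbal explanation of the automated decision."""
    share = speaking_time[next_speaker] / sum(speaking_time.values())
    return (f"I will ask {next_speaker} next, because they have only had "
            f"{share:.0%} of the speaking time so far.")

times = {"Alex": 120.0, "Sam": 40.0}   # hypothetical participants and values
nxt = choose_next_speaker(times)
print(explain_decision(nxt, times))    # "I will ask Sam next, because ... 25% ..."
```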

Keywords
Dialogue system, transparency, multi-party, conversation
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-343862 (URN), 10.1145/3610978.3640717 (DOI), 001255070800079 (), 2-s2.0-85188090310 (Scopus ID)
Conference
International Conference on Human-Robot Interaction
Note

QC 20240328

Available from: 2024-02-26. Created: 2024-02-26. Last updated: 2025-02-09. Bibliographically approved.

Open Access in DiVA

Kappa (3009 kB): FULLTEXT01.pdf, fulltext, application/pdf

Authority records

Cumbal, Ronald
