kth.se Publications
Águas Lopes, José David (ORCID iD: orcid.org/0000-0002-8773-9216)
Publications (10 of 24)
Engwall, O. & David Lopes, J. (2022). Interaction and collaboration in robot-assisted language learning for adults. Computer Assisted Language Learning, 35(5-6), 1273-1309
Interaction and collaboration in robot-assisted language learning for adults
2022 (English). In: Computer Assisted Language Learning, ISSN 0958-8221, E-ISSN 1744-3210, Vol. 35, no. 5-6, p. 1273-1309. Article in journal (Refereed). Published.
Abstract [en]

This article analyses how robot–learner interaction in robot-assisted language learning (RALL) is influenced by the interaction behaviour of the robot. Since the robot behaviour is to a large extent determined by the combination of teaching strategy, robot role and robot type, previous studies in RALL are first summarised with respect to which combinations have been chosen, the rationale behind the choice and the effects on interaction and learning. The goal of the summary is to determine a suitable pedagogical set-up for RALL with adult learners, since previous RALL studies have almost exclusively been performed with children and youths. A user study in which 33 adult second language learners practice Swedish in three-party conversations with an anthropomorphic robot head is then presented. It is demonstrated how different robot interaction behaviours influence interaction between the robot and the learners and between the two learners. Through an analysis of learner interaction, collaboration and learner ratings for the different robot behaviours, it is observed that the learners were most positive towards the robot behaviour that focused on interviewing one learner at a time (highest average ratings), but that they were most active in sessions when the robot encouraged learner–learner interaction. Moreover, preferences and activity differed between learner pairs, depending on, e.g., their proficiency level and how well they knew their peer. It is therefore concluded that the robot behaviour needs to adapt to such factors. In addition, collaboration with the peer played an important part in conversation practice sessions, as a way to deal with linguistic difficulties or communication problems with the robot.

Place, publisher, year, edition, pages
Routledge, 2022
Keywords
Collaborative language learning; communicative language teaching; educational robots; human–robot interaction; spoken practice
National Category
Robotics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-278975 (URN) 10.1080/09588221.2020.1799821 (DOI) 000557599700001 (ISI) 2-s2.0-85089256652 (Scopus ID)
Projects
Collaborative Robot-assisted Language Learning
Funder
Swedish Research Council, 2016-03698
Note

QC 20200818

Available from: 2020-08-08. Created: 2020-08-08. Last updated: 2023-10-02. Bibliographically approved.
Engwall, O., Águas Lopes, J. D. & Cumbal, R. (2022). Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language? International Journal of Social Robotics
Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language?
2022 (English). In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805. Article in journal (Refereed). Published.
Abstract [en]

The large majority of previous work on human-robot conversations in a second language has been performed with a human wizard-of-Oz. The reasons are that automatic speech recognition of non-native conversational speech is considered to be unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates whether robot-led conversation practice in a second language with pairs of adult learners could potentially be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find both a relatively high word error rate (41%) and that a substantial share (42%) of the utterances are judged to be incomprehensible or only partially understandable by a human reader. We then evaluate how adequate the robot utterance selection is when performed manually based on the speech recognition transcriptions, or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard's choices in previous conversations. It is shown that adequate or at least acceptable robot utterances are selected by the human wizard in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions. It was also found that the interaction strategy that the robot employed, which differed regarding how much the robot maintained the initiative in the conversation and whether the focus of the conversation was on the robot or the learners, had marginal effects on the word error rate and understandability of the transcriptions, but larger effects on the adequacy of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies.
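The custom-made statistical method (c) is only characterized at a high level in the abstract. As a hedged illustration of the general idea, learning from logged wizard choices which utterance tends to follow a given context, a toy conditional-frequency model might look like the sketch below; all names and the context representation are hypothetical, not the paper's implementation.

```python
from collections import Counter, defaultdict

class WizardChoiceModel:
    """Picks the robot's next utterance as the one the wizard chose most
    often in the same context (here: the preceding robot utterance).
    A toy stand-in for the paper's statistical selection method."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, logged_turns):
        # logged_turns: iterable of (previous_robot_utterance, wizard_choice)
        for prev, choice in logged_turns:
            self.counts[prev][choice] += 1

    def select(self, prev, fallback="Can you tell me more?"):
        if self.counts[prev]:
            return self.counts[prev].most_common(1)[0][0]
        return fallback  # unseen context: fall back to a generic follow-up

model = WizardChoiceModel()
model.train([("What do you do on weekends?", "Do you do that with friends?"),
             ("What do you do on weekends?", "Do you do that with friends?"),
             ("What do you do on weekends?", "How long have you done that?")])
print(model.select("What do you do on weekends?"))
```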

Place, publisher, year, edition, pages
Springer Nature, 2022
Keywords
Robot-assisted language learning, Conversational practice, Non-native speech recognition, Dialogue management for spoken human-robot interaction
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-306942 (URN) 10.1007/s12369-021-00849-8 (DOI) 000739285100001 (ISI) 2-s2.0-85122404446 (Scopus ID)
Funder
Swedish Research Council, 2016-03698; Marcus and Amalia Wallenberg Foundation, MAW 2020.0052
Note

QC 20220112

Available from: 2022-01-05. Created: 2022-01-05. Last updated: 2022-09-23. Bibliographically approved.
Gillet, S., Cumbal, R., Abelho Pereira, A. T., Lopes, J., Engwall, O. & Leite, I. (2021). Robot Gaze Can Mediate Participation Imbalance in Groups with Different Skill Levels. In: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at the ACM/IEEE International Conference on Human-Robot Interaction, March 9-11, 2021 (pp. 303-311). Association for Computing Machinery
Robot Gaze Can Mediate Participation Imbalance in Groups with Different Skill Levels
2021 (English). In: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery, 2021, p. 303-311. Conference paper, Published paper (Refereed).
Abstract [en]

Many small group activities, like working teams or study groups, have a high dependency on the skill of each group member. Differences in skill level among participants can affect not only the performance of a team but also the social interaction of its members. In these circumstances, an active member could balance individual participation without exerting direct pressure on specific members by using indirect means of communication, such as gaze behaviors. In this study, we evaluate whether a social robot can balance the level of participation in a language skill-dependent game, played by a native speaker and a second language learner. In a between-subjects study (N = 72), we compared an adaptive robot gaze behavior, targeted at increasing the level of contribution of the least active player, with a non-adaptive gaze behavior. Our results imply that, while overall levels of speech participation were influenced predominantly by personal traits of the participants, the robot's adaptive gaze behavior could shape the interaction among participants, which led to more even participation during the game.
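As a minimal sketch of the adaptive idea (the robot directs its gaze toward the participant who has contributed least, to invite them into the conversation), one could track accumulated speaking time per participant; this is an illustration of the concept, not the study's implementation.

```python
def choose_gaze_target(speaking_time: dict) -> str:
    """Return the participant who has spoken least so far, so the robot's
    gaze can invite them to take the floor (illustrative heuristic only)."""
    return min(speaking_time, key=speaking_time.get)

# After 90 s for the native speaker and 20 s for the learner,
# the robot gazes at the learner to encourage a contribution.
print(choose_gaze_target({"native_speaker": 90.0, "learner": 20.0}))
```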

Place, publisher, year, edition, pages
Association for Computing Machinery, 2021
Series
HRI ’21
Keywords
language learning, gaze, multiparty interaction, group dynamics
National Category
Interaction Technologies
Identifiers
urn:nbn:se:kth:diva-292043 (URN) 10.1145/3434073.3444670 (DOI) 001051690500035 (ISI) 2-s2.0-85102757966 (Scopus ID)
Conference
ACM/IEEE International Conference on Human-Robot Interaction, March 9-11, 2021
Funder
Swedish Foundation for Strategic Research, FFL18-0199; Swedish Research Council, 2017-05189; Swedish Research Council, 2016-03698
Note

QC 20210710

Available from: 2021-03-24. Created: 2021-03-24. Last updated: 2024-03-18. Bibliographically approved.
Cumbal, R., Moell, B., Águas Lopes, J. D. & Engwall, O. (2021). “You don’t understand me!”: Comparing ASR Results for L1 and L2 Speakers of Swedish. In: Proceedings Interspeech 2021. Paper presented at the 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, 30 August - 3 September 2021 (pp. 96-100). International Speech Communication Association
“You don’t understand me!”: Comparing ASR Results for L1 and L2 Speakers of Swedish
2021 (English). In: Proceedings Interspeech 2021, International Speech Communication Association, 2021, p. 96-100. Conference paper, Published paper (Refereed).
Abstract [en]

The performance of Automatic Speech Recognition (ASR) systems has constantly increased in state-of-the-art development. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multiple-speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers or people with speech disorders), which signifies that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger students or language learners. In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.
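Word Error Rate, the measure used in this comparison, is the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the reference length. A minimal sketch of the standard computation (not the paper's evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with dynamic programming over the word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution and one deletion over five reference words -> WER 0.4
print(word_error_rate("jag bor i Stockholm nu", "jag bor i Göteborg"))
```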

Place, publisher, year, edition, pages
International Speech Communication Association, 2021
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
automatic speech recognition, non-native speech, language learning
National Category
Interaction Technologies
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-313355 (URN) 10.21437/Interspeech.2021-2140 (DOI) 000841879504109 (ISI) 2-s2.0-85119499427 (Scopus ID)
Conference
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, 30 August - 3 September 2021
Projects
Collaborative Robot Assisted Language Learning
Note

QC 20221108

Part of proceedings: ISBN 978-171383690-2

Available from: 2022-06-02. Created: 2022-06-02. Last updated: 2024-02-26. Bibliographically approved.
Cumbal, R., David Lopes, J. & Engwall, O. (2020). Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice. In: Proceedings ICMI '20: International Conference on Multimodal Interaction. Paper presented at ICMI '20: International Conference on Multimodal Interaction, Virtual Event, The Netherlands, October 25-29, 2020. Association for Computing Machinery (ACM)
Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice
2020 (English). In: Proceedings ICMI '20: International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2020. Conference paper, Published paper (Refereed).
Abstract [en]

Uncertainty is a frequently occurring affective state that learners experience during the acquisition of a second language. This state can constitute both a learning opportunity and a source of learner frustration. An appropriate detection could therefore benefit the learning process by reducing cognitive instability. In this study, we use a dyadic practice conversation between an adult second-language learner and a social robot to elicit events of uncertainty through the manipulation of the robot’s spoken utterances (increased lexical complexity or prosody modifications). The characteristics of these events are then used to analyze multi-party practice conversations between a robot and two learners. Classification models are trained with multimodal features from annotated events of listener (un)certainty. We report the performance of our models on different settings, (sub)turn segments and multimodal inputs.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
Robot assisted language learning, conversation, social robotics
National Category
Computer Vision and Robotics (Autonomous Systems); Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-282657 (URN) 10.1145/3382507.3418873 (DOI) 2-s2.0-85096717756 (Scopus ID)
Conference
ICMI '20: International Conference on Multimodal Interaction, Virtual Event, The Netherlands, October 25-29, 2020
Projects
Collaborative Robot Assisted Language Learning
Funder
Swedish Research Council, 2016-03698
Note

QC 20200930

Available from: 2020-09-30. Created: 2020-09-30. Last updated: 2024-03-18. Bibliographically approved.
Engwall, O., David Lopes, J. & Åhlund, A. (2020). Robot interaction styles for conversation practice in second language learning. International Journal of Social Robotics
Robot interaction styles for conversation practice in second language learning
2020 (English). In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805. Article in journal (Refereed). Published.
Abstract [en]

Four different interaction styles for the social robot Furhat, acting as a host in spoken conversation practice with two simultaneous language learners, have been developed, based on the interaction styles of human moderators of language cafés. We first investigated, through a survey and recorded sessions of three-party language café style conversations, how the interaction styles of human moderators are influenced by different factors (e.g., the participants' language level and familiarity). Using this knowledge, four distinct interaction styles were developed for the robot: sequentially asking one participant questions at a time (Interviewer); the robot speaking about itself, robots and Sweden or asking quiz questions about Sweden (Narrator); attempting to make the participants talk with each other (Facilitator); and trying to establish a three-party robot-learner-learner interaction with equal participation (Interlocutor). A user study with 32 participants, conversing in pairs with the robot, was carried out to investigate how the post-session ratings of the robot's behavior along different dimensions (e.g., the robot's conversational skills and friendliness, the value of the practice) are influenced by the robot's interaction style and participant variables (e.g., level in the target language, gender, origin). The general findings were that Interviewer received the highest mean rating, but that different factors influenced the ratings substantially, indicating that the preferences of individual participants need to be anticipated in order to improve learner satisfaction with the practice. We conclude with a list of recommendations for robot-hosted conversation practice in a second language.

Place, publisher, year, edition, pages
Springer Nature, 2020
Keywords
Robot-assisted language learning; multi-party human-robot interaction; collaborative language learning; conversational practice
National Category
Interaction Technologies
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-273772 (URN) 10.1007/s12369-020-00635-y (DOI) 000562300900002 (ISI) 2-s2.0-85082549828 (Scopus ID)
Projects
Collaborative Robot-Assisted Language Learning
Funder
Swedish Research Council, 2016-03698
Note

QC 20200911

Available from: 2020-05-29. Created: 2020-05-29. Last updated: 2022-06-26. Bibliographically approved.
David Lopes, J., Hemmingsson, N. & Åstrand, O. (2019). The Spot the Difference corpus: A multi-modal corpus of spontaneous task oriented spoken interactions. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation. Paper presented at the 11th International Conference on Language Resources and Evaluation, LREC 2018, 7-12 May 2018, Miyazaki, Japan (pp. 1939-1945). European Language Resources Association (ELRA)
The Spot the Difference corpus: A multi-modal corpus of spontaneous task oriented spoken interactions
2019 (English). In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2019, p. 1939-1945. Conference paper, Published paper (Refereed).
Abstract [en]

This paper describes the Spot the Difference Corpus, which contains 54 interactions between pairs of subjects who interact to find differences in two very similar scenes. The setup used, the participants' metadata and details about the collection are described. We are releasing this corpus of task-oriented spontaneous dialogues; the release includes rich transcriptions, annotations, audio and video. We believe that this dataset constitutes a valuable resource for studying several dimensions of human communication, ranging from turn-taking to referring expressions. In our preliminary analyses we looked at task success (how many differences were found out of the total number of differences) and how it evolves over time. In addition, we looked at scene complexity, measured by the entropy of the RGB components, and how it could relate to speech overlaps, interruptions and the expression of uncertainty. We found a tendency for more complex scenes to elicit more competitive interruptions.
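Reading "the entropy of the RGB components" as the Shannon entropy of each color channel's intensity histogram, a minimal NumPy sketch of such a scene-complexity measure could look as follows; this interpretation and the function name are assumptions, not the authors' code.

```python
import numpy as np

def rgb_entropy(image: np.ndarray) -> float:
    """Mean Shannon entropy (bits) of the R, G and B intensity histograms.
    `image` is an HxWx3 uint8 array; higher values suggest a busier scene.
    An assumed reading of the paper's measure, not its actual code."""
    entropies = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=256, range=(0, 256))
        p = hist / hist.sum()
        p = p[p > 0]                      # drop empty bins before the log
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))

# A uniform-noise image lands near the maximum of 8 bits per channel.
print(rgb_entropy(np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)))
```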

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2019
Keywords
Dialogues, Multi-modal, Spontaneous, Expression of uncertainty, Human communications, Preliminary analysis, Referring expressions, Rich transcriptions
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-280730 (URN) 000725545002004 (ISI) 2-s2.0-85059905206 (Scopus ID)
Conference
11th International Conference on Language Resources and Evaluation, LREC 2018, 7-12 May 2018, Miyazaki, Japan
Note

Part of proceedings ISBN 979-10-95546-00-9

QC 20230922

Available from: 2020-09-11. Created: 2020-09-11. Last updated: 2023-09-22. Bibliographically approved.
Jonell, P., Bystedt, M., Fallgren, P., Kontogiorgos, D., David Aguas Lopes, J., Malisz, Z., . . . Shore, T. (2018). FARMI: A Framework for Recording Multi-Modal Interactions. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Paper presented at the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7-12 May 2018 (pp. 3969-3974). Paris: European Language Resources Association
FARMI: A Framework for Recording Multi-Modal Interactions
2018 (English). In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris: European Language Resources Association, 2018, p. 3969-3974. Conference paper, Published paper (Refereed).
Abstract [en]

In this paper we present (1) a processing architecture used to collect multi-modal sensor data, both for corpus collection and real-time processing, (2) an open-source implementation thereof and (3) a use case in which we deploy the architecture in a multi-party deception game, featuring six human players and one robot. The architecture is agnostic to the choice of hardware (e.g. microphones, cameras, etc.) and programming languages, although our implementation is mostly written in Python. In our use case, different methods of capturing verbal and non-verbal cues from the participants were used. These were processed in real time and used to inform the robot about the participants' deceptive behaviour. The framework is of particular interest for researchers interested in the collection of multi-party, richly recorded corpora and the design of conversational systems. Moreover, for researchers interested in human-robot interaction, the available modules offer the possibility to easily create both autonomous and wizard-of-Oz interactions.
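FARMI's actual interfaces are defined by its open-source release; purely to illustrate the hardware-agnostic idea (any sensor module pushes timestamped payloads into a common sink so that streams can be aligned afterwards), a toy recorder in the paper's implementation language might look like this. All names below are hypothetical.

```python
import json, queue, threading, time

class Recorder:
    """Toy modality-agnostic sink: any sensor thread can push
    (stream_name, payload) pairs, which are written with a shared
    timestamp so streams can be aligned afterwards. Illustrative only;
    not FARMI's real interface."""

    def __init__(self, path):
        self.q = queue.Queue()
        self.path = path

    def push(self, stream, payload):
        self.q.put({"t": time.time(), "stream": stream, "data": payload})

    def run(self, stop: threading.Event):
        with open(self.path, "w") as f:
            while not stop.is_set() or not self.q.empty():
                try:
                    f.write(json.dumps(self.q.get(timeout=0.1)) + "\n")
                except queue.Empty:
                    pass  # no data yet; keep waiting until stopped

rec = Recorder("session.jsonl")
stop = threading.Event()
writer = threading.Thread(target=rec.run, args=(stop,))
writer.start()
rec.push("audio_vad", {"speaking": True})      # e.g. from a microphone module
rec.push("gaze", {"target": "participant_2"})  # e.g. from an eye tracker
stop.set(); writer.join()
```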

Place, publisher, year, edition, pages
Paris: European Language Resources Association, 2018
National Category
Natural Sciences; Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-230237 (URN) 000725545004009 (ISI) 2-s2.0-85058179983 (Scopus ID)
Conference
The Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7-12 May 2018
Note

Part of proceedings ISBN 979-10-95546-00-9

QC 20180618

Available from: 2018-06-13. Created: 2018-06-13. Last updated: 2022-09-22. Bibliographically approved.
Lopes, J., Engwall, O. & Skantze, G. (2017). A First Visit to the Robot Language Café. In: Engwall, Lopes (Eds.), Proceedings of the ISCA workshop on Speech and Language Technology in Education. Paper presented at the ISCA workshop on Speech and Language Technology in Education. Stockholm
A First Visit to the Robot Language Café
2017 (English). In: Proceedings of the ISCA workshop on Speech and Language Technology in Education / [ed] Engwall, Lopes, Stockholm, 2017. Conference paper, Published paper (Refereed).
Abstract [en]

We present an exploratory study on using a social robot in a conversational setting to practice a second language. The practice is carried out within a so-called language café, with two second language learners and one native moderator, a human or a robot, engaging in social small talk. We compare the interactions with the human and robot moderators and perform a qualitative analysis of the potential of a social robot as a conversational partner for language learning. Interactions with the robot are carried out in a wizard-of-Oz setting, in which the native moderator who leads the corresponding human moderator session controls the robot. The observations of the video-recorded sessions and the subject questionnaires suggest that the appropriate learner level for the practice is elementary (A1 to A2), for whom the structured, slightly repetitive interaction pattern was perceived as beneficial. We identify both some key features that are appreciated by the learners and technological parts that need further development.

Place, publisher, year, edition, pages
Stockholm, 2017
Keywords
Robot-assisted language learning
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-218769 (URN) 10.21437/SLaTE.2017-2 (DOI) 2-s2.0-85089258588 (Scopus ID)
Conference
ISCA workshop on Speech and Language Technology in Education
Funder
Swedish Research Council, 2016-06013
Note

QC 20180111

Available from: 2017-11-30. Created: 2017-11-30. Last updated: 2024-03-15. Bibliographically approved.
Ribeiro, E., Batista, F., Trancoso, I., Lopes, J., Ribeiro, R. & De Matos, D. M. (2016). Assessing user expertise in spoken dialog system interactions. In: 3rd International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2016. Paper presented at IberSPEECH 2016, 23-25 November 2016 (pp. 245-254). Springer Publishing Company
Assessing user expertise in spoken dialog system interactions
2016 (English). In: 3rd International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2016, Springer Publishing Company, 2016, p. 245-254. Conference paper, Published paper (Refereed).
Abstract [en]

Identifying the level of expertise of its users is important for a system, since it can lead to a better interaction through adaptation techniques. Furthermore, this information can be used in offline processes of root cause analysis. However, not much effort has been put into automatically identifying the level of expertise of a user, especially in dialog-based interactions. In this paper we present an approach based on a specific set of task-related features. Based on the distribution of the features between the two classes, Novice and Expert, we used Random Forests as a classification approach. Furthermore, we used a Support Vector Machine classifier in order to perform a result comparison. By applying these approaches to data from a real system, Let's Go, we obtained preliminary results that we consider positive, given the difficulty of the task and the lack of competing approaches for comparison.
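The abstract names Random Forests and a Support Vector Machine over task-related features for the Novice/Expert distinction. A hedged scikit-learn sketch of that comparison, run on synthetic stand-in data since the real features and the Let's Go corpus are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for task-related features, e.g. number of help
# requests, barge-ins, and average turn length per dialog (hypothetical).
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 0 = Novice, 1 = Expert

# Compare the two classifiers named in the abstract with 5-fold CV.
for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC(kernel="rbf"))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```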

Place, publisher, year, edition, pages
Springer Publishing Company, 2016
Keywords
Let’s Go, Random forest, SVM, User expertise, Decision trees, Adaptation techniques, Classification approach, Random forests, Result comparison, Root cause analysis, Spoken dialog systems, Support vector machine classifiers, Support vector machines
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-201814 (URN) 10.1007/978-3-319-49169-1_24 (DOI) 000389797600024 (ISI) 2-s2.0-84997241411 (Scopus ID) 9783319491684 (ISBN)
Conference
IberSPEECH 2016, 23-25 November 2016
Note

Funding text: This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013, by Universidade de Lisboa, and by the EC H2020 project RAGE under grant agreement No 644187.

QC 20170217

Available from: 2017-02-17. Created: 2017-02-17. Last updated: 2024-03-18. Bibliographically approved.