Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language?
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0003-4532-014X
Heriot-Watt University, Edinburgh, UK. ORCID iD: 0000-0002-8773-9216
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-4472-4732
2022 (English). In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805, Vol. 14, no 4, p. 1067-1085. Article in journal (Refereed), Published
Abstract [en]

The large majority of previous work on human-robot conversations in a second language has been performed with a human wizard-of-Oz. The reasons are that automatic speech recognition (ASR) of non-native conversational speech is considered to be unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates whether robot-led conversation practice in a second language with pairs of adult learners could potentially be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find a relatively high word error rate (41%), and a substantial share (42%) of the utterances are judged to be incomprehensible or only partially understandable by a human reader. We then evaluate how adequate the robot utterance selection is when performed manually based on the speech recognition transcriptions or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard’s choices in previous conversations. The human wizard selects adequate or at least acceptable robot utterances in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions.
The interaction strategy that the robot employed, which differed in how much the robot maintained the initiative in the conversation and in whether the focus of the conversation was on the robot or the learners, had marginal effects on the word error rate and understandability of the transcriptions but larger effects on the adequateness of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies than with others.
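
For readers unfamiliar with the word error rate reported in the abstract, the following is a minimal sketch of how the metric is conventionally computed: WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcription, and N is the number of words in the reference. The function name `word_error_rate`, the lowercasing and whitespace tokenisation, and the example sentences are assumptions made for illustration. They are not taken from the article, which may normalise and score its transcriptions differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical learner utterance and recogniser output.
    reference = "i like to travel in the summer"
    hypothesis = "i like travel in summer"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

In this made-up example the recogniser drops two of the seven reference words, so the score is 2/7. Because insertions are counted against the reference length, the metric can exceed 1.0 for very noisy output.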

Place, publisher, year, edition, pages
Springer Nature, 2022. Vol. 14, no 4, p. 1067-1085
Keywords [en]
Robot-assisted language learning, Conversational practice, Non-native speech recognition, Dialogue management for spoken human-robot interaction
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-306942
DOI: 10.1007/s12369-021-00849-8
ISI: 000739285100001
Scopus ID: 2-s2.0-85122404446
OAI: oai:DiVA.org:kth-306942
DiVA, id: diva2:1624964
Funder
Swedish Research Council, 2016-03698
Marcus and Amalia Wallenberg Foundation, MAW 2020.0052
Note

QC 20250612

Available from: 2022-01-05 Created: 2022-01-05 Last updated: 2025-06-12 Bibliographically approved

Open Access in DiVA

fulltext (1044 kB), 372 downloads
File information
File name: FULLTEXT01.pdf
File size: 1044 kB
Checksum SHA-512:
34acfd5fb0ae39bcbb526cab565773c29d2fabe68f19fc0488d3f57b79a982c87926b77019b2c82dd80f8632edefd272a3fe7a41af5265d1fa6b45d1a2f81df0
Type: fulltext
Mimetype: application/pdf

Authority records

Engwall, Olov; Águas Lopes, José David; Cumbal, Ronald
