kth.se Publications
1 - 25 of 25
  • 1.
    Cumbal, Ronald
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    David Lopes, José
    Heriot-Watt University.
    Engwall, Olov
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice. 2020. In: Proceedings ICMI '20: International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2020. Conference paper (Refereed)
    Abstract [en]

    Uncertainty is a frequently occurring affective state that learners experience during the acquisition of a second language. This state can constitute both a learning opportunity and a source of learner frustration. An appropriate detection could therefore benefit the learning process by reducing cognitive instability. In this study, we use a dyadic practice conversation between an adult second-language learner and a social robot to elicit events of uncertainty through the manipulation of the robot’s spoken utterances (increased lexical complexity or prosody modifications). The characteristics of these events are then used to analyze multi-party practice conversations between a robot and two learners. Classification models are trained with multimodal features from annotated events of listener (un)certainty. We report the performance of our models on different settings, (sub)turn segments and multimodal inputs.

  • 2.
    Cumbal, Ronald
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Moell, Birger
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Águas Lopes, José David
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Engwall, Olov
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH, Speech Communication and Technology.
    “You don’t understand me!”: Comparing ASR Results for L1 and L2 Speakers of Swedish. 2021. In: Proceedings Interspeech 2021, International Speech Communication Association, 2021, p. 96-100. Conference paper (Refereed)
    Abstract [en]

    The performance of Automatic Speech Recognition (ASR) systems has constantly increased in state-of-the-art development. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multiple-speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers or people with speech disorders), which signifies that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger students or language learners. In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.

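    The abstract above compares recognition results using Word Error Rate (WER). As a reminder of how that metric works, here is a minimal sketch in Python: word-level edit distance (substitutions, insertions, deletions) divided by the reference length. The function name and the toy Swedish utterance pair are illustrative only and are not taken from the paper.

        # Word Error Rate: word-level Levenshtein distance divided by the reference length.
        def word_error_rate(reference: str, hypothesis: str) -> float:
            ref, hyp = reference.split(), hypothesis.split()
            # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
            dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                dp[i][0] = i
            for j in range(len(hyp) + 1):
                dp[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                    dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
            return dp[len(ref)][len(hyp)] / max(len(ref), 1)

        # Toy example (not from the paper): one deleted word out of four gives WER 0.25.
        print(word_error_rate("jag bor i stockholm", "jag bor stockholm"))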
  • 3.
    David Lopes, José
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Hemmingsson, Nils
    KTH.
    Åstrand, Oliver
    KTH.
    The Spot the Difference corpus: A multi-modal corpus of spontaneous task oriented spoken interactions. 2019. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2019, p. 1939-1945. Conference paper (Refereed)
    Abstract [en]

    This paper describes the Spot the Difference Corpus which contains 54 interactions between pairs of subjects interacting to find differences in two very similar scenes. The setup used, the participants' metadata and details about collection are described. We are releasing this corpus of task-oriented spontaneous dialogues. This release includes rich transcriptions, annotations, audio and video. We believe that this dataset constitutes a valuable resource to study several dimensions of human communication that go from turn-taking to the study of referring expressions. In our preliminary analyses we have looked at task success (how many differences were found out of the total number of differences) and how it evolves over time. In addition we have looked at scene complexity provided by the RGB components' entropy and how it could relate to speech overlaps, interruptions and the expression of uncertainty. We found there is a tendency that more complex scenes have more competitive interruptions.
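    The scene-complexity measure mentioned above is based on the entropy of the RGB components. The exact formulation used in the corpus analysis is not given here, so the sketch below only shows one plausible variant: the Shannon entropy of each 8-bit colour channel's histogram, averaged over the three channels (NumPy is assumed).

        import numpy as np

        def rgb_entropy(image: np.ndarray) -> float:
            # `image` is an H x W x 3 array of 8-bit values; return the mean Shannon
            # entropy (in bits) of the R, G and B channel histograms.
            entropies = []
            for channel in range(3):
                counts = np.bincount(image[..., channel].ravel(), minlength=256)
                p = counts / counts.sum()
                p = p[p > 0]  # drop empty histogram bins before taking the log
                entropies.append(float(-(p * np.log2(p)).sum()))
            return float(np.mean(entropies))

        # Toy example: uniform noise is close to the 8-bit maximum of 8 bits per channel.
        noise = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
        print(rgb_entropy(noise))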

  • 4.
    Engwall, Olov
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Cumbal, Ronald
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Águas Lopes, José David
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Ljung, Mikael
    Månsson, Linnea
    Identification of low-engaged learners in robot-led second language conversations with adults. Manuscript (preprint) (Other academic)
    Abstract [en]

    The main aim of this study is to investigate if verbal, vocal and facial information can be used to identify low-engaged second language learners in robot-led conversation practice. The experiments were performed on voice recordings and video data from 50 conversations, in which a robotic head talks with pairs of adult language learners using four different interaction strategies with varying robot-learner focus and initiative. It was found that these robot interaction strategies influenced learner activity and engagement. The verbal analysis indicated that learners with low activity rated the robot significantly lower on two out of four scales related to social competence. The acoustic vocal and video-based facial analysis, based on manual annotations or machine learning classification, both showed that learners with low engagement rated the robot’s social competencies consistently, and in several cases significantly, lower, and in addition rated the learning effectiveness lower. The agreement between manual and automatic identification of low-engaged learners based on voice recordings or face videos was further found to be adequate for future use. These experiments constitute a first step towards enabling adaptation to learners’ activity and engagement through within- and between-strategy changes of the robot’s interaction with learners.

  • 5.
    Engwall, Olov
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    David Lopes, José
    Interaction Lab, Heriot-Watt University, Edinburgh, UK.
    Interaction and collaboration in robot-assisted language learning for adults. 2022. In: Computer Assisted Language Learning, ISSN 0958-8221, E-ISSN 1744-3210, Vol. 35, no 5-6, p. 1273-1309. Article in journal (Refereed)
    Abstract [en]

    This article analyses how robot–learner interaction in robot-assisted language learning (RALL) is influenced by the interaction behaviour of the robot. Since the robot behaviour is to a large extent determined by the combination of teaching strategy, robot role and robot type, previous studies in RALL are first summarised with respect to which combinations that have been chosen, the rationale behind the choice and the effects on interaction and learning. The goal of the summary is to determine a suitable pedagogical set-up for RALL with adult learners, since previous RALL studies have almost exclusively been performed with children and youths. A user study in which 33 adult second language learners practice Swedish in three-party conversations with an anthropomorphic robot head is then presented. It is demonstrated how different robot interaction behaviours influence interaction between the robot and the learners and between the two learners. Through an analysis of learner interaction, collaboration and learner ratings for the different robot behaviours, it is observed that the learners were most positive towards the robot behaviour that focused on interviewing one learner at the time (highest average ratings), but that they were the most active in sessions when the robot encouraged learner–learner interaction. Moreover, the preferences and activity differed between learner pairs, depending on, e.g., their proficiency level and how well they knew the peer. It is therefore concluded that the robot behaviour needs to adapt to such factors. In addition, collaboration with the peer played an important part in conversation practice sessions to deal with linguistic difficulties or communication problems with the robot.

  • 6.
    Engwall, Olov
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    David Lopes, José
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Åhlund, Anna
    Stockholms universitet.
    Robot interaction styles for conversation practice in second language learning. 2020. In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805. Article in journal (Refereed)
    Abstract [en]

    Four different interaction styles for the social robot Furhat acting as a host in spoken conversation practice with two simultaneous language learners have been developed, based on interaction styles of human moderators of language cafés. We first investigated, through a survey and recorded sessions of three-party language café style conversations, how the interaction styles of human moderators are influenced by different factors (e.g., the participants' language level and familiarity). Using this knowledge, four distinct interaction styles were developed for the robot: sequentially asking one participant questions at a time (Interviewer); the robot speaking about itself, robots and Sweden or asking quiz questions about Sweden (Narrator); attempting to make the participants talk with each other (Facilitator); and trying to establish a three-party robot-learner-learner interaction with equal participation (Interlocutor). A user study with 32 participants, conversing in pairs with the robot, was carried out to investigate how the post-session ratings of the robot's behavior along different dimensions (e.g., the robot's conversational skills and friendliness, the value of practice) are influenced by the robot's interaction style and participant variables (e.g., level in the target language, gender, origin). The general findings were that Interviewer received the highest mean rating, but that different factors influenced the ratings substantially, indicating that the preferences of individual participants need to be anticipated in order to improve learner satisfaction with the practice. We conclude with a list of recommendations for robot-hosted conversation practice in a second language.

  • 7.
    Engwall, Olov
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Águas Lopes, José David
    Cumbal, Ronald
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language? 2022. In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805. Article in journal (Refereed)
    Abstract [en]

    The large majority of previous work on human-robot conversations in a second language has been performed with a human wizard-of-Oz. The reasons are that automatic speech recognition of non-native conversational speech is considered to be unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates if robot-led conversation practice in a second language with pairs of adult learners could potentially be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find both a relatively high word error rate (41%) and that a substantial share (42%) of the utterances are judged to be incomprehensible or only partially understandable by a human reader. We then evaluate how adequate the robot utterance selection is, when performed manually based on the speech recognition transcriptions or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard’s choices in previous conversations. It is shown that adequate or at least acceptable robot utterances are selected by the human wizard in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions. It was also found that the interaction strategy that the robot employed, which differed regarding how much the robot maintained the initiative in the conversation and if the focus of the conversation was on the robot or the learners, had marginal effects on the word error rate and understandability of the transcriptions but larger effects on the adequacy of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies.

  • 8.
    Engwall, Olov
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Águas Lopes, José David
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Cumbal, Ronald
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Berndtson, Gustav
    Lindström, Ruben
    Ekman, Patrik
    Hartmanis, Eric
    Jin, Emelie
    Johnston, Ella
    Tahir, Gara
    Mekonnen, Michael
    Learner and teacher perspectives on robot-led L2 conversation practice. In: Article in journal (Refereed)
    Abstract [en]

    This article focuses on designing and evaluating conversation practice in a second language (L2) with a robot that employs human spoken and non-verbal interaction strategies. Based on an analysis of previous work and semi-structured interviews with L2 learners and teachers, recommendations for robot-led conversation practice for adult learners at intermediate level are first defined, focused on language learning, on the social context, on the conversational structure and on verbal and visual aspects of the robot moderation. Guided by these recommendations, an experiment is set up, in which 12 pairs of L2 learners of Swedish interact with a robot in short social conversations. These robot-learner interactions are evaluated through post-session interviews with the learners, teachers’ ratings of the robot’s behaviour and analyses of the video-recorded conversations, resulting in a set of guidelines for robot-led conversation practice, in particular: 1) Societal and personal topics increase the practice’s meaningfulness for learners. 2) Strategies and methods for providing corrective feedback during conversation practice need to be explored further. 3) Learners should be encouraged to support each other if the robot has difficulties adapting to their linguistic level. 4) The robot should establish a social relationship, by contributing with its own story, remembering the participants’ input, and making use of non-verbal communication signals. 5) Improvements are required regarding naturalness and intelligibility of text-to-speech synthesis, in particular its speed, if it is to be used for conversations with L2 learners. 

  • 9. Georgiladakis, S.
    et al.
    Athanasopoulou, G.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, Tal-kommunikation.
    David Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Chorianopoulou, A.
    Palogiannidi, E.
    Iosif, E.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, Tal-kommunikation.
    Potamianos, A.
    Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems. 2016. In: Interspeech 2016, San Francisco, CA, USA: International Speech Communication Association, 2016. Conference paper (Refereed)
    Abstract [en]

    A major challenge in Spoken Dialogue Systems (SDS) is the detection of problematic communication (hotspots), as well as the classification of these hotspots into different types (root cause analysis). In this work, we focus on two classes of root cause, namely, erroneous speech recognition vs. other (e.g., dialogue strategy). Specifically, we propose an automatic algorithm for detecting hotspots and classifying root causes in two subsequent steps. Regarding hotspot detection, various lexico-semantic features are used for capturing repetition patterns along with affective features. Lexico-semantic and repetition features are also employed for root cause analysis. Both algorithms are evaluated with respect to the Let’s Go dataset (bus information system). In terms of classification unweighted average recall, performance of 80% and 70% is achieved for hotspot detection and root cause analysis, respectively.
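    Performance above is reported as unweighted average recall (UAR), i.e. recall computed per class and then averaged so that each class counts equally regardless of its frequency. A minimal sketch of the metric, with made-up labels rather than the Let's Go annotations:

        from collections import defaultdict

        def unweighted_average_recall(y_true, y_pred):
            # Mean of per-class recall; every class contributes equally to the average.
            total = defaultdict(int)    # instances per class
            correct = defaultdict(int)  # correctly recalled instances per class
            for truth, pred in zip(y_true, y_pred):
                total[truth] += 1
                correct[truth] += int(truth == pred)
            return sum(correct[c] / total[c] for c in total) / len(total)

        # Toy example: recall 1.0 on "asr_error" and 0.5 on "other" gives UAR 0.75.
        y_true = ["asr_error", "asr_error", "other", "other"]
        y_pred = ["asr_error", "asr_error", "other", "asr_error"]
        print(unweighted_average_recall(y_true, y_pred))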

  • 10.
    Gillet, Sarah
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL.
    Cumbal, Ronald
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Abelho Pereira, André Tiago
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Lopes, José
    Heriot-Watt University.
    Engwall, Olov
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Leite, Iolanda
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL.
    Robot Gaze Can Mediate Participation Imbalance in Groups with Different Skill Levels. 2021. In: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery, 2021, p. 303-311. Conference paper (Refereed)
    Abstract [en]

    Many small group activities, like working teams or study groups, have a high dependency on the skill of each group member. Differences in skill level among participants can affect not only the performance of a team but also influence the social interaction of its members. In these circumstances, an active member could balance individual participation without exerting direct pressure on specific members by using indirect means of communication, such as gaze behaviors. Similarly, in this study, we evaluate whether a social robot can balance the level of participation in a language skill-dependent game, played by a native speaker and a second language learner. In a between-subjects study (N = 72), we compared an adaptive robot gaze behavior, targeted to increase the level of contribution of the least active player, with a non-adaptive gaze behavior. Our results imply that, while overall levels of speech participation were influenced predominantly by personal traits of the participants, the robot’s adaptive gaze behavior could shape the interaction among participants, which led to more even participation during the game.

  • 11.
    Jonell, Patrik
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Bystedt, Mattias
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Fallgren, Per
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Kontogiorgos, Dimosthenis
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    David Aguas Lopes, José
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Malisz, Zofia
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Mascarenhas, Samuel
    GAIPS INESC-ID, Lisbon, Portugal.
    Oertel, Catharine
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Raveh, Eran
    Multimodal Computing and Interaction, Saarland University, Germany.
    Shore, Todd
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    FARMI: A Framework for Recording Multi-Modal Interactions. 2018. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris: European Language Resources Association, 2018, p. 3969-3974. Conference paper (Refereed)
    Abstract [en]

    In this paper we present (1) a processing architecture used to collect multi-modal sensor data, both for corpora collection and real-time processing, (2) an open-source implementation thereof and (3) a use-case where we deploy the architecture in a multi-party deception game, featuring six human players and one robot. The architecture is agnostic to the choice of hardware (e.g. microphones, cameras, etc.) and programming languages, although our implementation is mostly written in Python. In our use-case, different methods of capturing verbal and non-verbal cues from the participants were used. These were processed in real-time and used to inform the robot about the participants’ deceptive behaviour. The framework is of particular interest for researchers who are interested in the collection of multi-party, richly recorded corpora and the design of conversational systems. Moreover for researchers who are interested in human-robot interaction the available modules offer the possibility to easily create both autonomous and wizard-of-Oz interactions.

  • 12. Koutsombogera, Maria
    et al.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Bollepalli, Bajibabu
    Abdelaziz, Ahmed Hussen
    Johansson, Martin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Aguas Lopes, Jose David
    Novikova, Jekaterina
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Varol, Gul
    The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue. 2014. Conference paper (Refereed)
    Abstract [en]

    This paper describes a novel experimental setup exploiting state-of-the-art capture equipment to collect a multimodally rich game-solving collaborative multiparty dialogue corpus. The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. The participants were paired into teams based on their degree of extraversion, as determined by a personality test. A tutor sits with the participants, helping them perform the task and organizing and balancing their interaction; the tutor’s behavior was assessed by the participants after each interaction. The different multimodal signals captured and auto-synchronized by several audio-visual capture technologies, together with manual annotations of the tutor’s behavior, constitute the Tutorbot corpus. This corpus is exploited to build a situated model of the interaction based on the participants’ temporally-changing state of attention, their conversational engagement and verbal dominance, and their correlation with the verbal and visual feedback and conversation regulatory actions generated by the tutor.

  • 13.
    Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Lexical Entrainment in Spoken Dialog Systems. 2013. Manuscript (preprint) (Other academic)
  • 14.
    Lopes, José
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Chorianopoulou, Arodami
    Palogiannidi, Elisavet
    Moniz, Helena
    Abad, Alberto
    Louka, Katerina
    Iosif, Elias
    Potamianos, Alexandros
    The spedial datasets: datasets for spoken dialogue systems analytics. 2016. In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, 2016. Conference paper (Refereed)
  • 15.
    Lopes, José
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A First Visit to the Robot Language Café. 2017. In: Proceedings of the ISCA workshop on Speech and Language Technology in Education / [ed] Engwall, Lopes, Stockholm, 2017. Conference paper (Refereed)
    Abstract [en]

    We present an exploratory study on using a social robot in a conversational setting to practice a second language. The practice is carried out within a so-called language café, with two second language learners and one native moderator, a human or a robot, engaging in social small talk. We compare the interactions with the human and robot moderators and perform a qualitative analysis of the potentials of a social robot as a conversational partner for language learning. Interactions with the robot are carried out in a wizard-of-Oz setting, in which the native moderator who leads the corresponding human moderator session controls the robot. The observations of the video recorded sessions and the subject questionnaires suggest that the appropriate learner level for the practice is elementary (A1 to A2), for whom the structured, slightly repetitive interaction pattern was perceived as beneficial. We identify both some key features that are appreciated by the learners and technological parts that need further development.

  • 16. Lopes, José
    et al.
    Eskenazi, Maxine
    Trancoso, Isabel
    Automated two-way entrainment to improve spoken dialog system performance. 2013. Conference paper (Refereed)
    Abstract [en]

    This paper proposes an approach to the use of lexical entrainment in Spoken Dialog Systems. This approach aims to increase the dialog success rate by adapting the lexical choices of the system to the user's lexical choices. If the system finds that the user's lexical choice degrades the performance, it will try to establish a new conceptual pact, proposing other words that the user may adopt, in order to be more successful in task completion. The approach was implemented and tested in two different systems. Tests showed a relative dialog estimated error rate reduction of 10% and a relative reduction in the average number of turns per session of 6%.

  • 17.
    Lopes, José
    et al.
    INESC-ID Lisboa, Portugal; Instituto Superior Técnico, Portugal.
    Eskenazi, Maxine
    CMU.
    Trancoso, Isabel
    INESC-ID Lisboa/Instituto Superior Técnico.
    From rule-based to data-driven lexical entrainment models in spoken dialog systems. 2015. In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 31, no 1, p. 87-112. Article in journal (Refereed)
    Abstract [en]

    This paper presents a data-driven approach to improve Spoken Dialog System (SDS) performance by automatically finding the most appropriate terms to be used in system prompts. The literature shows that speakers use one another’s terms (entrain) when trying to create common ground during a spoken dialog. Those terms are commonly called “primes”, since they influence the interlocutors’ linguistic decision-making. This approach emulates human interaction, with a system built to propose primes to the user and accept the primes that the user proposes. These primes are chosen on the fly during the interaction, based on a set of features that indicate good candidate primes. A good candidate is one that we know is easily recognized by the speech recognizer, and is also a normal word choice given the context. The system is trained to follow the user’s choice of prime if system performance is not negatively affected. When system performance is affected, the system proposes a new prime. In our previous work we have shown how we can identify the prime candidates and how the system can select primes using rules. In this paper we go further, presenting a data-driven method to perform the same task. Live tests with this method show that use of on-the-fly entrainment reduces out-of-vocabulary and word error rate, and also increases the number of correctly transferred concepts.
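    The abstract above describes choosing primes on the fly from features that indicate good candidates, such as being easily recognised by the speech recogniser and being a normal word choice in context. The paper itself moves from rules to a data-driven model; the sketch below only illustrates the underlying scoring idea, with invented candidate words, feature values and weights.

        # Hypothetical candidate primes with two illustrative features:
        # how reliably the ASR recognises the word, and how usual it is in this context.
        candidates = {
            "downtown": {"asr_accuracy": 0.95, "context_frequency": 0.60},
            "city centre": {"asr_accuracy": 0.70, "context_frequency": 0.80},
            "central district": {"asr_accuracy": 0.55, "context_frequency": 0.20},
        }

        def prime_score(features, w_asr=0.6, w_context=0.4):
            # Weighted combination of the two features (the weights are made up).
            return w_asr * features["asr_accuracy"] + w_context * features["context_frequency"]

        # Pick the prime to use in the next system prompt.
        best_prime = max(candidates, key=lambda word: prime_score(candidates[word]))
        print(best_prime)  # "downtown" under these invented numbers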

  • 18. Lopes, José
    et al.
    Eskenazi, Maxine
    Trancoso, Isabel
    Incorporating ASR information in Spoken Dialog System confidence score. 2012. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10th International Conference on Computational Processing of Portuguese, PROPOR 2012, Springer, 2012, p. 403-408. Conference paper (Refereed)
    Abstract [en]

    The reliability of the confidence score is very important in Spoken Dialog System performance. This paper describes a set of experiments with previously collected off-line data, regarding the set of features that should be used in the computation of the confidence score. Three different regression methods to weight the features were used and the results show that the incorporation of the confidence score given by the speech recognizer improves the confidence measure.
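    The experiments above weight a set of features, including the score given by the speech recognizer, to compute a confidence measure. A minimal sketch of that general setup with scikit-learn logistic regression is shown below; the feature names and values are invented for illustration and do not correspond to the paper's feature set, data or regression methods.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Hypothetical per-utterance features: [ASR confidence, number of words, parse score].
        X = np.array([
            [0.92, 5, 0.8],
            [0.35, 2, 0.1],
            [0.80, 7, 0.6],
            [0.20, 3, 0.2],
        ])
        y = np.array([1, 0, 1, 0])  # 1 = utterance correctly understood, 0 = misunderstood

        model = LogisticRegression().fit(X, y)
        # The predicted probability of the positive class can serve as a combined confidence score.
        print(model.predict_proba([[0.55, 4, 0.4]])[:, 1])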

  • 19.
    Lopes, José
    et al.
    INESC-ID Lisboa, Portugal; Instituto Superior Tecnico, Lisboa, Portugal.
    Eskenazi, Maxine
    Trancoso, Isabel
    Towards choosing better primes for spoken dialog systems. 2011. Conference paper (Refereed)
    Abstract [en]

    When humans and computers use the same terms (primes, when they entrain to one another), spoken dialogs proceed more smoothly. The goal of this paper is to describe initial steps we have found that will enable us to eventually automatically choose better primes in spoken dialog system prompts. Two different sets of prompts were used to understand what makes one prime more suitable than another. The impact of the primes chosen in speech recognition was evaluated. In addition, results reveal that users did adopt the new vocabulary introduced in the new system prompts. As a result of this, performance of the system improved, providing clues for the trade-off needed when choosing between adequate primes in prompts and speech recognition performance.

  • 20.
    Lopes, José
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Abad, A.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Batista, F.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Trancoso, I.
    Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances. 2015. In: INTERSPEECH-2015, 2015, p. 1805-1809. Conference paper (Refereed)
    Abstract [en]

    Repetitions in Spoken Dialogue Systems can be a symptom of problematic communication. Such repetitions are often due to speech recognition errors, which in turn makes it harder to use the output of the speech recognizer to detect repetitions. In this paper, we combine the alignment score obtained using phonetic distances with dialogue-related features to improve repetition detection. To evaluate the method proposed we compare several alignment techniques from edit distance to DTW-based distance, previously used in Spoken-Term detection tasks. We also compare two different methods to compute the phonetic distance: the first one using the phoneme sequence, and the second one using the distance between the phone posterior vectors. Two different datasets were used in this evaluation: a bus-schedule information system (in English) and a call routing system (in Swedish). The results show that approaches using phoneme distances outperform approaches using Levenshtein distances between ASR outputs for repetition detection.
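    One way to read the phonetic-distance idea above is as an edit distance computed over phoneme sequences instead of ASR word strings. The sketch below shows only that simplest variant (plain Levenshtein with a length-normalised threshold); the DTW-based distances over phone posteriors evaluated in the paper are not shown, and the phoneme sequences and threshold here are invented.

        def edit_distance(a, b):
            # Levenshtein distance between two sequences (here: lists of phonemes).
            dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i in range(len(a) + 1):
                dp[i][0] = i
            for j in range(len(b) + 1):
                dp[0][j] = j
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    dp[i][j] = min(dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                                   dp[i - 1][j] + 1, dp[i][j - 1] + 1)
            return dp[len(a)][len(b)]

        def is_repetition(phones_prev, phones_curr, threshold=0.25):
            # Flag the current turn as a likely repetition of the previous one when the
            # length-normalised phoneme distance falls below the (illustrative) threshold.
            dist = edit_distance(phones_prev, phones_curr)
            return dist / max(len(phones_prev), len(phones_curr), 1) <= threshold

        # Invented phoneme sequences for two user turns that repeat the same word.
        prev = ["d", "aw", "n", "t", "aw", "n"]
        curr = ["d", "aw", "n", "d", "aw", "n"]
        print(is_repetition(prev, curr))  # True: distance 1 over length 6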

  • 21.
    Lopes, José
    et al.
    INESC-ID Lisboa, Portugal.
    Trancoso, Isabel
    INESC-ID Lisboa, Portugal.
    Abad, Alberto
    INESC-ID Lisboa, Portugal.
    A nativeness classifier for TED Talks. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper presents a nativeness classifier for English. The detector was developed and tested with TED Talks collected from the web, where the major non-native cues are in terms of segmental aspects and prosody. The first experiments were made using only acoustic features, with Gaussian supervectors for training a classifier based on support vector machines. These experiments resulted in an equal error rate of 13.11%. The following experiments based on prosodic features alone did not yield good results. However, a fused system, combining acoustic and prosodic cues, achieved an equal error rate of 10.58%. A small human benchmark was conducted, showing an inter-rater agreement of 0.88. This value is also very close to the agreement value between humans and the best fused system.
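    The classifier above is evaluated with the equal error rate (EER): the operating point at which the false-acceptance and false-rejection rates coincide. A minimal sketch of estimating it from detection scores is shown below, using made-up scores and labels rather than the TED Talks data.

        import numpy as np

        def equal_error_rate(scores, labels):
            # Sweep candidate thresholds and return the error rate at the point where the
            # false-acceptance and false-rejection rates are closest to each other.
            scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
            best_gap, best_eer = float("inf"), 1.0
            for t in np.unique(scores):
                fa = np.mean(scores[~labels] >= t)  # non-natives accepted as native
                fr = np.mean(scores[labels] < t)    # natives rejected as non-native
                if abs(fa - fr) < best_gap:
                    best_gap, best_eer = abs(fa - fr), (fa + fr) / 2
            return float(best_eer)

        # Toy scores (higher = "more native") and labels (True = native speaker).
        scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3, 0.2, 0.1]
        labels = [True, True, True, False, True, False, False, False]
        print(equal_error_rate(scores, labels))  # 0.25 for this toy data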

  • 22. Lopes, José
    et al.
    Trancoso, Isabel
    Correia, Rui
    Pellegrini, Thomas
    Meinedo, Hugo
    Mamede, Nuno
    Eskenazi, Maxine
    Multimedia learning materials. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper describes the integration of multimedia documents in the Portuguese version of REAP, a tutoring system for vocabulary learning. The documents result from the pipeline processing of Broadcast News videos that automatically segments the audio files, transcribes them, adds punctuation and capitalization, and breaks them into stories classified by topics. The integration of these materials in REAP was done in a way that tries to decrease the impact of potential errors of the automatic chain in the learning process.

  • 23. Marujo, Luís
    et al.
    Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Mamede, Nuno
    Trancoso, Isabel
    Pino, Juan
    Eskenazi, Maxine
    Baptista, Jorge
    Viana, Céu
    Porting REAP to European Portuguese. 2009. Conference paper (Refereed)
  • 24.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    David Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Automatic Detection of Miscommunication in Spoken Dialogue Systems. 2015. In: Proceedings of 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2015, p. 354-363. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a data-driven approach for detecting instances of miscommunication in dialogue system interactions. A range of generic features that are both automatically extractable and manually annotated were used to train two models for online detection and one for offline analysis. Online detection could be used to raise the error awareness of the system, whereas offline detection could be used by a system designer to identify potential flaws in the dialogue design. In experimental evaluations on system logs from three different dialogue systems that vary in their dialogue strategy, the proposed models performed substantially better than the majority class baseline models.

  • 25. Ribeiro, E.
    et al.
    Batista, F.
    Trancoso, I.
    Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Ribeiro, R.
    De Matos, D. M.
    Assessing user expertise in spoken dialog system interactions. 2016. In: 3rd International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2016, Springer Publishing Company, 2016, p. 245-254. Conference paper (Refereed)
    Abstract [en]

    Identifying the level of expertise of its users is important for a system since it can lead to a better interaction through adaptation techniques. Furthermore, this information can be used in offline processes of root cause analysis. However, not much effort has been put into automatically identifying the level of expertise of a user, especially in dialog-based interactions. In this paper we present an approach based on a specific set of task related features. Based on the distribution of the features among the two classes – Novice and Expert – we used Random Forests as a classification approach. Furthermore, we used a Support Vector Machine classifier in order to perform a result comparison. By applying these approaches on data from a real system, Let’s Go, we obtained preliminary results that we consider positive, given the difficulty of the task and the lack of competing approaches for comparison.
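    The study above trains a Random Forest on task-related features to separate Novice from Expert users, with a Support Vector Machine as a point of comparison. A minimal scikit-learn sketch of that comparison is shown below; the feature names and values are invented and do not correspond to the Let's Go data.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.svm import SVC

        # Hypothetical per-dialogue features: [barge-in rate, help requests, mean words per turn].
        X = np.array([
            [0.05, 0, 6.0], [0.10, 1, 5.5], [0.08, 0, 7.0], [0.02, 0, 6.5],
            [0.40, 3, 2.0], [0.35, 2, 2.5], [0.50, 4, 1.5], [0.45, 3, 2.2],
        ])
        y = ["expert"] * 4 + ["novice"] * 4

        # Fit both classifiers on the same data and compare their predictions for a new dialogue.
        for model in (RandomForestClassifier(n_estimators=50, random_state=0), SVC(kernel="rbf")):
            model.fit(X, y)
            print(type(model).__name__, model.predict([[0.30, 2, 3.0]]))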
