kth.se Publications
1 - 18 of 18
  • 1. Csapo, A.
    et al.
    Gilmartin, E.
    Grizou, J.
    Han, J.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Anastasiou, D.
    Jokinen, K.
    Wilcock, G.
Multimodal conversational interaction with a humanoid robot (2012). In: 3rd IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2012 - Proceedings, IEEE, 2012, p. 667-672. Conference paper (Refereed)
    Abstract [en]

    The paper presents a multimodal conversational interaction system for the Nao humanoid robot. The system was developed at the 8th International Summer Workshop on Multimodal Interfaces, Metz, 2012. We implemented WikiTalk, an existing spoken dialogue system for open-domain conversations, on Nao. This greatly extended the robot's interaction capabilities by enabling Nao to talk about an unlimited range of topics. In addition to speech interaction, we developed a wide range of multimodal interactive behaviours by the robot, including face-tracking, nodding, communicative gesturing, proximity detection and tactile interrupts. We made video recordings of user interactions and used questionnaires to evaluate the system. We further extended the robot's capabilities by linking Nao with Kinect.

  • 2. Csapo, A.
    et al.
    Gilmartin, E.
    Grizou, J.
    Han, J.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Anastasiou, D.
    Jokinen, K.
    Wilcock, G.
Open-Domain Conversation with a NAO Robot (2012). In: 3rd International Conference on Cognitive Infocommunications (CogInfoCom 2012), Kosice, 2012. Conference paper (Refereed)
    Abstract [en]

In this demo, we present a multimodal conversation system, implemented using a Nao robot and Wikipedia. The system was developed at the 8th International Workshop on Multimodal Interfaces in Metz, France, 2012. The system is based on an interactive, open-domain spoken dialogue system called WikiTalk, which guides the user through conversations based on the link structure of Wikipedia. In addition to speech interaction, the robot interacts with users by tracking their faces and nodding/gesturing at key points of interest within the Wikipedia text. The proximity detection capabilities of the Nao, as well as its tactile sensors, were used to implement context-based interrupts in the dialogue system.

  • 3. Georgiladakis, S.
    et al.
    Athanasopoulou, G.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, Tal-kommunikation.
    David Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Chorianopoulou, A.
    Palogiannidi, E.
    Iosif, E.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, Tal-kommunikation.
    Potamianos, A.
Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems (2016). In: Interspeech 2016, San Francisco, CA, USA: International Speech Communication Association, 2016. Conference paper (Refereed)
    Abstract [en]

    A major challenge in Spoken Dialogue Systems (SDS) is the detection of problematic communication (hotspots), as well as the classification of these hotspots into different types (root cause analysis). In this work, we focus on two classes of root cause, namely, erroneous speech recognition vs. other (e.g., dialogue strategy). Specifically, we propose an automatic algorithm for detecting hotspots and classifying root causes in two subsequent steps. Regarding hotspot detection, various lexico-semantic features are used for capturing repetition patterns along with affective features. Lexico-semantic and repetition features are also employed for root cause analysis. Both algorithms are evaluated with respect to the Let’s Go dataset (bus information system). In terms of classification unweighted average recall, performance of 80% and 70% is achieved for hotspot detection and root cause analysis, respectively.

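The following is a minimal, illustrative Python sketch of the two-step pipeline described in entry 3 above: first detect a miscommunication hotspot, then classify its root cause. The feature set, the repetition_score helper, and the toy training data are assumptions made for illustration; they are not the authors' implementation and not the Let's Go data.

    # Illustrative two-step pipeline: (1) detect miscommunication hotspots,
    # (2) classify the root cause (ASR error vs. other). Features, helpers
    # and toy data are assumptions for illustration only.
    from difflib import SequenceMatcher
    from sklearn.linear_model import LogisticRegression

    def repetition_score(prev_utt, curr_utt):
        # Crude lexical-repetition feature: string similarity of two user turns.
        return SequenceMatcher(None, prev_utt.lower(), curr_utt.lower()).ratio()

    def exchange_features(prev_utt, curr_utt, asr_confidence):
        return [repetition_score(prev_utt, curr_utt),  # repetition pattern
                len(curr_utt.split()),                 # utterance length
                asr_confidence]                        # low confidence hints at ASR trouble

    # Toy exchanges: (previous user turn, current user turn, ASR confidence)
    exchanges = [("to the airport", "to the airport", 0.41),
                 ("next bus please", "what time is it", 0.88),
                 ("sixty one c", "sixty one c", 0.35),
                 ("downtown", "thanks goodbye", 0.92)]
    hotspot_labels = [1, 0, 1, 0]      # 1 = problematic exchange (hotspot)
    root_cause_labels = [1, 0, 1, 0]   # 1 = erroneous speech recognition, 0 = other

    X = [exchange_features(*e) for e in exchanges]
    hotspot_clf = LogisticRegression().fit(X, hotspot_labels)
    cause_clf = LogisticRegression().fit(X, root_cause_labels)

    x_new = [exchange_features("leaving from forbes", "leaving from forbes", 0.38)]
    if hotspot_clf.predict(x_new)[0] == 1:
        print("hotspot; root cause:",
              "ASR error" if cause_clf.predict(x_new)[0] == 1 else "other")
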
  • 4.
    Kruijff-Korbayová, Ivana
    et al.
German Research Center for Artificial Intelligence (DFKI GmbH), Saarbrücken, Germany.
    Meena, Raveesh
    German Research Center for Artificial Intelligence, Germany.
    Pyykkönen, Pirita
    Saarland University, Saarbrücken, Germany.
Perception of Visual Scene and Intonation Patterns of Robot Utterances (2011). In: Proceedings of the 6th International Conference on Human-robot Interaction, New York: ACM Digital Library, 2011, p. 173-174. Conference paper (Refereed)
    Abstract [en]

Assigning intonation to dialogue system output in a way that reflects relationships between entities in the discourse context can enhance the acceptability of system utterances. Previous research concentrated on the role of linguistic context; dialogue situatedness and the role of visual context in determining accent placement have not been studied. We present an experimental study on the influence of visual context on the perception of nuclear accent placement in synthesized clarification requests. We found that utterances are perceived as appropriate more often when the visual scene licenses the nuclear accent placement than when it does not.

  • 5. Lison, P.
    et al.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Automatic Turn Segmentation for Movie & TV Subtitles (2016). In: 2016 IEEE Workshop on Spoken Language Technology (SLT 2016), IEEE conference proceedings, 2016, p. 245-252. Conference paper (Refereed)
    Abstract [en]

Movie and TV subtitles contain large amounts of conversational material, but lack an explicit turn structure. This paper presents a data-driven approach to the segmentation of subtitles into dialogue turns. Training data is first extracted by aligning subtitles with transcripts in order to obtain speaker labels. This data is then used to build a classifier whose task is to determine whether two consecutive sentences are part of the same dialogue turn. The approach relies on linguistic, visual and timing features extracted from the subtitles themselves and does not require access to the audiovisual material -- although speaker diarization can be exploited when audio data is available. The approach also exploits alignments with related subtitles in other languages to further improve the classification performance. The classifier achieves an accuracy of 78% on a held-out test set. A follow-up annotation experiment demonstrates that this task is also difficult for human annotators.

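As an illustration of the core classification step described in entry 5 above (deciding whether two consecutive subtitle sentences belong to the same dialogue turn), here is a small Python sketch. The particular features, the toy subtitle pairs, and the choice of a random forest are assumptions for illustration, not the paper's actual feature set or data.

    # Illustrative sketch: classify whether two consecutive subtitle sentences
    # belong to the same dialogue turn, from timing and lexical cues.
    # Features and training data are invented for illustration.
    from sklearn.ensemble import RandomForestClassifier

    def pair_features(sent_a, sent_b):
        # sent_* = (text, start_time_sec, end_time_sec)
        text_a, _, end_a = sent_a
        text_b, start_b, _ = sent_b
        return [start_b - end_a,                                       # timing gap
                1.0 if text_a.rstrip().endswith(("?", "!")) else 0.0,  # question/exclamation cue
                1.0 if text_b.lstrip().startswith("-") else 0.0,       # leading dash marks new speaker
                len(text_b.split())]                                   # length of second sentence

    pairs = [(("Where are you going?", 10.0, 11.2), ("- To the station.", 11.4, 12.3)),
             (("I told you already", 20.0, 21.0), ("and I meant it.", 21.1, 22.0)),
             (("Stop!", 30.0, 30.4), ("- Why should I?", 31.0, 32.0)),
             (("It was a long day", 40.0, 41.5), ("so I went home early.", 41.6, 43.0))]
    same_turn = [0, 1, 0, 1]   # 1 = same turn, 0 = turn boundary between the two sentences

    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit([pair_features(a, b) for a, b in pairs], same_turn)

    print(clf.predict([pair_features(("Are you coming?", 5.0, 6.0),
                                     ("- In a minute.", 6.3, 7.1))]))   # likely a boundary (0)
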
  • 6.
    Lopes, José
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Abad, A.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Batista, F.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Trancoso, I.
Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances (2015). In: INTERSPEECH-2015, 2015, p. 1805-1809. Conference paper (Refereed)
    Abstract [en]

Repetitions in Spoken Dialogue Systems can be a symptom of problematic communication. Such repetitions are often due to speech recognition errors, which in turn makes it harder to use the output of the speech recognizer to detect repetitions. In this paper, we combine the alignment score obtained using phonetic distances with dialogue-related features to improve repetition detection. To evaluate the proposed method we compare several alignment techniques, from edit distance to DTW-based distance, previously used in Spoken-Term Detection tasks. We also compare two different methods to compute the phonetic distance: the first one using the phoneme sequence, and the second one using the distance between the phone posterior vectors. Two different datasets were used in this evaluation: a bus-schedule information system (in English) and a call routing system (in Swedish). The results show that approaches using phoneme distances outperform approaches using Levenshtein distances between ASR outputs for repetition detection.

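A minimal Python sketch of the phoneme-sequence variant mentioned in entry 6 above: a normalized Levenshtein distance between the phoneme strings of two consecutive user turns, used as a repetition cue. The phoneme transcriptions and the 0.3 threshold are invented for illustration; the DTW over phone-posterior vectors and the dialogue-related features from the paper are not shown.

    # Illustrative repetition detection via phonetic distance: normalized
    # edit distance between the phoneme sequences of two consecutive turns.
    # Transcriptions and threshold are assumptions for illustration only.
    def edit_distance(a, b):
        # Plain Levenshtein distance between two sequences.
        dp = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, y in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
        return dp[-1]

    def phonetic_repetition(prev_phones, curr_phones, threshold=0.3):
        # Flag a likely repetition if the normalized phoneme edit distance is small.
        dist = edit_distance(prev_phones, curr_phones) / max(len(prev_phones), len(curr_phones))
        return dist <= threshold

    # "sixty one c" recognized twice in slightly different ways:
    prev = ["S", "IH", "K", "S", "T", "IY", "W", "AH", "N", "S", "IY"]
    curr = ["S", "IH", "K", "S", "T", "AH", "W", "AH", "N", "S", "IY"]
    print(phonetic_repetition(prev, curr))   # True: the turns are phonetically near-identical
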
  • 7.
    Meena, Raveesh
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Data-driven Methods for Spoken Dialogue Systems: Applications in Language Understanding, Turn-taking, Error Detection, and Knowledge Acquisition (2016). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Spoken dialogue systems are application interfaces that enable humans to interact with computers using spoken natural language. A major challenge for these systems is dealing with the ubiquity of variability—in user behavior, in the performance of the various speech and language processing sub-components, and in the dynamics of the task domain. However, as the predominant methodology for dialogue system development is to handcraft the sub-components, these systems typically lack robustness in user interactions. Data-driven methods, on the other hand, have been shown to offer robustness to variability in various domains of computer science and are increasingly being used in dialogue systems research.    

    This thesis makes four novel contributions to the data-driven methods for spoken dialogue system development. First, a method for interpreting the meaning contained in spoken utterances is presented. Second, an approach for determining when in a user’s speech it is appropriate for the system to give a response is presented. Third, an approach for error detection and analysis in dialogue system interactions is reported. Finally, an implicitly supervised learning approach for knowledge acquisition through the interactive setting of spoken dialogue is presented.     

    The general approach taken in this thesis is to model dialogue system tasks as a classification problem and investigate features (e.g., lexical, syntactic, semantic, prosodic, and contextual) to train various classifiers on interaction data. The central hypothesis of this thesis is that the models for the aforementioned dialogue system tasks trained using the features proposed here perform better than their corresponding baseline models. The empirical validity of this claim has been assessed through both quantitative and qualitative evaluations, using both objective and subjective measures.

  • 8.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Boye, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System (2014). In: Proceedings of the SIGDIAL 2014 Conference, Association for Computational Linguistics, 2014, p. 2-11. Conference paper (Refereed)
    Abstract [en]

We present a technique for crowd-sourcing street-level geographic information using spoken natural language. In particular, we are interested in obtaining first-person-view information about what can be seen from different positions in the city. This information can then be used, for example, for pedestrian routing services. The approach has been tested in the lab using a fully implemented spoken dialogue system, and shows promising results.

  • 9.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Boye, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Using a Spoken Dialogue System for Crowdsourcing Street-level Geographic Information (2014). Conference paper (Refereed)
    Abstract [en]

We present a novel scheme for enriching a geographic database with street-level geographic information that could be useful for pedestrian navigation. A spoken dialogue system for crowdsourcing street-level geographic details was developed and tested in an in-lab experiment, and has shown promising results.

  • 10.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Dabbaghchian, Saeed
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
A Data-driven Approach to Detection of Interruptions in Human–human Conversations (2014). Conference paper (Refereed)
  • 11.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    David Lopes, José
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Automatic Detection of Miscommunication in Spoken Dialogue Systems (2015). In: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2015, p. 354-363. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a data-driven approach for detecting instances of miscommunication in dialogue system interactions. A range of generic features that are both automatically extractable and manually annotated were used to train two models for online detection and one for offline analysis. Online detection could be used to raise the error awareness of the system, whereas offline detection could be used by a system designer to identify potential flaws in the dialogue design. In experimental evaluations on system logs from three different dialogue systems that vary in their dialogue strategy, the proposed models performed substantially better than the majority class baseline models.

  • 12.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Jokinen, Kristiina
    Wilcock, Graham
Integration of gestures and speech in human-robot interaction (2012). In: 3rd IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2012 - Proceedings, IEEE, 2012, p. 673-678. Conference paper (Refereed)
    Abstract [en]

We present an approach to enhance the interaction abilities of the Nao humanoid robot by extending its communicative behavior with non-verbal gestures (hand and head movements, and gaze following). A set of non-verbal gestures was identified that Nao could use for enhancing its presentation and turn-management capabilities in conversational interactions. We discuss our approach for modeling and synthesizing gestures on the Nao robot. A scheme for system evaluation that compares the values of users' expectations and actual experiences is also presented. We found that open arm gestures, head movements and gaze following could significantly enhance Nao's ability to be expressive and appear lively, and to engage human users in conversational interactions.

  • 13.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
A Chunking Parser for Semantic Interpretation of Spoken Route Directions in Human-Robot Dialogue (2012). In: Proceedings of the 4th Swedish Language Technology Conference (SLTC 2012), Lund, Sweden, 2012, p. 55-56. Conference paper (Refereed)
    Abstract [en]

    We present a novel application of the chunking parser for data-driven semantic interpretation of spoken route directions into route graphs that are useful for robot navigation. Various sets of features and machine learning algorithms were explored. The results indicate that our approach is robust to speech recognition errors, and could be easily used in other languages using simple features.

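To make the idea in entry 13 above concrete, here is a small Python sketch of chunking a spoken route direction into a linear route graph. The chunk labels and the keyword-based tagger stand in for the trained, data-driven chunker used in the paper; they are assumptions for illustration only.

    # Illustrative sketch: chunk a spoken route direction into a simple route
    # graph. Labels and the keyword tagger are assumptions standing in for
    # the trained chunker described in the paper.
    ACTION_WORDS = {"turn": "TURN", "go": "GO", "take": "TAKE", "continue": "GO"}
    DIRECTION_WORDS = {"left", "right", "straight"}

    def chunk(words):
        # Group words into (label, text) chunks: ACTION, DIRECTION, or LANDMARK.
        chunks, landmark = [], []
        for w in words:
            if w in ACTION_WORDS or w in DIRECTION_WORDS:
                if landmark:
                    chunks.append(("LANDMARK", " ".join(landmark)))
                    landmark = []
                chunks.append(("ACTION", ACTION_WORDS[w]) if w in ACTION_WORDS
                              else ("DIRECTION", w.upper()))
            elif w not in {"the", "a", "at", "then", "and", "past"}:
                landmark.append(w)
        if landmark:
            chunks.append(("LANDMARK", " ".join(landmark)))
        return chunks

    def route_graph(chunks):
        # Assemble chunks into an ordered list of route segments (a linear route graph).
        graph, segment = [], {}
        for label, value in chunks:
            if label == "ACTION" and segment:
                graph.append(segment)
                segment = {}
            segment.setdefault(label.lower(), value)
        if segment:
            graph.append(segment)
        return graph

    utterance = "go straight past the church then turn left at the station".split()
    print(route_graph(chunk(utterance)))
    # [{'action': 'GO', 'direction': 'STRAIGHT', 'landmark': 'church'},
    #  {'action': 'TURN', 'direction': 'LEFT', 'landmark': 'station'}]
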
  • 14.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
A data-driven approach to understanding spoken route directions in human-robot dialogue (2012). In: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, 2012, p. 226-229. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a data-driven chunking parser for automatic interpretation of spoken route directions into a route graph that is useful for robot navigation. Different sets of features and machine learning algorithms are explored. The results indicate that our approach is robust to speech recognition errors.

  • 15.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
A Data-driven Model for Timing Feedback in a Map Task Dialogue System (2013). In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial, Metz, France, 2013, p. 375-383. Conference paper (Refereed)
    Abstract [en]

    We present a data-driven model for detecting suitable response locations in the user’s speech. The model has been trained on human–machine dialogue data and implemented and tested in a spoken dialogue system that can perform the Map Task with users. To our knowledge, this is the first example of a dialogue system that uses automatically extracted syntactic, prosodic and contextual features for online detection of response locations. A subjective evaluation of the dialogue system suggests that interactions with a system using our trained model were perceived significantly better than those with a system using a model that made decisions at random.

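The following Python sketch illustrates the kind of online decision described in entry 15 above: at each detected pause in the user's speech, a classifier combining prosodic, lexico-syntactic, and contextual features decides whether a feedback response is appropriate now. The specific features, toy data, and classifier choice are assumptions for illustration, not the system's actual model.

    # Illustrative online response-location detection: at each silence,
    # combine prosodic, syntactic and contextual cues and let a trained
    # classifier decide whether to respond now. All values are toy examples.
    from sklearn.linear_model import LogisticRegression

    def pause_features(final_pitch_slope, ends_in_content_word, words_since_last_response):
        return [final_pitch_slope,                        # prosody: falling pitch suggests completion
                1.0 if ends_in_content_word else 0.0,     # syntax: trailing function word = keep listening
                words_since_last_response]                # context: new material since last response

    # Toy training data: label 1 = suitable response location, 0 = hold the floor
    X = [pause_features(-0.8, True, 12),
         pause_features(+0.3, False, 3),
         pause_features(-0.5, True, 8),
         pause_features(+0.1, False, 2)]
    y = [1, 0, 1, 0]
    rld = LogisticRegression().fit(X, y)

    def on_silence_detected(pitch_slope, ends_in_content_word, words_since_last_response):
        x = [pause_features(pitch_slope, ends_in_content_word, words_since_last_response)]
        return "give feedback (e.g., 'mm-hm')" if rld.predict(x)[0] else "keep listening"

    print(on_silence_detected(-0.7, True, 10))
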
  • 16.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Human Evaluation of Conceptual Route Graphs for Interpreting Spoken Route Descriptions (2013). In: Proceedings of the 3rd International Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI), Potsdam, Germany, 2013, p. 30-35. Conference paper (Refereed)
    Abstract [en]

We present a human evaluation of the usefulness of conceptual route graphs (CRGs) when it comes to route following using spoken route descriptions. We describe a method for data-driven semantic interpretation of route descriptions into CRGs. The comparable performances of human participants in sketching a route using the manually transcribed CRGs and the CRGs produced on speech-recognized route descriptions indicate the robustness of our method in preserving the vital conceptual information required for route following despite speech recognition errors.

  • 17.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
The Map Task Dialogue System: A Test-bed for Modelling Human-Like Dialogue (2013). In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial, Metz, France, 2013, p. 366-368. Conference paper (Refereed)
    Abstract [en]

The demonstrator presents a test-bed for collecting data on human–computer dialogue: a fully automated dialogue system that can perform the Map Task with a user. In a first step, we have used the test-bed to collect human–computer Map Task dialogue data, and have trained various data-driven models on it for detecting feedback response locations in the user's speech. One of the trained models has been tested in user interactions and was perceived as better than a system using a random model. The demonstrator will exhibit three versions of the Map Task dialogue system, each using a different trained data-driven model of Response Location Detection.

  • 18.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Data-driven models for timing feedback responses in a Map Task dialogue system (2014). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 28, no 4, p. 903-922. Article in journal (Refereed)
    Abstract [en]

    Traditional dialogue systems use a fixed silence threshold to detect the end of users' turns. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn affects user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate smooth exchange of speaking turns. However, little effort has been made towards implementing these models in dialogue systems and verifying how well they model the turn-taking behaviour in human computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On this data, we trained various models that use automatically extractable prosodic, contextual and lexico-syntactic features for detecting response locations. Next, we implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn-transitions and more responsive system behaviour.

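For contrast with the data-driven response-location models in entries 15 and 18, here is a minimal Python sketch of the traditional fixed silence-threshold policy that the journal article argues against. The 700 ms threshold and the toy voice-activity stream are arbitrary illustrative choices, not values from the article.

    # Minimal sketch of a fixed silence-threshold endpointer: the system takes
    # the turn after a set amount of silence, regardless of prosodic, syntactic
    # or contextual cues. Threshold and input data are illustrative only.
    SILENCE_THRESHOLD_SEC = 0.7   # respond after 700 ms of silence, regardless of context

    def fixed_threshold_endpointer(voice_activity_stream):
        # Yield a response trigger whenever silence exceeds the fixed threshold.
        silence_started = None
        for timestamp, is_speech in voice_activity_stream:
            if is_speech:
                silence_started = None
            elif silence_started is None:
                silence_started = timestamp
            elif timestamp - silence_started >= SILENCE_THRESHOLD_SEC:
                yield timestamp          # system takes the turn here, cue-blind
                silence_started = None

    # 100 ms frames: 0.5 s of speech followed by roughly 0.9 s of silence; the
    # endpointer fires after 0.7 s of silence whether or not the user was done.
    frames = [(t / 10, t < 5) for t in range(14)]
    print(list(fixed_threshold_endpointer(frames)))   # [1.2]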