Publications (10 of 18)
Lison, P. & Meena, R. (2016). Automatic Turn Segmentation for Movie & TV Subtitles. In: 2016 IEEE Workshop on Spoken Language Technology (SLT 2016). Paper presented at 2016 IEEE Workshop on Spoken Language Technology (pp. 245-252). IEEE conference proceedings
2016 (English) In: 2016 IEEE Workshop on Spoken Language Technology (SLT 2016), IEEE conference proceedings, 2016, pp. 245-252. Conference paper, published paper (Refereed)
Abstract [en]

Movie and TV subtitles contain large amounts of conversational material, but lack an explicit turn structure. This paper presents a data-driven approach to the segmentation of subtitles into dialogue turns. Training data is first extracted by aligning subtitles with transcripts in order to obtain speaker labels. This data is then used to build a classifier whose task is to determine whether two consecutive sentences are part of the same dialogue turn. The approach relies on linguistic, visual and timing features extracted from the subtitles themselves and does not require access to the audiovisual material -- although speaker diarization can be exploited when audio data is available. The approach also exploits alignments with related subtitles in other languages to further improve the classification performance. The classifier achieves an accuracy of 78% on a held-out test set. A follow-up annotation experiment demonstrates that this task is also difficult for human annotators.
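
As a rough illustration of the pairwise formulation described in the abstract, the sketch below classifies consecutive subtitle sentence pairs as same-turn or new-turn. It is only a sketch: the subtitle fields, the features (timing gap, shared subtitle block, leading dash, lexical overlap) and the toy data are assumptions, not the paper's actual feature set or classifier.

# Sketch (not the paper's implementation): decide whether two consecutive
# subtitle sentences belong to the same dialogue turn, using only cues
# available in the subtitles themselves. Field names and features are assumed.
from sklearn.linear_model import LogisticRegression

def pair_features(prev, curr):
    gap = curr["start"] - prev["end"]                      # pause between the two sentences (s)
    same_block = int(prev["block"] == curr["block"])       # shown in the same subtitle frame?
    dash_cue = int(curr["text"].lstrip().startswith("-"))  # leading dash often marks a speaker change
    overlap = len(set(prev["text"].lower().split()) & set(curr["text"].lower().split()))
    return [gap, same_block, dash_cue, overlap]

# Toy training pairs: label 1 = same turn, 0 = turn boundary
pairs = [
    ({"start": 1.0, "end": 2.5, "block": 1, "text": "I saw him yesterday."},
     {"start": 2.6, "end": 4.0, "block": 1, "text": "He looked tired."}, 1),
    ({"start": 5.0, "end": 6.0, "block": 2, "text": "Where were you?"},
     {"start": 7.5, "end": 8.5, "block": 3, "text": "- At the station."}, 0),
]
X = [pair_features(p, c) for p, c, _ in pairs]
y = [label for _, _, label in pairs]
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))  # per-pair decision: same turn vs. new turn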

Place, publisher, year, edition, pages
IEEE conference proceedings, 2016
HSV category
Identifiers
urn:nbn:se:kth:diva-193938 (URN), 10.1109/SLT.2016.7846272 (DOI), 000399128000036 (), 2-s2.0-85016021595 (Scopus ID)
Conference
2016 IEEE Workshop on Spoken Language Technology
Note

QC 20161014

Available from: 2016-10-12 Created: 2016-10-12 Last updated: 2025-02-01 Bibliographically approved
Meena, R. (2016). Data-driven Methods for Spoken Dialogue Systems: Applications in Language Understanding, Turn-taking, Error Detection, and Knowledge Acquisition. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
2016 (English) Doctoral thesis, monograph (Other academic)
Abstract [en]

Spoken dialogue systems are application interfaces that enable humans to interact with computers using spoken natural language. A major challenge for these systems is dealing with the ubiquity of variability—in user behavior, in the performance of the various speech and language processing sub-components, and in the dynamics of the task domain. However, as the predominant methodology for dialogue system development is to handcraft the sub-components, these systems typically lack robustness in user interactions. Data-driven methods, on the other hand, have been shown to offer robustness to variability in various domains of computer science and are increasingly being used in dialogue systems research.    

This thesis makes four novel contributions to the data-driven methods for spoken dialogue system development. First, a method for interpreting the meaning contained in spoken utterances is presented. Second, an approach for determining when in a user’s speech it is appropriate for the system to give a response is presented. Third, an approach for error detection and analysis in dialogue system interactions is reported. Finally, an implicitly supervised learning approach for knowledge acquisition through the interactive setting of spoken dialogue is presented.     

The general approach taken in this thesis is to model dialogue system tasks as a classification problem and investigate features (e.g., lexical, syntactic, semantic, prosodic, and contextual) to train various classifiers on interaction data. The central hypothesis of this thesis is that the models for the aforementioned dialogue system tasks trained using the features proposed here perform better than their corresponding baseline models. The empirical validity of this claim has been assessed through both quantitative and qualitative evaluations, using both objective and subjective measures.
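
To make the classification framing concrete, the sketch below shows how heterogeneous cues (lexical, prosodic, contextual) can be turned into one numeric feature vector per decision point, on which any standard classifier can then be trained. The feature names and values are hypothetical and not taken from the thesis.

# Sketch: vectorizing mixed-type dialogue features for classification.
# All names and values below are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer

instances = [
    {"last_word": "street", "pitch_slope": -0.8, "pause_ms": 600, "prev_act": "question"},
    {"last_word": "and",    "pitch_slope":  0.3, "pause_ms": 120, "prev_act": "statement"},
]
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(instances)   # string features are one-hot encoded, numeric ones pass through
print(vec.get_feature_names_out())
print(X)                           # the matrix a classifier would be trained on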

Abstract [sv]

This thesis explores data-driven methods for the development of spoken dialogue systems. The motivation behind such methods is that dialogue systems must be able to handle a great deal of variability, both in user behavior and in the performance of the various speech and language technology sub-components. Traditional approaches, based on handcrafted dialogue system components, often struggle to remain robust in the face of such variability. Data-driven methods have proven robust to variability in various problems within computer science and artificial intelligence, and have recently become popular in research on spoken dialogue systems as well.

This thesis presents four novel contributions to data-driven methods for spoken dialogue system development. The first contribution is a data-driven method for semantic interpretation of spoken language. The proposed method has two important properties: robust handling of "ungrammatical" input (due to the spontaneous nature of speech and errors in speech recognition), and preservation of the structural relations between concepts in the semantic representation. Earlier methods for semantic interpretation of spoken language have typically addressed only one of these two challenges.

The second contribution of the thesis is a data-driven method for turn-taking in dialogue systems. The proposed model exploits prosody, syntax, semantics, and dialogue context to determine when in the user's speech it is appropriate for the system to respond.

The third contribution is a data-driven method for detecting errors and misunderstandings in dialogue systems. Whereas earlier work has focused on online error detection and has been tested only in single domains, the models presented here analyze errors both offline and online, and have been trained and evaluated on three distinct dialogue system corpora.

Finally, a method is presented for how dialogue systems can acquire new knowledge through interaction with the user. The method is evaluated in a scenario where the system builds up a knowledge base in a geographic domain through crowdsourcing. The system starts with minimal knowledge and uses the spoken dialogue both to gather new information and to verify the knowledge acquired.

The general approach in this thesis is to model various dialogue system tasks as classification problems and to investigate features in the discourse context that can be used to train classifiers. Over the course of the work, various lexical, syntactic, prosodic, and contextual features have been examined. A general discussion of how these features contribute to modeling the aforementioned tasks constitutes one of the thesis's main contributions. Another central aspect of the thesis is training models that can be used directly in dialogue systems, which is why only automatically extractable features (requiring no manual annotation) are used to train the models. Furthermore, model performance is evaluated on both speech recognition output and transcriptions, in order to examine how robust the proposed methods are.

The central hypothesis of this thesis is that models trained with the proposed contextual features perform better than a baseline model. The validity of this hypothesis has been assessed through both qualitative and quantitative evaluations, using both objective and subjective measures.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. p. xix, 223
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2016:03
Keywords
Language Understanding, Turn-taking, Error Detection, Knowledge Acquisition, Crowdsourcing, semantic interpretation of spoken language, turn-taking in dialogue systems, errors and misunderstandings, crowdsourcing, dialogue systems
HSV category
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-182985 (URN), 978-91-7595-866-8 (ISBN)
Public defence
2016-03-18, F3, Lindstedtsvägen 26, KTH, Stockholm, 10:00 (English)
Opponent
Supervisor
Funder
EU, FP7, Seventh Framework Programme; Swedish Research Council
Note

QC 20160225

Available from: 2016-02-25 Created: 2016-02-24 Last updated: 2022-06-23 Bibliographically approved
Georgiladakis, S., Athanasopoulou, G., Meena, R., David Lopes, J., Chorianopoulou, A., Palogiannidi, E., . . . Potamianos, A. (2016). Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems. In: Interspeech 2016. Paper presented at Interspeech 2016. San Francisco, CA, USA: International Speech Communication Association
2016 (English) In: Interspeech 2016, San Francisco, CA, USA: International Speech Communication Association, 2016. Conference paper, published paper (Refereed)
Abstract [en]

A major challenge in Spoken Dialogue Systems (SDS) is the detection of problematic communication (hotspots), as well as the classification of these hotspots into different types (root cause analysis). In this work, we focus on two classes of root cause, namely, erroneous speech recognition vs. other (e.g., dialogue strategy). Specifically, we propose an automatic algorithm for detecting hotspots and classifying root causes in two subsequent steps. Regarding hotspot detection, various lexico-semantic features are used for capturing repetition patterns along with affective features. Lexico-semantic and repetition features are also employed for root cause analysis. Both algorithms are evaluated with respect to the Let’s Go dataset (bus information system). In terms of unweighted average recall, classification performance of 80% and 70% is achieved for hotspot detection and root cause analysis, respectively.
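
The reported numbers are unweighted average recall (UAR), i.e. the mean of per-class recalls, which is less forgiving than accuracy on skewed classes. A minimal sketch of the metric, on toy labels only:

# Sketch: unweighted average recall (UAR) = macro-averaged recall.
from sklearn.metrics import recall_score

y_true = ["hotspot", "hotspot", "normal", "normal", "normal"]
y_pred = ["hotspot", "normal",  "normal", "normal", "hotspot"]
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR = {uar:.2f}")  # (1/2 + 2/3) / 2 ≈ 0.58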

Place, publisher, year, edition, pages
San Francisco, CA, USA: International Speech Communication Association, 2016
HSV category
Identifiers
urn:nbn:se:kth:diva-193937 (URN), 10.21437/Interspeech.2016-1273 (DOI), 000409394400240 (), 2-s2.0-84994350325 (Scopus ID)
Conference
Interspeech 2016
Note

QC 20161017. tmh_import_16_10_12, tmh_id_4054

Available from: 2016-10-12 Created: 2016-10-12 Last updated: 2025-02-01 Bibliographically approved
Meena, R., David Lopes, J., Skantze, G. & Gustafson, J. (2015). Automatic Detection of Miscommunication in Spoken Dialogue Systems. In: Proceedings of 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). Paper presented at 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) (pp. 354-363).
2015 (English) In: Proceedings of 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2015, pp. 354-363. Conference paper, published paper (Refereed)
Abstract [en]

In this paper, we present a data-driven approach for detecting instances of miscommunication in dialogue system interactions. A range of generic features that are both automatically extractable and manually annotated were used to train two models for online detection and one for offline analysis. Online detection could be used to raise the error awareness of the system, whereas offline detection could be used by a system designer to identify potential flaws in the dialogue design. In experimental evaluations on system logs from three different dialogue systems that vary in their dialogue strategy, the proposed models performed substantially better than the majority class baseline models.
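
A minimal sketch of the majority-class baseline the models are compared against: it always predicts the most frequent label, so it can look reasonably accurate on skewed data while never actually flagging a miscommunication. The feature vectors and labels below are toy data, not the system logs used in the paper.

# Sketch: majority-class baseline for miscommunication detection.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import recall_score

X = [[0], [1], [2], [3], [4], [5]]   # placeholder feature vectors
y = [0, 0, 0, 0, 1, 1]               # 0 = unproblematic exchange, 1 = miscommunication

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))                   # accuracy looks decent (~0.67)...
print(recall_score(y, baseline.predict(X)))   # ...but recall on miscommunications is 0.0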

HSV category
Identifiers
urn:nbn:se:kth:diva-180406 (URN), 10.18653/v1/w15-4647 (DOI), 2-s2.0-84988311476 (Scopus ID)
Conference
16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)
Note

QC 20160120

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2025-02-01 Bibliographically approved
Lopes, J., Salvi, G., Skantze, G., Abad, A., Gustafson, J., Batista, F., . . . Trancoso, I. (2015). Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances. In: INTERSPEECH-2015. Paper presented at INTERSPEECH-2015, Dresden, Germany (pp. 1805-1809).
2015 (English) In: INTERSPEECH-2015, 2015, pp. 1805-1809. Conference paper, published paper (Refereed)
Abstract [en]

Repetitions in Spoken Dialogue Systems can be a symptom of problematic communication. Such repetitions are often due to speech recognition errors, which in turn makes it harder to use the output of the speech recognizer to detect repetitions. In this paper, we combine the alignment score obtained using phonetic distances with dialogue-related features to improve repetition detection. To evaluate the proposed method, we compare several alignment techniques, from edit distance to DTW-based distance, previously used in Spoken-Term detection tasks. We also compare two different methods to compute the phonetic distance: the first one using the phoneme sequence, and the second one using the distance between the phone posterior vectors. Two different datasets were used in this evaluation: a bus-schedule information system (in English) and a call routing system (in Swedish). The results show that approaches using phoneme distances outperform approaches using Levenshtein distances between ASR outputs for repetition detection.
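
As a rough illustration of the first of the two phonetic-distance variants (a sketch only, not the paper's system), the code below computes a normalized Levenshtein distance between the phoneme sequences of two consecutive user utterances; the DTW variant over phone posterior vectors would replace this symbol-level comparison with a frame-level one. The phoneme strings and the 0.7 threshold are assumptions.

# Sketch: repetition detection via normalized phonetic edit distance.
def levenshtein(a, b):
    # Standard dynamic-programming edit distance over phoneme symbols.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def phonetic_similarity(phones1, phones2):
    dist = levenshtein(phones1, phones2)
    return 1.0 - dist / max(len(phones1), len(phones2), 1)

# Hypothetical phoneme sequences for the same request recognized twice.
utt1 = "t uw dh ax eh r p ao r t".split()
utt2 = "t uw dh iy eh r p ao r".split()
sim = phonetic_similarity(utt1, utt2)
print(f"similarity = {sim:.2f}", "-> likely repetition" if sim > 0.7 else "")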

HSV category
Identifiers
urn:nbn:se:kth:diva-180405 (URN), 000380581600375 (), 2-s2.0-84959138120 (Scopus ID), 978-1-5108-1790-6 (ISBN)
Conference
INTERSPEECH-2015, Dresden, Germany
Note

QC 20160216

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2025-02-01 Bibliographically approved
Meena, R., Dabbaghchian, S. & Stefanov, K. (2014). A Data-driven Approach to Detection of Interruptions in Human–human Conversations. Paper presented at FONETIK, Stockholm, Sweden.
2014 (English) Conference paper, published paper (Refereed)
HSV category
Identifiers
urn:nbn:se:kth:diva-158181 (URN)
Conference
FONETIK, Stockholm, Sweden
Note

QC 20161017

Available from: 2014-12-30 Created: 2014-12-30 Last updated: 2022-06-23 Bibliographically approved
Meena, R., Boye, J., Skantze, G. & Gustafson, J. (2014). Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System. In: Proceedings of the SIGDIAL 2014 Conference. Paper presented at 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial, Philadelphia, U.S.A., 18-20 June 2014 (pp. 2-11). Association for Computational Linguistics
2014 (English) In: Proceedings of the SIGDIAL 2014 Conference, Association for Computational Linguistics, 2014, pp. 2-11. Conference paper, published paper (Refereed)
Abstract [en]

We present a technique for crowd-sourcing street-level geographic information using spoken natural language. In particular, we are interested in obtaining first-person-view information about what can be seen from different positions in the city. This information can then be used, for example, for pedestrian routing services. The approach has been tested in the lab using a fully implemented spoken dialogue system, and shows promising results.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2014
HSV category
Identifiers
urn:nbn:se:kth:diva-158157 (URN), 10.3115/v1/w14-4302 (DOI), 2-s2.0-84987942219 (Scopus ID), 978-194164321-1 (ISBN)
Conference
15th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial, Philadelphia, U.S.A., 18-20 June 2014
Note

QC 20150410

Available from: 2014-12-30 Created: 2014-12-30 Last updated: 2025-02-01 Bibliographically approved
Meena, R., Skantze, G. & Gustafson, J. (2014). Data-driven models for timing feedback responses in a Map Task dialogue system. Computer speech & language (Print), 28(4), 903-922
2014 (English) In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 28, no. 4, pp. 903-922. Article in journal (Refereed) Published
Abstract [en]

Traditional dialogue systems use a fixed silence threshold to detect the end of users' turns. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn affects user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate smooth exchange of speaking turns. However, little effort has been made towards implementing these models in dialogue systems and verifying how well they model the turn-taking behaviour in human-computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human-computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On this data, we trained various models that use automatically extractable prosodic, contextual and lexico-syntactic features for detecting response locations. Next, we implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn-transitions and more responsive system behaviour.
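
The contrast drawn in the abstract between a fixed silence threshold and a data-driven decision can be sketched as below. The 0.7 s threshold, the features, the toy training data and the classifier are all illustrative assumptions, not the article's model.

# Sketch: fixed-silence-threshold baseline vs. a feature-based decision at a pause.
from sklearn.linear_model import LogisticRegression

SILENCE_THRESHOLD = 0.7  # seconds: the traditional fixed-threshold baseline

def baseline_should_respond(pause_s):
    return pause_s >= SILENCE_THRESHOLD

def model_should_respond(pause_s, pitch_slope, ends_in_noun, clf):
    # The data-driven model consults prosodic, lexico-syntactic and contextual
    # cues at the pause instead of relying on pause length alone.
    return bool(clf.predict([[pause_s, pitch_slope, float(ends_in_noun)]])[0])

# Tiny toy training set: 1 = suitable response location, 0 = keep listening.
X = [[0.2, 0.4, 0.0], [0.5, -0.6, 1.0], [0.3, 0.1, 0.0], [0.6, -0.4, 1.0]]
y = [0, 1, 0, 1]
clf = LogisticRegression().fit(X, y)

print(baseline_should_respond(0.5))                # the fixed threshold says keep waiting
print(model_should_respond(0.5, -0.6, True, clf))  # the trained model's decision at the same pause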

Keywords
Spoken dialogue systems, Timing feedback, Turn-taking, User evaluation
HSV category
Identifiers
urn:nbn:se:kth:diva-147402 (URN), 10.1016/j.csl.2014.02.002 (DOI), 000336694200005 (), 2-s2.0-84900533798 (Scopus ID)
Note

QC 20140702

Available from: 2014-07-02 Created: 2014-06-27 Last updated: 2025-02-01 Bibliographically approved
Meena, R., Boye, J., Skantze, G. & Gustafson, J. (2014). Using a Spoken Dialogue System for Crowdsourcing Street-level Geographic Information. Paper presented at 2nd Workshop on Action, Perception and Language, SLTC 2014.
2014 (English) Conference paper, published paper (Refereed)
Abstract [en]

We present a novel scheme for enriching a geographic database with street-level geographic information that could be useful for pedestrian navigation. A spoken dialogue system for crowdsourcing street-level geographic details was developed and tested in an in-lab experiment, showing promising results.

HSV category
Identifiers
urn:nbn:se:kth:diva-158178 (URN)
Conference
2nd Workshop on Action, Perception and Language, SLTC 2014
Note

QC 20150410

Available from: 2014-12-30 Created: 2014-12-30 Last updated: 2025-02-01 Bibliographically approved
Meena, R., Skantze, G. & Gustafson, J. (2013). A Data-driven Model for Timing Feedback in a Map Task Dialogue System. In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial. Paper presented at 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial (pp. 375-383). Metz, France
2013 (English) In: 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial, Metz, France, 2013, pp. 375-383. Conference paper, published paper (Refereed)
Abstract [en]

We present a data-driven model for detecting suitable response locations in the user’s speech. The model has been trained on human–machine dialogue data and implemented and tested in a spoken dialogue system that can perform the Map Task with users. To our knowledge, this is the first example of a dialogue system that uses automatically extracted syntactic, prosodic and contextual features for online detection of response locations. A subjective evaluation of the dialogue system suggests that interactions with a system using our trained model were perceived significantly better than those with a system using a model that made decisions at random.

Place, publisher, year, edition, pages
Metz, France, 2013
HSV category
Identifiers
urn:nbn:se:kth:diva-134935 (URN), 2-s2.0-84987925314 (Scopus ID)
Conference
14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial
Note

tmh_import_13_12_02, tmh_id_3863. QC 20140320

Available from: 2013-12-02 Created: 2013-12-02 Last updated: 2025-02-01 Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-7412-0967