Developing Multimodal Spoken Dialogue Systems: Empirical Studies of Spoken Human–Computer Interaction
KTH, Superseded Departments, Speech, Music and Hearing. ORCID iD: 0000-0002-0397-6442
2002 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis presents work done during the last ten years on developing five multimodal spoken dialogue systems, and the empirical user studies that have been conducted with them. The dialogue systems have been multimodal, giving information both verbally with animated talking characters and graphically on maps and in text tables. To be able to study a wider range of user behaviour, each new system has been in a new domain and with a new set of interactional abilities. The five systems presented in this thesis are: the Waxholm system, where users could ask about the boat traffic in the Stockholm archipelago; the Gulan system, where people could retrieve information from the Yellow Pages of Stockholm; the August system, a publicly available system where people could get information about the author Strindberg, KTH and Stockholm; the AdApt system, which allowed users to browse apartments for sale in Stockholm; and the Pixie system, where users could help an animated agent fix things in a visionary apartment publicly available at the Telecom museum in Stockholm. Some of the dialogue systems have been used in controlled experiments in laboratory environments, while others have been placed in public environments where members of the general public have interacted with them. All spoken human-computer interactions have been transcribed and analyzed to increase our understanding of how people interact verbally with computers, and to obtain knowledge on how spoken dialogue systems can utilize the regularities found in these interactions. This thesis summarizes the experiences from building these five dialogue systems and presents some of the findings from the analyses of the collected dialogue corpora.

Place, publisher, year, edition, pages
Stockholm: KTH, 2002. x, 96 p.
Series
Trita-TMH, 2002:8
Keyword [en]
Spoken dialogue system, multimodal, speech, GUI, animated agents, embodied conversational characters, talking heads, empirical user studies, speech corpora, system evaluation, system development, Wizard of Oz simulations, system architecture, linguis
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-3460
OAI: oai:DiVA.org:kth-3460
DiVA: diva2:9262
Public defence
2002-12-20, 00:00
Note
QC 20100611. Available from: 2002-12-11. Created: 2002-12-11. Last updated: 2010-06-11. Bibliographically approved.
List of papers
1. Spoken dialogue data collected in the Waxholm project
1995 (English). In: Quarterly progress and status report: April 15, 1995 / Speech Transmission Laboratory. Stockholm: KTH, 1995, 1, 50-73 p. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Stockholm: KTH, 1995. Edition: 1
Series
Trita-TMH, ISSN 1104-5787 ; 1995:2
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13316 (URN)
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
2. How do System Questions Influence Lexical Choices in User Answers?
1997 (English). In: Proceedings of Eurospeech 97, 1997, 2275-2278 p. Conference paper, Published paper (Other academic)
Abstract [en]

This paper describes some studies on the effect of the system vocabulary on the lexical choices of the users. There are many theories about human-human dialogues that could be useful in the design of spoken dialogue systems. This paper gives an overview of some of these theories and reports the results from two experiments that examine one of them, namely lexical entrainment. The first experiment was a small Wizard of Oz test that simulated a tourist information system with a speech interface, and the second experiment simulated a system with speech recognition that controlled a questionnaire about people's plans for their vacation. Both experiments show that the subjects mostly adapt their lexical choices to the system questions. Only in less than 5% of the cases did they use an alternative main verb in the answer. These results encourage us to investigate the possibility of adding an adaptive language model to the speech recognizer in our dialogue system, where the probabilities of the words used in the system questions are increased.
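The adaptive language model idea described above can be sketched as a simple unigram reweighting: words that occur in the system's question get a higher probability before the user's answer is decoded. This is a minimal illustration, not the recognizer integration from the paper; the function name, toy vocabulary, and boost factor are all invented for the example.

```python
import re

def boost_prompt_words(unigram_probs, prompt, boost=2.0):
    """Return a new unigram distribution in which every word that occurs
    in the system prompt has its probability multiplied by `boost`,
    after which the whole distribution is renormalized."""
    prompt_words = set(re.findall(r"[\w']+", prompt.lower()))
    boosted = {w: p * (boost if w in prompt_words else 1.0)
               for w, p in unigram_probs.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}

# Hypothetical recognizer vocabulary with a uniform starting distribution.
vocab = ["leave", "depart", "arrive", "boat", "today"]
lm = {w: 1 / len(vocab) for w in vocab}
adapted = boost_prompt_words(lm, "When does the boat leave?")
```

After reweighting, "leave" and "boat" are more probable than "depart" or "arrive", mirroring the finding that users tend to echo the system's wording.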

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13323 (URN)
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
3. Repetition and its phonetic realizations: investigating a Swedish database of spontaneous computer directed speech
1999 (English). In: Proceedings of the XIVth International Congress of Phonetic Sciences / [ed] Ohala, John, 1999, 1221- p. Conference paper, Published paper (Other academic)
Abstract [en]

This paper is an investigation of repetitive utterances in a Swedish database of spontaneous computer-directed speech. A spoken dialogue system was installed in a public location in downtown Stockholm and spontaneous human-computer interactions with adults and children were recorded [1]. Several acoustic and prosodic features such as duration, shifting of focus and hyperarticulation were examined to see whether repetitions could be distinguished from what the users first said to the system. The present study indicates that adults and children use partly different strategies as they attempt to resolve errors by means of repetition. As repetition occurs, duration is increased and words are often hyperarticulated or contrastively focused. These results could have implications for the development of future spoken dialogue systems with robust error handling.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13325 (URN)
Conference
XIVth International Congress of Phonetic Sciences, San Francisco, USA, 1999.
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
4. Speech technology on trial: Experiences from the August system
2000 (English). In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 6, no. 3-4, 273-286 p. Article in journal (Refereed). Published
Abstract [en]

In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13326 (URN)
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2017-12-12. Bibliographically approved.
5. Modality Convergence in a Multimodal Dialogue System
2000 (English). In: Proceedings of Götalog, 2000, 29-34 p. Conference paper, Published paper (Other academic)
Abstract [en]

When designing multimodal dialogue systems allowing speech as well as graphical operations, it is important to understand not only how people make use of the different modalities in their utterances, but also how the system might influence a user's choice of modality by its own behavior. This paper describes an experiment in which subjects interacted with two versions of a simulated multimodal dialogue system. One version used predominantly graphical means when referring to specific objects; the other used predominantly verbal referential expressions. The purpose of the study was to find out what effect, if any, the system's referential strategy had on the user's behavior. The results provided limited support for the hypothesis that the system can influence users to adopt another modality for the purpose of referring.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13328 (URN)
Conference
Fourth Workshop on the Semantics and Pragmatics of Dialogue
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
6. Positive and Negative User Feedback in a Spoken Dialogue Corpus
2000 (English). In: Proceedings of ICSLP 00, 2000. Conference paper, Published paper (Other academic)
Abstract [en]

This paper examines feedback strategies in a Swedish corpus of multimodal human--computer interaction. The aim of the study is to investigate how users provide positive and negative feedback to a dialogue system and to discuss the function of these utterances in the dialogues. User feedback in the AdApt corpus was labeled and analyzed, and its distribution in the dialogues is discussed. The question of whether it is possible to utilize user feedback in future systems is considered. More specifically, we discuss how error handling in human--computer dialogue might be improved through greater knowledge of user feedback strategies. In the present corpus, almost all subjects used positive or negative feedback at least once during their interaction with the system. Our results indicate that some types of feedback more often occur in certain positions in the dialogue. Another observation is that there appear to be great individual variations in feedback strategies, so that certain subjects give feedback at almost every turn while others rarely or never respond to a spoken dialogue system in this manner. Finally, we discuss how feedback could be used to prevent problems in human--computer dialogue.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13330 (URN)
Conference
International Conference on Spoken Language Processing
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
7. A Comparison of Disfluency Distribution in a Unimodal and a Multimodal Speech Interface
2000 (English). In: Proceedings of ICSLP 00, 2000. Conference paper, Published paper (Other academic)
Abstract [en]

In this paper, we compare the distribution of disfluencies in two human--computer dialogue corpora. One corpus consists of unimodal travel booking dialogues, which were recorded over the telephone. In this unimodal system, all components except the speech recognition were authentic. The other corpus was collected using a semi-simulated multi-modal dialogue system with an animated talking agent and a clickable map. The aim of this paper is to analyze and discuss the effects of modality, task and interface design on the distribution and frequency of disfluencies in these two corpora.
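A corpus comparison of this kind ultimately reduces to counting disfluency tokens per word. The sketch below computes the rate of filled pauses over transcribed utterances; the filler inventory and the toy utterances are illustrative assumptions, not items or labels from the actual corpora.

```python
import re

# Illustrative filler inventory (mixed Swedish/English); not the
# annotation scheme used in the two corpora.
FILLED_PAUSES = {"eh", "ehm", "um", "uh", "hm"}

def disfluency_rate(utterances):
    """Fraction of word tokens that are filled pauses, over a corpus
    given as a list of transcribed utterance strings."""
    tokens = [t for u in utterances for t in re.findall(r"[\w']+", u.lower())]
    return sum(t in FILLED_PAUSES for t in tokens) / len(tokens) if tokens else 0.0

# Invented example utterances standing in for the two corpora.
unimodal = ["eh I want to book a trip", "to Gothenburg um on Friday"]
multimodal = ["show apartments here", "eh how much is this one"]
rates = (disfluency_rate(unimodal), disfluency_rate(multimodal))
```

Comparing such per-corpus rates (and their distribution over utterance positions) is the kind of analysis the paper reports for the telephone and multimodal interfaces.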

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13331 (URN)
Conference
International Conference on Spoken Language Processing
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
8. Real-time Handling of Fragmented Utterances
2001 (English). In: Proceedings of the NAACL Workshop on Adaption in Dialogue Systems, 2001. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we discuss an adaptive method of handling fragmented user utterances in a speech-based multimodal dialogue system. Inserted silent pauses between fragments present the following problem: does the current silence indicate that the user has completed her utterance, or is the silence just a pause between two fragments, so that the system should wait for more input? Our system incrementally classifies user utterances as either closing (more input is unlikely to come) or non-closing (more input is likely to come), partly depending on the current dialogue state. Utterances that are categorized as non-closing allow the dialogue system to await additional spoken or graphical input before responding.
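The closing/non-closing decision can be caricatured as a silence timeout that depends on both the fragment's form and the dialogue state. Everything below — the word list, state names, and timeout values — is a hypothetical sketch of the idea, not the system's actual classifier.

```python
def should_respond(fragments, silence_ms, dialogue_state,
                   closing_timeout_ms=500, non_closing_timeout_ms=2000):
    """Decide whether the system should respond now or keep waiting.

    A fragment ending in a word that typically continues (article,
    preposition, conjunction), or arriving in a dialogue state where
    more input is expected, is classified as non-closing, so the system
    tolerates a longer silence before responding."""
    NON_CLOSING_ENDINGS = {"the", "a", "to", "in", "near", "and"}
    last_word = fragments[-1].lower().split()[-1]
    expects_more = dialogue_state == "collecting_constraints"
    non_closing = last_word in NON_CLOSING_ENDINGS or expects_more
    timeout = non_closing_timeout_ms if non_closing else closing_timeout_ms
    return silence_ms >= timeout
```

With these invented parameters, "show me apartments in" followed by 800 ms of silence makes the system wait, while the same silence after a complete request in a confirming state triggers a response.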

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13332 (URN)
Conference
NAACL Workshop on Adaption in Dialogue Systems
Note

QC 20100611. QC 20160221

Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2016-02-21. Bibliographically approved.
9. Constraint Manipulation and Visualization in a Multimodal Dialogue System
2002 (English). In: Proceedings of MultiModal Dialogue in Mobile Environments, 2002. Conference paper, Published paper (Other academic)
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13333 (URN)
Conference
MultiModal Dialogue in Mobile Environments
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.
10. Voice Transformations For Improving Children's Speech Recognition In A Publicly Available Dialogue System
2002 (English). In: Proceedings of ICSLP 02, 2002. Conference paper, Published paper (Other academic)
Abstract [en]

To be able to build acoustic models for children that can be used in spoken dialogue systems, speech data has to be collected. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children's computer-directed speech. This paper describes some experiments with on-the-fly voice transformation of children's speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and another by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to the speech recognizer for adult speech. Our results show that this method reduces the error rates by on the order of thirty to forty-five percent for child users.
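The underlying idea — mapping a child's high fundamental frequency toward the adult range before recognition — can be illustrated with the crudest possible transformation: resampling. Unlike the phase vocoder or TD-PSOLA, resampling also changes the duration, so this is only a sketch of the frequency mapping, not either of the methods actually tested; the test tone and shift factor are made up for the example.

```python
import numpy as np

def shift_pitch_by_resampling(signal, factor):
    """Lower the pitch of `signal` by `factor` (> 1) using linear-
    interpolation resampling. This also stretches the duration by the
    same factor, which is exactly what phase vocoder and TD-PSOLA
    methods avoid; it only demonstrates the frequency mapping."""
    n_out = int(len(signal) * factor)
    positions = np.arange(n_out) / factor
    return np.interp(positions, np.arange(len(signal)), signal)

sr = 16000
t = np.arange(sr) / sr                          # one second of "audio"
child = np.sin(2 * np.pi * 300 * t)             # 300 Hz tone, roughly a child's F0
adult = shift_pitch_by_resampling(child, 1.5)   # ~300 Hz mapped down to ~200 Hz
```

In a real pipeline the transformed signal would then be passed unchanged to the adult-trained recognizer, which is the deployment scheme the paper describes.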

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13339 (URN)
Conference
International Conference on Spoken Language Processing
Note
QC 20100611. Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2010-06-11. Bibliographically approved.

Open Access in DiVA

fulltext (8354 kB), 1879 downloads
File information
File name: FULLTEXT01.pdf
File size: 8354 kB
Checksum MD5: 4bab48198ba3b4875ab8fc9004bee7eb86c8c305d7ed2b42d6e62b6126522201c8cd5b51
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Gustafson, Joakim

Total: 1879 downloads
The number of downloads is the sum of all downloads of full texts. It may include e.g. previous versions that are no longer available.

Total: 1367 hits