Voice Transformations For Improving Children's Speech Recognition In A Publicly Available Dialogue System
KTH, Superseded Departments (pre-2005), Speech, Music and Hearing; Telia Research AB, Sweden. ORCID iD: 0000-0002-0397-6442
KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
2002 (English). In: Proceedings of ICSLP 02, International Speech Communication Association, 2002, p. 297-300. Conference paper, Published paper (Refereed)
Abstract [en]

To be able to build acoustic models for children that can be used in spoken dialogue systems, speech data has to be collected. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children's computer-directed speech. This paper describes some experiments with on-the-fly voice transformation of children's speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and another by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to the speech recognizer for adult speech. Our results show that this method reduces error rates on the order of thirty to forty-five percent for child users.
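The abstract describes transforming the child's speech signal (lowering pitch and spectral content) before passing it to an adult-trained recognizer. As a rough illustration only, not the paper's implementation, the sketch below lowers all frequencies by 1/alpha via linear-interpolation resampling and then restores the original duration with a simplified WSOLA-style overlap-add time stretch (a crude stand-in for the pitch-synchronous TD-PSOLA approach; `alpha` and the frame/hop sizes are illustrative assumptions):

```python
import numpy as np

def resample_linear(x, factor):
    """Resample by `factor` with linear interpolation. Played back at
    the original rate, all frequencies are scaled by 1/factor (the
    signal becomes `factor` times longer)."""
    pos = np.linspace(0, len(x) - 1, int(len(x) * factor))
    return np.interp(pos, np.arange(len(x)), x)

def wsola_time_stretch(x, rate, frame=1024, hop=256, search=128):
    """Time-compress x by `rate` without changing pitch: overlap-add
    Hann-windowed frames read at hop*rate but written at hop, nudging
    each read position (+/- search samples) so the new frame lines up
    with the natural continuation of the previous one (WSOLA idea,
    a non-pitch-synchronous simplification of TD-PSOLA)."""
    win = np.hanning(frame)
    n_frames = int((len(x) - frame - search) / (hop * rate))
    out = np.zeros(n_frames * hop + frame)
    norm = np.zeros_like(out)
    prev = None
    for i in range(n_frames):
        r = int(i * hop * rate)
        if prev is not None:
            target = prev[hop:]  # how the previous frame would continue
            best, best_score = 0, -np.inf
            for d in range(-min(search, r), search + 1):
                cand = x[r + d : r + d + frame - hop]
                score = float(np.dot(cand, target))
                if score > best_score:
                    best_score, best = score, d
            r += best
        seg = x[r : r + frame]
        w = i * hop
        out[w : w + frame] += seg * win
        norm[w : w + frame] += win
        prev = seg
    norm[norm < 1e-8] = 1.0  # avoid division by zero at the edges
    return (out / norm)[: n_frames * hop]

def lower_voice(x, alpha=1.25):
    """Shift pitch and formants down by 1/alpha at roughly constant
    duration: resample (lower + lengthen), then time-compress back."""
    return wsola_time_stretch(resample_linear(x, alpha), alpha)
```

With `alpha` in the range of roughly 1.2 to 1.4, a child's pitch and formants are mapped toward the adult range before recognition; the paper itself used phase-vocoder- and TD-PSOLA-inspired transformations rather than this simplified scheme.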

Place, publisher, year, edition, pages
International Speech Communication Association, 2002. p. 297-300
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-13339
Scopus ID: 2-s2.0-56149113752
OAI: oai:DiVA.org:kth-13339
DiVA, id: diva2:323753
Conference
7th International Conference on Spoken Language Processing (ICSLP2002 - INTERSPEECH 2002), Denver, Colorado, USA, September 16-20, 2002
Note

QC 20100611

Available from: 2010-06-11 Created: 2010-06-11 Last updated: 2018-06-01 Bibliographically approved
In thesis
1. Developing Multimodal Spoken Dialogue Systems: Empirical Studies of Spoken Human–Computer Interaction
2002 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis presents work done during the last ten years on developing five multimodal spoken dialogue systems, and the empirical user studies that have been conducted with them. The dialogue systems have been multimodal, giving information both verbally with animated talking characters and graphically on maps and in text tables. To be able to study a wider range of user behaviour, each new system has been in a new domain and with a new set of interactional abilities. The five systems presented in this thesis are: the Waxholm system, where users could ask about the boat traffic in the Stockholm archipelago; the Gulan system, where people could retrieve information from the Yellow Pages of Stockholm; the August system, which was a publicly available system where people could get information about the author Strindberg, KTH and Stockholm; the AdApt system, which allowed users to browse apartments for sale in Stockholm; and the Pixie system, where users could help an animated agent to fix things in a visionary apartment publicly available at the Telecom museum in Stockholm. Some of the dialogue systems have been used in controlled experiments in laboratory environments, while others have been placed in public environments where members of the general public have interacted with them. All spoken human-computer interactions have been transcribed and analyzed to increase our understanding of how people interact verbally with computers, and to obtain knowledge on how spoken dialogue systems can utilize the regularities found in these interactions. This thesis summarizes the experiences from building these five dialogue systems and presents some of the findings from the analyses of the collected dialogue corpora.

Place, publisher, year, edition, pages
Stockholm: KTH, 2002. p. x, 96
Series
Trita-TMH ; 2002:8
Keywords
Spoken dialogue system, multimodal, speech, GUI, animated agents, embodied conversational characters, talking heads, empirical user studies, speech corpora, system evaluation, system development, Wizard of Oz simulations, system architecture, linguis
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-3460 (URN)
Public defence
2002-12-20, 00:00
Note
QC 20100611. Available from: 2002-12-11 Created: 2002-12-11 Last updated: 2010-06-11 Bibliographically approved

Open Access in DiVA

fulltext (338 kB), 14 downloads
File information
File name: FULLTEXT01.pdf
File size: 338 kB
Checksum (SHA-512): bc30896b85e671517469242b245686c51dd91cf5ccf6fe5f668bdf925672e856f0072c65d6c2933efaa330da63355543951fffe1e25426d87f7bdbf9902846ad
Type: fulltext
Mimetype: application/pdf

Other links

Scopus
https://www.isca-speech.org/archive/archive_papers/icslp_2002/i02_0297.pdf

Authority records

Gustafson, Joakim

Search in DiVA

By author/editor
Gustafson, Joakim; Sjölander, Kåre
By organisation
Speech, Music and Hearing
Engineering and Technology

