kth.se: KTH Publications
Voice Transformations For Improving Children's Speech Recognition In A Publicly Available Dialogue System
KTH, Former Departments (before 2005), Speech, Music and Hearing. Telia Research AB, Sweden. ORCID iD: 0000-0002-0397-6442
KTH, Former Departments (before 2005), Speech, Music and Hearing.
2002 (English). In: Proceedings of ICSLP 02, International Speech Communication Association, 2002, pp. 297-300. Conference paper, Published paper (Refereed)
Abstract [en]

To be able to build acoustic models for children that can be used in spoken dialogue systems, speech data has to be collected. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children's computer-directed speech. This paper describes some experiments with on-the-fly voice transformation of children's speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and another by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to the speech recognizer for adult speech. Our results show that this method reduces error rates by on the order of thirty to forty-five percent for child users.

Place, publisher, year, edition, pages
International Speech Communication Association, 2002. pp. 297-300
National subject category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-13339; Scopus ID: 2-s2.0-56149113752; OAI: oai:DiVA.org:kth-13339; DiVA, id: diva2:323753
Conference
7th International Conference on Spoken Language Processing (ICSLP2002 - INTERSPEECH 2002), Denver, Colorado, USA, September 16-20, 2002
Note

QC 20100611

Available from: 2010-06-11. Created: 2010-06-11. Last updated: 2022-06-25. Bibliographically approved
Part of thesis
1. Developing Multimodal Spoken Dialogue Systems: Empirical Studies of Spoken Human–Computer Interaction
2002 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis presents work done during the last ten years on developing five multimodal spoken dialogue systems, and the empirical user studies that have been conducted with them. The dialogue systems have been multimodal, giving information both verbally with animated talking characters and graphically on maps and in text tables. To be able to study a wider range of user behaviour, each new system has been in a new domain and with a new set of interactional abilities. The five systems presented in this thesis are: the Waxholm system, where users could ask about the boat traffic in the Stockholm archipelago; the Gulan system, where people could retrieve information from the Yellow pages of Stockholm; the August system, which was a publicly available system where people could get information about the author Strindberg, KTH and Stockholm; the AdApt system, which allowed users to browse apartments for sale in Stockholm; and the Pixie system, where users could help an animated agent to fix things in a visionary apartment publicly available at the Telecom museum in Stockholm. Some of the dialogue systems have been used in controlled experiments in laboratory environments, while others have been placed in public environments where members of the general public have interacted with them. All spoken human-computer interactions have been transcribed and analyzed to increase our understanding of how people interact verbally with computers, and to obtain knowledge on how spoken dialogue systems can utilize the regularities found in these interactions. This thesis summarizes the experiences from building these five dialogue systems and presents some of the findings from the analyses of the collected dialogue corpora.

Place, publisher, year, edition, pages
Stockholm: KTH, 2002. pp. x, 96
Series
Trita-TMH ; 2002:8
Keywords
Spoken dialogue system, multimodal, speech, GUI, animated agents, embodied conversational characters, talking heads, empirical user studies, speech corpora, system evaluation, system development, Wizard of Oz simulations, system architecture, linguis
National subject category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-3460 (URN)
Public defence
2002-12-20, 00:00
Note
QC 20100611
Available from: 2002-12-11. Created: 2002-12-11. Last updated: 2022-06-22. Bibliographically approved

Open Access in DiVA

fulltext (338 kB), 287 downloads
File information
File name: FULLTEXT01.pdf. File size: 338 kB. Checksum: SHA-512
bc30896b85e671517469242b245686c51dd91cf5ccf6fe5f668bdf925672e856f0072c65d6c2933efaa330da63355543951fffe1e25426d87f7bdbf9902846ad
Type: fulltext. Mimetype: application/pdf

Other links

Scopus
https://www.isca-speech.org/archive/archive_papers/icslp_2002/i02_0297.pdf

Person

Gustafson, Joakim

Search further in DiVA

By author/editor
Gustafson, Joakim; Sjölander, Kåre
By organisation
Speech, Music and Hearing
Engineering and Technology

Total: 289 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 522 hits