kth.sePublikationer KTH
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Evaluating Sampling-based Filler Insertion with Spontaneous TTS
KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.ORCID-id: 0000-0002-0397-6442
KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.ORCID-id: 0000-0003-1175-840X
2022 (Engelska)Ingår i: LREC 2022: Thirteen International Conference On Language Resources And Evaluation / [ed] Calzolari, N Bechet, F Blache, P Choukri, K Cieri, C Declerck, T Goggi, S Isahara, H Maegaard, B Mazo, H Odijk, H Piperidis, S, European Language Resources Association (ELRA) , 2022, s. 1960-1969Konferensbidrag, Publicerat paper (Refereegranskat)
Abstract [en]

Inserting fillers (such as "um", "like") to clean speech text has a rich history of study. One major application is to make dialogue systems sound more spontaneous. The ambiguity of filler occurrence and inter-speaker difference make both modeling and evaluation difficult. In this paper, we study sampling-based filler insertion, a simple yet unexplored approach to inserting fillers. We propose an objective score called Filler Perplexity (FPP). We build three models trained on two single-speaker spontaneous corpora, and evaluate them with FPP and perceptual tests. We implement two innovations in perceptual tests, (1) evaluating filler insertion on dialogue systems output, (2) synthesizing speech with neural spontaneous TTS engines. FPP proves to be useful in analysis but does not correlate well with perceptual MOS. Perceptual results show little difference between compared filler insertion models including with ground-truth, which may be due to the ambiguity of what is good filler insertion and a strong neural spontaneous TTS that produces natural speech irrespective of input. Results also show preference for filler-inserted speech synthesized with spontaneous TTS. The same test using TTS based on read speech obtains the opposite results, which shows the importance of using spontaneous TTS in evaluating filler insertions. Audio samples: www.speech.kth.se/tts- demos/LREC22

Ort, förlag, år, upplaga, sidor
European Language Resources Association (ELRA) , 2022. s. 1960-1969
Nyckelord [en]
filler insertion, spontaneous text-to-speech, spoken dialogue system
Nationell ämneskategori
Språkbehandling och datorlingvistik
Identifikatorer
URN: urn:nbn:se:kth:diva-324340ISI: 000889371702007Scopus ID: 2-s2.0-85144345531OAI: oai:DiVA.org:kth-324340DiVA, id: diva2:1740010
Konferens
13th International Conference on Language Resources and Evaluation (LREC), JUN 20-25, 2022, Marseille, FRANCE
Anmärkning

QC 20230228

Tillgänglig från: 2023-02-28 Skapad: 2023-02-28 Senast uppdaterad: 2025-02-07Bibliografiskt granskad

Open Access i DiVA

Fulltext saknas i DiVA

Scopus

Person

Wang, SiyangGustafsson, JoakimSzékely, Éva

Sök vidare i DiVA

Av författaren/redaktören
Wang, SiyangGustafsson, JoakimSzékely, Éva
Av organisationen
Tal, musik och hörsel, TMH
Språkbehandling och datorlingvistik

Sök vidare utanför DiVA

GoogleGoogle Scholar

urn-nbn

Altmetricpoäng

urn-nbn
Totalt: 146 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf