kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Contextual Interactive Evaluation of TTS Models in Dialogue Systems
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0003-1175-840X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0002-0397-6442
2024 (English)In: Interspeech 2024, International Speech Communication Association , 2024, p. 2965-2969Conference paper, Published paper (Refereed)
Abstract [en]

Evaluation of text-to-speech (TTS) models is currently dominated by Mean-Opinion-Score (MOS) listening test, but MOS has been increasingly questioned for its validity. MOS tests place listeners in a passive setup, in which they do not actively interact with the TTS and usually evaluate isolated utterances without context. Thus it gives no indication for how well a TTS model suits an interactive application like spoken dialogue system, in which the capability of generating appropriate speech in the dialogue context is paramount. We aim to take a first step towards addressing this shortcoming by evaluating several state-of-the-art neural TTS models, including one that adapts to dialogue context, in a custom-built spoken dialogue system. We present system design, experiment setup, and results. Our work is the first to evaluate TTS in contextual dialogue system interactions. We also discuss the shortcomings and future opportunities of the proposed evaluation paradigm.

Place, publisher, year, edition, pages
International Speech Communication Association , 2024. p. 2965-2969
Keywords [en]
evaluation methodology, human-computer interaction, spoken dialogue system, text-to-speech
National Category
Natural Language Processing Other Engineering and Technologies
Identifiers
URN: urn:nbn:se:kth:diva-358876DOI: 10.21437/Interspeech.2024-1008Scopus ID: 2-s2.0-85214809755OAI: oai:DiVA.org:kth-358876DiVA, id: diva2:1930529
Conference
25th Interspeech Conferece 2024, Kos Island, Greece, Sep 1 2024 - Sep 5 2024
Note

QC 20250128

Available from: 2025-01-23 Created: 2025-01-23 Last updated: 2025-02-13Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Wang, SiyangSzékely, ÉvaGustafsson, Joakim

Search in DiVA

By author/editor
Wang, SiyangSzékely, ÉvaGustafsson, Joakim
By organisation
Speech, Music and Hearing, TMH
Natural Language ProcessingOther Engineering and Technologies

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 29 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf