Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
2023 (English). In: 23rd ACM International Conference on Intelligent Virtual Agent (IVA 2023), Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Engaging embodied conversational agents need to generate expressive behavior in order to be believable in socializing interactions. We present a system that can generate spontaneous speech with supporting lip movements. The neural conversational TTS voice is trained on a multi-style speech corpus that has been prosodically tagged (pitch and speaking rate) and transcribed (including tokens for breathing, fillers and laughter). We introduce a speech animation algorithm where articulatory effort can be adjusted. The facial animation is driven by time-stamped phonemes and prominence estimates from the synthesised speech waveform to modulate the lip and jaw movements accordingly. In objective evaluations we show that the system is able to generate speech and facial animation that vary in articulation effort. In subjective evaluations we compare our conversational TTS system’s capability to deliver jokes with that of a commercial TTS system. Both systems succeeded equally well.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023.
National Category
Natural Language Processing; Robotics and automation
Identifiers
URN: urn:nbn:se:kth:diva-341039
DOI: 10.1145/3570945.3607289
Scopus ID: 2-s2.0-85183581153
OAI: oai:DiVA.org:kth-341039
DiVA, id: diva2:1820903
Conference
23rd ACM International Conference on Intelligent Virtual Agent (IVA 2023), Würzburg, Germany, Jan 5 2023 - Jan 8 2023
Note

Part of ISBN 9798350345445

QC 20231124

Available from: 2023-12-19. Created: 2023-12-19. Last updated: 2025-02-05. Bibliographically approved.

Authority records

Gustafsson, Joakim; Székely, Éva; Beskow, Jonas
