kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Hi robot, it's not what you say, it's how you say it
KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0003-1399-6604
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0002-0397-6442
Show others and affiliations
2023 (English)In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE) , 2023, p. 307-314Conference paper, Published paper (Refereed)
Abstract [en]

Many robots use their voice to communicate with people in spoken language but the voices commonly used for robots are often optimized for transactional interactions, rather than social ones. This can limit their ability to create engaging and natural interactions. To address this issue, we designed a spontaneous text-to-speech tool and used it to author natural and spontaneous robot speech. A crowdsourcing evaluation methodology is proposed to compare this type of speech to natural speech and state-of-the-art text-to-speech technology, both in disembodied and embodied form. We created speech samples in a naturalistic setting of people playing tabletop games and conducted a user study evaluating Naturalness, Intelligibility, Social Impression, Prosody, and Perceived Intelligence. The speech samples were chosen to represent three contexts that are common in tabletopgames and the contexts were introduced to the participants that evaluated the speech samples. The study results show that the proposed evaluation methodology allowed for a robust analysis that successfully compared the different conditions. Moreover, the spontaneous voice met our target design goal of being perceived as more natural than a leading commercial text-to-speech.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE) , 2023. p. 307-314
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords [en]
speech synthesis, human-robot interaction, embodiment, spontaneous speech, intelligibility, naturalness
National Category
Interaction Technologies
Identifiers
URN: urn:nbn:se:kth:diva-341972DOI: 10.1109/RO-MAN57019.2023.10309427ISI: 001108678600044Scopus ID: 2-s2.0-85186982397OAI: oai:DiVA.org:kth-341972DiVA, id: diva2:1825344
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings ISBN 979-8-3503-3670-2

Available from: 2024-01-09 Created: 2024-01-09 Last updated: 2024-03-22Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Miniotaitė, JūraWang, SiyangBeskow, JonasGustafson, JoakimSzékely, ÉvaAbelho Pereira, André Tiago

Search in DiVA

By author/editor
Miniotaitė, JūraWang, SiyangBeskow, JonasGustafson, JoakimSzékely, ÉvaAbelho Pereira, André Tiago
By organisation
Media Technology and Interaction Design, MIDSpeech, Music and Hearing, TMH
Interaction Technologies

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 83 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf