Hi robot, it's not what you say, it's how you say it
KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
2023 (English). In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 307-314. Conference paper, Published paper (Refereed)
Abstract [en]

Many robots use their voice to communicate with people in spoken language, but the voices commonly used for robots are often optimized for transactional interactions rather than social ones. This can limit their ability to create engaging and natural interactions. To address this issue, we designed a spontaneous text-to-speech tool and used it to author natural and spontaneous robot speech. A crowdsourcing evaluation methodology is proposed to compare this type of speech to natural speech and state-of-the-art text-to-speech technology, both in disembodied and embodied form. We created speech samples in a naturalistic setting of people playing tabletop games and conducted a user study evaluating Naturalness, Intelligibility, Social Impression, Prosody, and Perceived Intelligence. The speech samples were chosen to represent three contexts that are common in tabletop games, and the contexts were introduced to the participants who evaluated the speech samples. The study results show that the proposed evaluation methodology allowed for a robust analysis that successfully compared the different conditions. Moreover, the spontaneous voice met our target design goal of being perceived as more natural than a leading commercial text-to-speech.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. p. 307-314
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords [en]
speech synthesis, human-robot interaction, embodiment, spontaneous speech, intelligibility, naturalness
National Category
Identifiers
URN: urn:nbn:se:kth:diva-341972
DOI: 10.1109/RO-MAN57019.2023.10309427
ISI: 001108678600044
Scopus ID: 2-s2.0-85186982397
OAI: oai:DiVA.org:kth-341972
DiVA, id: diva2:1825344
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), August 28-31, 2023, Busan, South Korea
Note

Part of proceedings ISBN 979-8-3503-3670-2

Available from: 2024-01-09. Created: 2024-01-09. Last updated: 2024-03-22. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authors

Miniotaitė, Jūra; Wang, Siyang; Beskow, Jonas; Gustafson, Joakim; Székely, Éva; Abelho Pereira, André Tiago
