Abelho Pereira, André Tiago (ORCID: orcid.org/0000-0003-2428-0468)
Publications (10 of 30)
Leite, I., Ahlberg, W., Pereira, A., Sestini, A., Gisslen, L. & Tollmar, K. (2025). A Call for Deeper Collaboration Between Robotics and Game Development. In: Proceedings of the IEEE 2025 Conference on Games, CoG 2025: . Paper presented at 2025 IEEE Conference on Games, CoG 2025, Lisbon, Portugal, August 26-29, 2025. Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: Proceedings of the IEEE 2025 Conference on Games, CoG 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

While robotics and game development have independently achieved significant progress in creating interactive and intelligent systems, a deeper collaboration between these fields could be mutually beneficial. This paper argues for more collaboration, highlighting current limited interactions and proposing directions for future research. We discuss shared foundations such as Artificial Intelligence, Extended Reality, and the increasing use of common tools and standards. We then propose opportunities where game development methodologies can advance robotics (e.g., gamified data collection and richer simulation environments) and where robotics research can contribute to games (e.g., improved NPC autonomy and embodied intelligence). This cross-disciplinary interaction can accelerate innovation and lead to more intelligent and user-centered technologies in both domains.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
artificial intelligence, collaboration, game development, non-player characters, Robotics
HSV category
Identifiers
urn:nbn:se:kth:diva-370815 (URN), 10.1109/CoG64752.2025.11114209 (DOI), 2-s2.0-105015576103 (Scopus ID)
Conference
2025 IEEE Conference on Games, CoG 2025, Lisbon, Portugal, August 26-29, 2025
Note

Part of ISBN 9798331589042

QC 20251003

Available from: 2025-10-03 Created: 2025-10-03 Last updated: 2025-10-03. Bibliographically approved
Irfan, B., Miniota, J., Thunberg, S., Lagerstedt, E., Kuoppamäki, S., Skantze, G. & Abelho Pereira, A. T. (2025). Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES). IEEE Transactions on Affective Computing
2025 (English). In: IEEE Transactions on Affective Computing, E-ISSN 1949-3045. Journal article (Peer-reviewed). Epub ahead of print
Abstract [en]

Understanding user enjoyment is crucial in human-robot interaction (HRI), as it can impact interaction quality and influence user acceptance and long-term engagement with robots, particularly in the context of conversations with social robots. However, current assessment methods rely solely on self-reported questionnaires, failing to capture interaction dynamics. This work introduces the Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES), a novel 5-point scale to assess user enjoyment from an external perspective (e.g. by an annotator) for conversations with a robot. The scale was developed through rigorous evaluations and discussions among three annotators with relevant expertise, using open-domain conversations with a companion robot that was powered by a large language model, and was applied to each conversation exchange (i.e. a robot-participant turn pair) alongside overall interaction. It was evaluated on 25 older adults' interactions with the companion robot, corresponding to 174 minutes of data, showing moderate to good alignment between annotators. Although the scale was developed and tested in the context of older adult interactions with a robot, its basis in general and non-task-specific indicators of enjoyment supports its broader applicability. The study further offers insights into understanding the nuances and challenges of assessing user enjoyment in robot interactions, and provides guidelines on applying the scale to other domains and populations. The dataset is available online.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
HSV category
Identifiers
urn:nbn:se:kth:diva-374884 (URN), 10.1109/TAFFC.2025.3590359 (DOI), 2-s2.0-105011494748 (Scopus ID)
Note

QC 20260107

Available from: 2026-01-06 Created: 2026-01-06 Last updated: 2026-01-07. Bibliographically approved
Torubarova, E., Arvidsson, C., Berrebi, J., Uddén, J. & Abelho Pereira, A. T. (2025). NeuroEngage: A Multimodal Dataset Integrating fMRI for Analyzing Conversational Engagement in Human-Human and Human-Robot Interactions. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction: . Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, Mar 4 2025 - Mar 6 2025 (pp. 849-858). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 849-858. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

This study aimed to deepen our understanding of the behavioral and neurocognitive processes involved in human-human and human-robot communication in a more ecologically valid setting compared to the traditional neurolinguistic paradigms. We collected a novel open-source dataset (N=30 for human-human and N=20 for human-robot interactions) that includes fMRI, eye-tracking, segmented audio, video, and behavioral data, resulting in 30 minutes of free conversations per participant. To enable unrestricted, spontaneous robot behavior, we employed a novel VR-mediated teleoperation system. Our mixed design allowed us to compare participants' perception of humans and robots across three within-subject conditions of conversational engagement: Engaged Communicator, Active Listener, and Passive Listener. We provide an open-access dataset, replicable code for the teleoperation system, and an initial analysis of fMRI, behavioral, and speech data. We observed distinct neural profiles: speaking to the human agent recruited more higher-level frontal regions associated with socio-pragmatic processes, while listening to the robot recruited more sensory areas, including auditory and visual regions. Engagement levels and agent types also affected speech and behavioral patterns, offering valuable insights into conversational dynamics in human-human and human-robot interactions.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
conversation, dataset, engagement, fMRI, human-robot interaction, neuroimaging
HSV category
Identifiers
urn:nbn:se:kth:diva-363755 (URN), 10.1109/HRI61500.2025.10974251 (DOI), 2-s2.0-105004876905 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, Mar 4 2025 - Mar 6 2025
Note

QC 20250527

Available from: 2025-05-21 Created: 2025-05-21 Last updated: 2025-05-27. Bibliographically approved
Janssens, R., Pereira, A., Skantze, G., Irfan, B. & Belpaeme, T. (2025). Online Prediction of User Enjoyment in Human-Robot Dialogue with LLMs. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction: . Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025 (pp. 1363-1367). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 1363-1367. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

Large Language Models (LLMs) allow social robots to engage in unconstrained open-domain dialogue, but often make mistakes when employed in real-world interactions, requiring adaptation of LLMs to specific conversational contexts. However, LLM adaptation techniques require a feedback signal, ideally for multiple alternative utterances. At the same time, human-robot dialogue data is scarce and research often relies on external annotators. A tool for automatic prediction of user enjoyment in human-robot dialogue is therefore needed. We investigate the possibility of predicting user enjoyment turn-by-turn using an LLM, giving it a proposed robot utterance within the dialogue context, but without access to user response. We compare this performance to the system's enjoyment ratings when user responses are available and to assessments by expert human annotators, in addition to self-reported user perceptions. We evaluate the proposed LLM predictor in a human-robot interaction (HRI) dataset with conversation transcripts of 25 older adults' 7-minute dialogues with a companion robot. Our results show that an LLM is capable of predicting user enjoyment, without loss of performance despite the lack of user response and even achieving performance similar to that of human expert annotators. Furthermore, results show that the system surpasses expert annotators in its correlation with the user's self-reported perceptions of the conversation. This work presents a tool to remove the reliance on external annotators for enjoyment evaluation and paves the way toward real-time adaptation in human-robot dialogue.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
human-robot interaction, large language model, open-domain dialogue, prediction, user enjoyment
HSV category
Identifiers
urn:nbn:se:kth:diva-363754 (URN), 10.1109/HRI61500.2025.10973944 (DOI), 2-s2.0-105004873166 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025
Note

Part of ISBN 9798350378931

QC 20250525

Available from: 2025-05-21 Created: 2025-05-21 Last updated: 2025-05-25. Bibliographically approved
Marcinek, L., Irfan, B., Skantze, G., Abelho Pereira, A. T. & Gustafsson, J. (2025). Role of Reasoning in LLM Enjoyment Detection: Evaluation Across Conversational Levels for Human-Robot Interaction. In: Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin (Ed.), Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue: . Paper presented at The 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, Aug 25-27, 2025 (pp. 573-590). SIGDIAL
2025 (English). In: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue / [ed] Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin, SIGDIAL, 2025, pp. 573-590. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

User enjoyment is central to developing conversational AI systems that can recover from failures and maintain interest over time. However, existing approaches often struggle to detect subtle cues that reflect user experience. Large Language Models (LLMs) with reasoning capabilities have outperformed standard models on various other tasks, suggesting potential benefits for enjoyment detection. This study investigates whether models with reasoning capabilities outperform standard models when assessing enjoyment in a human-robot dialogue corpus at both turn and interaction levels. Results indicate that reasoning capabilities have complex, model-dependent effects rather than universal benefits. While performance was nearly identical at the interaction level (0.44 vs 0.43), reasoning models substantially outperformed at the turn level (0.42 vs 0.36). Notably, LLMs correlated better with users’ self-reported enjoyment metrics than human annotators, despite achieving lower accuracy against human consensus ratings. Analysis revealed distinctive error patterns: non-reasoning models showed bias toward positive ratings at the turn level, while both model types exhibited central tendency bias at the interaction level. These findings suggest that reasoning should be applied selectively based on model architecture and assessment context, with assessment granularity significantly influencing relative effectiveness.

Place, publisher, year, edition, pages
SIGDIAL, 2025
HSV category
Identifiers
urn:nbn:se:kth:diva-374881 (URN)
Conference
The 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, Aug 25-27, 2025
Note

Part of ISBN 979-8-89176-329-6

QC 20260107

Available from: 2026-01-06 Created: 2026-01-06 Last updated: 2026-01-07. Bibliographically approved
Santana, R., Irfan, B., Lagerstedt, E., Skantze, G. & Abelho Pereira, A. T. (2025). Speech-to-Joy: Self-Supervised Features for Enjoyment Prediction in Human-Robot Conversation. In: Ram Subramanian, Yukiko I. Nakano, Tom Gedeon, Mohan Kankanhalli, Tanaya Guha, Jainendra Shukla, Gelareh Mohammadi, Oya Celiktutan (Ed.), Proceedings of the 27th International Conference on Multimodal Interaction, ICMI 2025: . Paper presented at The 27th International Conference on Multimodal Interaction, ICMI 2025, Canberra, Australia, October 13-17, 2025 (pp. 238-248). Association for Computing Machinery (ACM)
2025 (English). In: Proceedings of the 27th International Conference on Multimodal Interaction, ICMI 2025 / [ed] Ram Subramanian, Yukiko I. Nakano, Tom Gedeon, Mohan Kankanhalli, Tanaya Guha, Jainendra Shukla, Gelareh Mohammadi, Oya Celiktutan, Association for Computing Machinery (ACM), 2025, pp. 238-248. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

Conversational systems that interact or collaborate with people must understand not only task success but also the quality of human experience. We present Speech-to-Joy, a lightweight framework that learns to predict users’ own post-interaction enjoyment ratings using latent embeddings from audio and text modalities. Evaluated on a corpus of human-robot dialogues, the model’s predicted enjoyment correlates strongly and significantly with user self-reports, outperforming both an experienced HRI annotator and heavier LLM-based uni- and multimodal baselines. Notably, even the unimodal audio branch - using only frozen speech embeddings - surpasses all baselines, and a late-fusion of text and audio achieves the highest performance. Designed for real-time inference on resource-limited platforms, Speech-to-Joy replaces ad-hoc emotion heuristics with a direct and user-centered measure of enjoyment. This work paves the way for optimizing interactions with robots and other conversational systems through the lens that matters most: the user’s own experience.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
HSV category
Identifiers
urn:nbn:se:kth:diva-374889 (URN), 10.1145/3716553.3750747 (DOI), 2-s2.0-105022238812 (Scopus ID)
Conference
The 27th International Conference on Multimodal Interaction, ICMI 2025, Canberra, Australia, October 13-17, 2025
Note

Part of ISBN 979-8-4007-1499-3

QC 20260107

Available from: 2026-01-06 Created: 2026-01-06 Last updated: 2026-01-07. Bibliographically approved
Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024). Conversational production and comprehension: fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model. Cerebral Cortex, 34(3), Article ID bhae073.
2024 (English). In: Cerebral Cortex, ISSN 1047-3211, E-ISSN 1460-2199, Vol. 34, no. 3, article id bhae073. Journal article (Peer-reviewed). Published
Abstract [en]

A key question in research on the neurobiology of language is to which extent the language production and comprehension systems share neural infrastructure, but this question has not been addressed in the context of conversation. We utilized a public fMRI dataset where 24 participants engaged in unscripted conversations with a confederate outside the scanner, via an audio-video link. We provide evidence indicating that the two systems share neural infrastructure in the left-lateralized perisylvian language network, but diverge regarding the level of activation in regions within the network. Activity in the left inferior frontal gyrus was stronger in production compared to comprehension, while comprehension showed stronger recruitment of the left anterior middle temporal gyrus and superior temporal sulcus, compared to production. Although our results are reminiscent of the classical Broca-Wernicke model, the anterior (rather than posterior) temporal activation is a notable difference from that model. This is one of the findings that may be a consequence of the conversational setting, another being that conversational production activated what we interpret as higher-level socio-pragmatic processes. In conclusion, we present evidence for partial overlap and functional asymmetry of the neural infrastructure of production and comprehension, in the above-mentioned frontal vs temporal regions during conversation.

Place, publisher, year, edition, pages
Oxford University Press (OUP), 2024
Keywords
interaction, contextual language processing, LIFG, LMTG, functional asymmetry
HSV category
Identifiers
urn:nbn:se:kth:diva-351444 (URN), 10.1093/cercor/bhae073 (DOI), 001273703700001 (), 38501383 (PubMedID), 2-s2.0-85188194135 (Scopus ID)
Note

QC 20240815

Available from: 2024-08-15 Created: 2024-08-15 Last updated: 2024-08-15. Bibliographically approved
Abelho Pereira, A. T., Marcinek, L., Miniotaitė, J., Thunberg, S., Lagerstedt, E., Gustafsson, J., . . . Irfan, B. (2024). Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models. In: : . Paper presented at 26th International Conference on Multimodal Interaction (ICMI), San Jose, USA, November 4-8, 2024 (pp. 469-478). Association for Computing Machinery (ACM)
2024 (English). Conference paper, Published paper (Peer-reviewed)
Abstract [en]

Enjoyment is a crucial yet complex indicator of positive user experience in Human-Robot Interaction (HRI). While manual enjoyment annotation is feasible, developing reliable automatic detection methods remains a challenge. This paper investigates a multimodal approach to automatic enjoyment annotation for HRI conversations, leveraging large language models (LLMs), visual, audio, and temporal cues. Our findings demonstrate that both text-only and multimodal LLMs with carefully designed prompts can achieve performance comparable to human annotators in detecting user enjoyment. Furthermore, results reveal a stronger alignment between LLM-based annotations and user self-reports of enjoyment compared to human annotators. While multimodal supervised learning techniques did not improve all of our performance metrics, they could successfully replicate human annotators and highlighted the importance of visual and audio cues in detecting subtle shifts in enjoyment. This research demonstrates the potential of LLMs for real-time enjoyment detection, paving the way for adaptive companion robots that can dynamically enhance user experiences.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Affect Recognition, Human-Robot Interaction, Large Language Models, Multimodal, Older Adults, User Enjoyment
HSV category
Identifiers
urn:nbn:se:kth:diva-359146 (URN), 10.1145/3678957.3685729 (DOI), 001433669800051 (), 2-s2.0-85212589337 (Scopus ID)
Conference
26th International Conference on Multimodal Interaction (ICMI), San Jose, USA, November 4-8, 2024
Note

QC 20250127

Available from: 2025-01-27 Created: 2025-01-27 Last updated: 2025-04-30. Bibliographically approved
Wozniak, M. K., Stower, R., Jensfelt, P. & Abelho Pereira, A. T. (2023). Happily Error After: Framework Development and User Study for Correcting Robot Perception Errors in Virtual Reality. In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN: . Paper presented at 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA (pp. 1573-1580). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 1573-1580. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

While we can see robots in more areas of our lives, they still make errors. One common cause of failure stems from the robot perception module when detecting objects. Allowing users to correct such errors can help improve the interaction and prevent the same errors in the future. Consequently, we investigate the effectiveness of a virtual reality (VR) framework for correcting perception errors of a Franka Panda robot. We conducted a user study with 56 participants who interacted with the robot using both VR and screen interfaces. Participants learned to collaborate with the robot faster in the VR interface compared to the screen interface. Additionally, participants found the VR interface more immersive, enjoyable, and expressed a preference for using it again. These findings suggest that VR interfaces may offer advantages over screen interfaces for human-robot interaction in erroneous environments.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
HSV category
Identifiers
urn:nbn:se:kth:diva-341975 (URN), 10.1109/RO-MAN57019.2023.10309446 (DOI), 001108678600198 (), 2-s2.0-85186968933 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings ISBN 979-8-3503-3670-2

QC 20240110

Available from: 2024-01-10 Created: 2024-01-10 Last updated: 2025-02-09. Bibliographically approved
Miniotaitė, J., Wang, S., Beskow, J., Gustafson, J., Székely, É. & Abelho Pereira, A. T. (2023). Hi robot, it's not what you say, it's how you say it. In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN: . Paper presented at 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA (pp. 307-314). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 307-314. Conference paper, Published paper (Peer-reviewed)
Abstract [en]

Many robots use their voice to communicate with people in spoken language, but the voices commonly used for robots are often optimized for transactional interactions rather than social ones. This can limit their ability to create engaging and natural interactions. To address this issue, we designed a spontaneous text-to-speech tool and used it to author natural and spontaneous robot speech. A crowdsourcing evaluation methodology is proposed to compare this type of speech to natural speech and state-of-the-art text-to-speech technology, both in disembodied and embodied form. We created speech samples in a naturalistic setting of people playing tabletop games and conducted a user study evaluating Naturalness, Intelligibility, Social Impression, Prosody, and Perceived Intelligence. The speech samples were chosen to represent three contexts that are common in tabletop games, and these contexts were introduced to the participants who evaluated the speech samples. The study results show that the proposed evaluation methodology allowed for a robust analysis that successfully compared the different conditions. Moreover, the spontaneous voice met our target design goal of being perceived as more natural than a leading commercial text-to-speech.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords
speech synthesis, human-robot interaction, embodiment, spontaneous speech, intelligibility, naturalness
HSV category
Identifiers
urn:nbn:se:kth:diva-341972 (URN), 10.1109/RO-MAN57019.2023.10309427 (DOI), 001108678600044 (), 2-s2.0-85186982397 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings ISBN 979-8-3503-3670-2

Available from: 2024-01-09 Created: 2024-01-09 Last updated: 2025-02-18. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0003-2428-0468