Change search
Refine search result
123 101 - 110 of 110
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 101. Schötz, S.
    et al.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Bruce, G.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Segerup, M.
    Simulating Intonation in Regional Varieties of Swedish2010In: Fonetik 2010, Lund, Sweden, 2010Conference paper (Other academic)
  • 102.
    Sibirtseva, Elena
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
    Kontogiorgos, Dimosthenis
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
    Nykvist, Olov
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
    Karaoguz, Hakan
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
    Leite, Iolanda
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
    Gustafson, Joakim
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Kragic, Danica
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
    A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction2018In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2018Conference paper (Refereed)
    Abstract [en]

    Picking up objects requested by a human user is a common task in human-robot interaction. When multiple objects match the user's verbal description, the robot needs to clarify which object the user is referring to before executing the action. Previous research has focused on perceiving user's multimodal behaviour to complement verbal commands or minimising the number of follow up questions to reduce task time. In this paper, we propose a system for reference disambiguation based on visualisation and compare three methods to disambiguate natural language instructions. In a controlled experiment with a YuMi robot, we investigated realtime augmentations of the workspace in three conditions - head-mounted display, projector, and a monitor as the baseline - using objective measures such as time and accuracy, and subjective measures like engagement, immersion, and display interference. Significant differences were found in accuracy and engagement between the conditions, but no differences were found in task time. Despite the higher error rates in the head-mounted display condition, participants found that modality more engaging than the other two, but overall showed preference for the projector condition over the monitor and head-mounted display conditions.

  • 103.
    Skantze, Gabriel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Furhat at Robotville: A Robot Head Harvesting the Thoughts of the Public through Multi-party Dialogue2012In: Proceedings of the Workshop on Real-time Conversation with Virtual Agents IVA-RCVA, 2012Conference paper (Refereed)
  • 104.
    Skantze, Gabriel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Attention and interaction control in a human-human-computer dialogue setting2009In: Proceedings of SIGDIAL 2009: the 10th Annual Meeting of the Special Interest Group in Discourse and Dialogue, 2009, p. 310-313Conference paper (Refereed)
    Abstract [en]

    This paper presents a simple, yet effective model for managing attention and interaction control in multimodal spoken dialogue systems. The model allows the user to switch attention between the system and other humans, and the system to stop and resume speaking. An evaluation in a tutoring setting shows that the user’s attention can be effectively monitored using head pose tracking, and that this is a more reliable method than using push-to-talk.

  • 105.
    Skantze, Gabriel
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Multimodal interaction control in the MonAMI Reminder2009In: Proceedings of DiaHolmia: 2009 Workshop on the Semantics and Pragmatics of Dialogue / [ed] Jens Edlund, Joakim Gustafson, Anna Hjalmarsson, Gabriel Skantze, 2009, p. 127-128Conference paper (Refereed)
  • 106.
    Skantze, Gabriel
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Gustafson, Joakim
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Multimodal Conversational Interaction with Robots2019In: The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions / [ed] Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger, ACM Press, 2019Chapter in book (Refereed)
  • 107. Strangert, E.
    et al.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Improving speaker skill in a resynthesis experiment2008In: Proceedings FONETIK 2008: The XXIst Swedish Phonetics Conference / [ed] Anders Eriksson, Jonas Lindh, 2008, p. 69-72Conference paper (Other academic)
    Abstract [en]

    A synthesis experiment was conducted based ondata from ratings of speaker skill and acousticmeasurements in samples of political speech.Features assumed to be important for “being agood speaker” were manipulated in the sampleof the lowest rated speaker. Increased F0 dynamicsgave the greatest positive effects, butelimination of disfluencies and hesitation pauses,and increased speech rate also played arole for the impression of improved speakerskill.

  • 108. Strangert, E.
    et al.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Subject ratings, acoustic measurements and synthesis of good-speaker characteristics2008In: Proceedings of Interspeech 2008, 2008, p. 1688-1691Conference paper (Refereed)
  • 109. Strangert, E.
    et al.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    What makes a good speaker?: Subject ratings, acoustic measurements and perceptual evaluations2008In: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH, 2008, p. 1688-1691Conference paper (Refereed)
    Abstract [en]

    This paper deals with subjective qualities and acoustic-prosodic features contributing to the impression of a good speaker. Subjects rated a variety of samples of political speech on a number of subjective qualities and acoustic features were extracted from the speech samples. A perceptual evaluation was also conducted with manipulations of F0 dynamics, fluency and speech rate with the sample of the lowest rated speaker as a basis. Subjects' ranking revealed a clear preference for modified versions over the original with F0 dynamics - a wider range - being the most powerful cue.

  • 110.
    Szekely, Eva
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Mendelson, Joseph
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Synthesising uncertainty: The interplay of vocal effort and hesitation disfluencies2017In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association , 2017, Vol. 2017, p. 804-808Conference paper (Refereed)
    Abstract [en]

    As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, the question needs to be examined, how different modifications to the synthetic speech interact with each other and how their specific combinations influence perception. This work investigates how the vocal effort of the synthetic speech together with added disfluencies affect listeners' perception of the degree of uncertainty in an utterance. We introduce a DNN voice built entirely from spontaneous conversational speech data and capable of producing a continuum of vocal efforts, prolongations and filled pauses with a corpus-based method. Results of a listener evaluation indicate that decreased vocal effort, filled pauses and prolongation of function words increase the degree of perceived uncertainty of conversational utterances expressing the speaker's beliefs. We demonstrate that the effect of these three cues are not merely additive, but that interaction effects, in particular between the two types of disfluencies and between vocal effort and prolongations need to be considered when aiming to communicate a specific level of uncertainty. The implications of these findings are relevant for adaptive and incremental conversational systems using expressive speech synthesis and aspiring to communicate the attitude of uncertainty.

123 101 - 110 of 110
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf