Look Who’s Talking: Visual Identification of the Active Speaker in Multi-party Human-robot Interaction
Stefanov, Kalin (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH). ORCID iD: 0000-0002-0861-8660
Beskow, Jonas (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH). ORCID iD: 0000-0003-1399-6604
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an analysis of a previously recorded multimodal interaction dataset. The primary purpose of that dataset is to explore patterns in the focus of visual attention of humans under three different conditions: two humans involved in task-based interaction with a robot; the same two humans involved in task-based interaction where the robot is replaced by a third human; and a free three-party human interaction. The paper presents a data-driven methodology for automatic visual identification of the active speaker based on facial action units (AUs), and evaluates the proposed methodology on 12 different interactions with an approximate total length of 4 hours. The methodology will be implemented on a robot and used to generate natural focus-of-visual-attention behavior during multi-party human-robot interactions.
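
For readers who want a concrete picture of the methodology the abstract outlines, the sketch below shows one plausible shape of AU-based active speaker detection in Python. It is an illustration under stated assumptions, not the authors' pipeline: the sliding-window mean/std features, the logistic regression classifier, and the randomly generated stand-in data are all hypothetical, and real AU intensities would come from a facial analysis tool such as OpenFace.

# Minimal sketch (assumptions labeled above): classify speaking vs. not
# speaking per face from sliding-window statistics of facial AU intensities.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def windowed_features(au_frames, win=15):
    # au_frames: (n_frames, n_aus) AU intensities for one tracked face.
    # Window means and standard deviations give a crude summary of the
    # temporal dynamics (e.g., mouth motion) that separate speaking from
    # listening; returns an array of shape (n_frames - win + 1, 2 * n_aus).
    feats = []
    for start in range(au_frames.shape[0] - win + 1):
        w = au_frames[start:start + win]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(feats)

# Toy stand-in data: 1000 frames of 17 AU intensities plus binary speaking
# labels aligned to each window (real data would come from the recorded
# interactions and their speech annotations).
rng = np.random.default_rng(0)
aus = rng.random((1000, 17))
labels = rng.integers(0, 2, size=1000 - 15 + 1)

X = windowed_features(aus)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))

On the random stand-in data the printed accuracy hovers around chance; with real AU tracks and speech-aligned labels, a per-face classifier of this kind could be run on every participant in a frame, with the highest-scoring face taken as the active speaker.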

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2016. p. 22-27
Keywords [en]
Active speaker identification, Human-robot interaction, Multi-modal interaction
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-193940
DOI: 10.1145/3005467.3005470
Scopus ID: 2-s2.0-85006754103
OAI: oai:DiVA.org:kth-193940
DiVA, id: diva2:1034702
Conference
2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction 2016, ASSP4MI 2016 - Held in conjunction with the 18th ACM International Conference on Multimodal Interaction 2016, ICMI 2016
Note

QCR 20161013

QC 20170314

Available from: 2016-10-12 Created: 2016-10-12 Last updated: 2024-03-18. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
http://wwwhome.ewi.utwente.nl/~truongkp/icmi2016-assp4mi/

Authority records

Stefanov, Kalin
Beskow, Jonas
