Automatic annotation of gestural units in spontaneous face-to-face interaction
Alexanderson, Simon. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-7801-7617
House, David. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-4628-3769
Beskow, Jonas. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
2016 (English). In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, 2016, pp. 15-19. Conference paper, published paper (peer-reviewed)
Abstract [en]

Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.
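
The record contains no code, but the decoding step described in the abstract can be sketched in outline. The following is a minimal, hypothetical illustration written for this summary, not the authors' implementation: it flattens the two-level model into a single HMM over assumed gesture-phase states (rest, preparation, stroke, retraction) and decodes a per-frame phase sequence with the Viterbi algorithm. The paper's actual HHMM topology, features, and training procedure are not specified in this record.

import numpy as np

# Hypothetical gesture-phase states. The paper's 2-level HHMM has sub-states
# corresponding to gesture phases; here the hierarchy is flattened into one
# HMM over assumed phase names, for illustration only.
STATES = ["rest", "preparation", "stroke", "retraction"]

def viterbi(log_pi, log_A, log_B):
    """Return the most likely state sequence.

    log_pi: (S,)   initial log-probabilities
    log_A:  (S, S) transition log-probabilities
    log_B:  (T, S) per-frame observation log-likelihoods
    """
    T, S = log_B.shape
    delta = np.empty((T, S))           # best log-score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def gesture_units(path, rest=0):
    """Group consecutive non-rest frames into (start, end) gesture units."""
    units, start = [], None
    for t, s in enumerate(path):
        if s != rest and start is None:
            start = t
        elif s == rest and start is not None:
            units.append((start, t))
            start = None
    if start is not None:
        units.append((start, len(path)))
    return units

# Toy usage: random numbers stand in for a trained observation model over
# motion-capture features.
rng = np.random.default_rng(0)
S = len(STATES)
log_pi = np.log(np.full(S, 1.0 / S))
A = np.full((S, S), 0.05 / (S - 1))
np.fill_diagonal(A, 0.95)              # assumed sticky phase transitions
log_B = rng.normal(size=(200, S))
print(gesture_units(viterbi(log_pi, np.log(A), log_B)))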

Place, publisher, year, edition, pages
2016. pp. 15-19
Keywords [en]
Gesture recognition, Motion capture, Spontaneous dialogue, Hidden Markov models, Man machine systems, Markov processes, Online systems, 3D motion capture, Automatic annotation, Face-to-face interaction, Hierarchical hidden Markov models, Multi-modal information, Multi-modal interfaces, Classification (of information)
National subject category
Robotics and Automation
Identifiers
URN: urn:nbn:se:kth:diva-202135
DOI: 10.1145/3011263.3011268
Scopus ID: 2-s2.0-85003571594
ISBN: 9781450345620 (print)
OAI: oai:DiVA.org:kth-202135
DiVA id: diva2:1081313
Conference
2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, MA3HMI 2016, 12 November 2016 through 16 November 2016
Research funder
Vetenskapsrådet, 2010-4646
Note

Funding text: The work reported here was carried out within the projects "Timing of intonation and gestures in spoken communication" (P12-0634:1), funded by the Bank of Sweden Tercentenary Foundation, and "Large-scale massively multimodal modelling of non-verbal behaviour in spontaneous dialogue" (VR 2010-4646), funded by the Swedish Research Council.

Available from: 2017-03-13 Created: 2017-03-13 Last updated: 2017-11-24 Bibliographically reviewed
Part of thesis
1. Performance, Processing and Perception of Communicative Motion for Avatars and Agents
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Artificial agents and avatars are designed with a large variety of face and body configurations. Some of these (such as virtual characters in films) may be highly realistic and human-like, while others (such as social robots) have considerably more limited expressive means. In both cases, human motion serves as the model and inspiration for the non-verbal behavior displayed. This thesis focuses on increasing the expressive capacities of artificial agents and avatars using two main strategies: 1) improving the automatic capturing of the most communicative areas for human communication, namely the face and the fingers, and 2) increasing communication clarity by proposing novel ways of eliciting clear and readable non-verbal behavior.

The first part of the thesis covers automatic methods for capturing and processing motion data. In paper A, we propose a novel dual sensor method for capturing hands and fingers using optical motion capture in combination with low-cost instrumented gloves. The approach circumvents the main problems with marker-based systems and glove-based systems, and it is demonstrated and evaluated on a key-word signing avatar. In paper B, we propose a robust method for automatic labeling of sparse, non-rigid motion capture marker sets, and we evaluate it on a variety of marker configurations for finger and facial capture. In paper C, we propose an automatic method for annotating hand gestures using Hierarchical Hidden Markov Models (HHMMs).
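
As a rough illustration of the marker-labeling problem addressed in paper B, the sketch below propagates labels from frame to frame by minimum-displacement assignment (the Hungarian algorithm, via SciPy). This naive baseline was written for this summary and is not the paper's method, which is designed to handle the harder case of sparse, non-rigid marker sets.

import numpy as np
from scipy.optimize import linear_sum_assignment

def propagate_labels(prev_pos, curr_pos):
    """Match each labeled marker from frame t-1 to one unlabeled marker in
    frame t by minimizing total displacement.

    prev_pos: (N, 3) positions of the N labeled markers at frame t-1
    curr_pos: (N, 3) unlabeled marker positions at frame t
    Returns idx such that curr_pos[idx[i]] inherits label i.
    """
    # Pairwise Euclidean distances between all (previous, current) pairs
    cost = np.linalg.norm(prev_pos[:, None, :] - curr_pos[None, :, :], axis=2)
    _, idx = linear_sum_assignment(cost)
    return idx

# Toy usage: permute and jitter known positions, then recover the labels.
rng = np.random.default_rng(1)
prev_pos = rng.uniform(size=(10, 3))
perm = rng.permutation(10)
curr_pos = prev_pos[perm] + rng.normal(scale=0.01, size=(10, 3))
idx = propagate_labels(prev_pos, curr_pos)
assert np.array_equal(perm[idx], np.arange(10))  # label i found at idx[i]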

The second part of the thesis covers studies on creating and evaluating multimodal databases with clear and exaggerated motion. The main idea is that this type of motion is appropriate for agents under certain communicative situations (such as noisy environments) or for agents with reduced expressive degrees of freedom (such as humanoid robots). In paper D, we record motion capture data for a virtual talking head with variable articulation style (normal-to-over articulated). In paper E, we use techniques from mime acting to generate clear non-verbal expressions custom tailored for three agent embodiments (face-and-body, face-only and body-only).

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2017. 73 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 24
National subject category
Computer and Information Sciences
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-218272
ISBN: 978-91-7729-608-9
Public defence
2017-12-15, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisor
Note

QC 20171127

Available from: 2017-11-27 Created: 2017-11-24 Last updated: 2018-01-13 Bibliographically reviewed

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text
Scopus
