Recognition and Generation of Communicative Signals: Modeling of Hand Gestures, Speech Activity and Eye-Gaze in Human-Machine Interaction
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
2018 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Nonverbal communication is essential for natural and effective face-to-face human-human interaction. It is the process of communicating by sending and receiving wordless (mostly visual, but also auditory) signals between people. Consequently, natural and effective face-to-face human-machine interaction requires machines (e.g., robots) to understand and produce such human-like signals. Many types of nonverbal signals are used in this form of communication, including body postures, hand gestures, facial expressions, eye movements, touch, and use of space. This thesis investigates two of these nonverbal signals: hand gestures and eye-gaze. The main goal of the thesis is to propose computational methods for real-time recognition and generation of these two signals in order to facilitate natural and effective human-machine interaction.

The first topic addressed in the thesis is the real-time recognition of hand gestures and its application to the recognition of isolated sign language signs. Hand gestures can also provide important cues during human-robot interaction; for example, emblems are a type of hand gesture with a specific meaning, used to substitute for spoken words. The thesis makes two main contributions with respect to the recognition of hand gestures: 1) a newly collected dataset of isolated Swedish Sign Language signs, and 2) a real-time hand gesture recognition method.

The second topic addressed in the thesis is the general problem of real-time speech activity detection in noisy and dynamic environments and its application to socially-aware language acquisition. Speech activity can also provide important information during human-robot interaction; for example, the current active speaker's hand gestures and eye-gaze direction or head orientation can play an important role in understanding the state of the interaction. The thesis makes one main contribution with respect to speech activity detection: a real-time vision-based speech activity detection method.

The third topic addressed in the thesis is the real-time generation of eye-gaze direction or head orientation and its application to human-robot interaction. Eye-gaze direction or head orientation can provide important cues during human-robot interaction; for example, they can regulate who is allowed to speak and when, and coordinate changes in roles on the conversational floor (e.g., speaker, addressee, and bystander). The thesis makes two main contributions with respect to the generation of eye-gaze direction or head orientation: 1) a newly collected dataset of face-to-face interactions, and 2) a real-time eye-gaze direction or head orientation generation method.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2018, p. 54
Series
TRITA-EECS-AVL ; 2018:46
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-227986
ISBN: 978-91-7729-810-6 (print)
OAI: oai:DiVA.org:kth-227986
DiVA, id: diva2:1206166
Public defence
2018-06-07, Hörsal K2, Teknikringen 28, Stockholm, 14:00 (English)
Note

QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2022-06-26 Bibliographically approved
List of papers
1. A Kinect Corpus of Swedish Sign Language Signs
2013 (English) In: Proceedings of the 2013 Workshop on Multimodal Corpora: Beyond Audio and Video, 2013. Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-137412 (URN)
Conference
Multimodal Corpora: Beyond Audio and Video, Edinburgh, UK, 2013
Note

QC 20161013

Available from: 2013-12-13 Created: 2013-12-13 Last updated: 2022-06-23 Bibliographically approved
2. A Real-time Gesture Recognition System for Isolated Swedish Sign Language Signs
2017 (English) In: Proceedings of the 4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016), Linköping University Electronic Press, 2017. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Linköping University Electronic Press, 2017
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-218328 (URN)
Conference
4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016), Copenhagen, 29-30 September 2016
Note

QC 20171128

Available from: 2017-11-27 Created: 2017-11-27 Last updated: 2024-03-18 Bibliographically approved
3. Vision-based Active Speaker Detection in Multiparty Interaction
2017 (English) In: Grounding Language Understanding, 2017. Conference paper, Published paper (Refereed)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-211651 (URN)
10.21437/GLU.2017-10 (DOI)
Conference
Grounding Language Understanding (GLU2017), August 25, 2017, KTH Royal Institute of Technology, Stockholm, Sweden
Note

QC 20170809

Available from: 2017-08-08 Created: 2017-08-08 Last updated: 2025-02-07 Bibliographically approved
4. Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition
(English) Manuscript (preprint) (Other academic)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227980 (URN)
Note

QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2022-06-26 Bibliographically approved
5. A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
2016 (English) In: Proceedings of the 10th edition of the Language Resources and Evaluation Conference, ELRA, 2016. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ELRA, 2016
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-187954 (URN)
000526952504105 ()
2-s2.0-85016436223 (Scopus ID)
Conference
Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 23-28 May
Note

QC 20211018

Available from: 2016-06-02 Created: 2016-06-02 Last updated: 2022-06-22 Bibliographically approved
6. Analysis and Generation of Candidate Gaze Targets in Multiparty Open-World Dialogues
(English) Manuscript (preprint) (Other academic)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227982 (URN)
Note

QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2022-06-26 Bibliographically approved

By author/editor
Stefanov, Kalin