Publications (10 of 81)
Alexanderson, S., House, D. & Beskow, J. (2016). Automatic annotation of gestural units in spontaneous face-to-face interaction. In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction. Paper presented at 2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, MA3HMI 2016, 12 November 2016 through 16 November 2016 (pp. 15-19).
2016 (English) In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, 2016, 15-19 p. Conference paper, Published paper (Refereed)
Abstract [en]

Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.

Keyword
Gesture recognition, Motion capture, Spontaneous dialogue, Hidden Markov models, Man machine systems, Markov processes, Online systems, 3D motion capture, Automatic annotation, Face-to-face interaction, Hierarchical hidden markov models, Multi-modal information, Multi-modal interfaces, Classification (of information)
National Category
Robotics
Identifiers
urn:nbn:se:kth:diva-202135 (URN), 10.1145/3011263.3011268 (DOI), 2-s2.0-85003571594 (Scopus ID), 9781450345620 (ISBN)
Conference
2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, MA3HMI 2016, 12 November 2016 through 16 November 2016
Funder
Swedish Research Council, 2010-4646
Note

Funding text: The work reported here is carried out within the projects: "Timing of intonation and gestures in spoken communication," (P12-0634:1) funded by the Bank of Sweden Tercentenary Foundation, and "Large-scale massively multimodal modelling of non-verbal behaviour in spontaneous dialogue," (VR 2010-4646) funded by Swedish Research Council.

Available from: 2017-03-13 Created: 2017-03-13 Last updated: 2017-11-24 Bibliographically approved
Zellers, M., House, D. & Alexanderson, S. (2016). Prosody and hand gesture at turn boundaries in Swedish. In: Proceedings of the International Conference on Speech Prosody. Paper presented at 8th Speech Prosody 2016, 31 May 2016 through 3 June 2016 (pp. 831-835). International Speech Communications Association.
2016 (English) In: Proceedings of the International Conference on Speech Prosody, International Speech Communications Association, 2016, 831-835 p. Conference paper, Published paper (Refereed)
Abstract [en]

In order to ensure smooth turn-taking between conversational participants, interlocutors must have ways of providing information to one another about whether they have finished speaking or intend to continue. The current work investigates Swedish speakers’ use of hand gestures in conjunction with turn change or turn hold in unrestricted, spontaneous speech. As has been reported by other researchers, we find that speakers’ gestures end before the end of speech in cases of turn change, while they may extend well beyond the end of a given speech chunk in the case of turn hold. We investigate the degree to which prosodic cues and gesture cues to turn transition in Swedish face-to-face conversation are complementary versus functioning additively. The co-occurrence of acoustic prosodic features and gesture at potential turn boundaries gives strong support for considering hand gestures as part of the prosodic system, particularly in the context of discourse-level information such as maintaining smooth turn transition.

Place, publisher, year, edition, pages
International Speech Communications Association, 2016
Keyword
Gesture, Multimodal communication, Swedish, Turn transition, Co-occurrence, Face-to-face conversation, Multimodal communications, Prosodic features, Smooth turn-taking, Spontaneous speech, Swedishs, Speech
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-195492 (URN), 2-s2.0-84982980451 (Scopus ID)
Conference
8th Speech Prosody 2016, 31 May 2016 through 3 June 2016
Note

QC 20161125

Available from: 2016-11-25 Created: 2016-11-03 Last updated: 2018-01-13 Bibliographically approved
Strömbergsson, S., Salvi, G. & House, D. (2015). Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech. Journal of the Acoustical Society of America, 137(6), 3422-3435.
2015 (English) In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 137, no 6, 3422-3435 p. Article in journal (Refereed) Published
Abstract [en]

This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with a specific aim of exploring perceptual sensitivity to phonetic detail, and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared to productions recorded from peers with typical speech development (TD). Perceptual responses were registered with regards to a visual-analog scale, ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built, based on spectral moments and discrete cosine transform features, and used in the scoring of SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ produced by children with SSD were rated as less prototypical than those produced by TD peers. The acoustical modeling could to a large extent discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners.

National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-171155 (URN), 10.1121/1.4921033 (DOI), 000356622400057 (), 26093431 (PubMedID), 2-s2.0-84935019965 (Scopus ID)
Note

QC 20150720

Available from: 2015-07-20 Created: 2015-07-20 Last updated: 2017-12-04 Bibliographically approved
Artman, H., House, D. & Hulten, M. (2015). Designed by Engineers: An analysis of interactionaries with engineering students. Designs for Learning, 7(2), 28-56, Article ID 10.2478/dfl-2014-0062.
2015 (English) In: Designs for Learning, ISSN 1654-7608, Vol. 7, no 2, 28-56 p., Article ID 10.2478/dfl-2014-0062. Article in journal (Refereed) Published
Abstract [en]

The aim of this study is to describe and analyze learning taking place in a collaborative design exercise involving engineering students. The students perform a time-constrained, open-ended, complex interaction design task, an “interactionary”. A multimodal learning perspective is used. We have performed detailed analyses of video recordings of the engineering students, including classifying aspects of interaction. Our results show that the engineering students carry out and articulate their design work using a technology-centred approach and focus more on the function of their designs than on aspects of interaction. The engineering students mainly make use of ephemeral communication strategies (gestures and speech) rather than sketching in physical materials. We conclude that the interactionary may be an educational format that can help engineering students learn the messiness of design work. We further identify several constraints to the engineering students’ design learning and propose useful interventions that a teacher could make during an interactionary. We especially emphasize interventions that help engineering students retain aspects of human-centered design throughout the design process. This study partially replicates a previous study which involved interaction design students.

Place, publisher, year, edition, pages
De Gruyter Open, 2015
Keyword
design, engineering education, interactionary, interaction design, learning design sequence, multimodal learning
National Category
Learning; Human Aspects of ICT; Communication Studies
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-164951 (URN), 10.2478/dfl-2014-0062 (DOI)
Note

QC 20150424

Available from: 2015-04-21 Created: 2015-04-21 Last updated: 2017-12-04 Bibliographically approved
Ambrazaitis, G., Svensson Lundmark, M. & House, D. (2015). Head beats and eyebrow movements as a function of phonological prominence levels and word accents in Stockholm Swedish news broadcasts. In: The 3rd European Symposium on Multimodal Communication. Paper presented at The 3rd European Symposium on Multimodal Communication, Dublin on 17, 18 September 2015. Dublin, Ireland.
2015 (English) In: The 3rd European Symposium on Multimodal Communication, Dublin, Ireland, 2015. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Dublin, Ireland, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180421 (URN)
Conference
The 3rd European Symposium on Multimodal Communication, Dublin on 17, 18 September 2015
Note

QC 20160216

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
Ambrazaitis, G., Svensson Lundmark, M. & House, D. (2015). Head Movements, Eyebrows, and Phonological Prosodic Prominence Levels in Stockholm. In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015). Paper presented at 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015) (pp. 42). Vienna, Austria.
2015 (English) In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015), Vienna, Austria, 2015, 42- p. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Vienna, Austria, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180420 (URN)
Conference
13th International Conference on Auditory-Visual Speech Processing (AVSP 2015)
Note

QC 20160216

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
Ambrazaitis, G., Svensson Lundmark, M. & House, D. (2015). Multimodal levels of prominence: a preliminary analysis of head and eyebrow movements in Swedish news broadcasts. In: Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J. (Ed.), Proceedings of Fonetik 2015. Paper presented at Fonetik 2015, Lund (pp. 11-16). Lund.
2015 (English) In: Proceedings of Fonetik 2015 / [ed] Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J., Lund, 2015, 11-16 p. Conference paper, Published paper (Other academic)
Place, publisher, year, edition, pages
Lund, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180416 (URN)
Conference
Fonetik 2015, Lund
Note

QC 20160216

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
House, D., Alexanderson, S. & Beskow, J. (2015). On the temporal domain of co-speech gestures: syllable, phrase or talk spurt?. In: Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J. (Ed.), Proceedings of Fonetik 2015. Paper presented at Fonetik 2015, Lund (pp. 63-68).
2015 (English) In: Proceedings of Fonetik 2015 / [ed] Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J., 2015, 63-68 p. Conference paper, Published paper (Other academic)
Abstract [en]

This study explores the use of automatic methods to detect and extract hand gesture movement co-occurring with speech. Two spontaneous dyadic dialogues were analyzed using 3D motion-capture techniques to track hand movement. Automatic speech/non-speech detection was performed on the dialogues resulting in a series of connected talk spurts for each speaker. Temporal synchrony of onset and offset of gesture and speech was studied between the automatic hand gesture tracking and talk spurts, and compared to an earlier study of head nods and syllable synchronization. The results indicated onset synchronization between head nods and the syllable in the short temporal domain and between the onset of longer gesture units and the talk spurt in a more extended temporal domain.

National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180407 (URN)
Conference
Fonetik 2015, Lund
Note

QC 20160216

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
Zellers, M. & House, D. (2015). Parallels between hand gestures and acoustic prosodic features in turn-taking. In: 14th International Pragmatics Conference. Paper presented at 14th International Pragmatics Conference (pp. 454-455). Antwerp, Belgium.
2015 (English) In: 14th International Pragmatics Conference, Antwerp, Belgium, 2015, 454-455 p. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Antwerp, Belgium, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180418 (URN)
Conference
14th International Pragmatics Conference
Note

QC 2016-02-18

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
Karlsson, A., House, D. & Svantesson, J.-O. (2015). Prosodic signaling of information and discourse structure from a typological perspective. In: Proceedings of the 18th International Congress of Phonetic Sciences. Paper presented at 18th International Congress of Phonetic Sciences. Glasgow, UK: ICPHS.
2015 (English) In: Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, UK: ICPHS, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

This study investigates the relationship between prosody and information/discourse structure in spontaneous spoken folk tales in the tonal Mon-Khmer language Northern Kammu, a language that behaves as a typical phrase language where available boundary tones are enhanced to mark information structuring. Topic is always placed before Comment by syntactic movement if necessary. There is a prosodic signaling of the boundary between Topic and Comment. Discourse structure is reflected in prosody, and we find higher boundary tones near the boundaries between Discourse Units. The results are discussed in terms of a typology of spoken discourse.

Place, publisher, year, edition, pages
Glasgow, UK: ICPHS, 2015
Keyword
prosodic typology, discourse structure, information structure, intonation, tone
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180419 (URN), 978-0-85261-941-4 (ISBN)
Conference
18th International Congress of Phonetic Sciences
Note

QC 20160216

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-4628-3769