Publications (10 of 82)
Ambrazaitis, G. & House, D. (2017). Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings. Speech Communication, 95, 100-113
Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings
2017 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 95, p. 100-113. Article in journal (Refereed). Published.
Abstract [en]

Facial beat gestures align with pitch accents in speech, functioning as visual prominence markers. However, it is not yet well understood whether and how gestures and pitch accents might be combined to create different types of multimodal prominence, and how specifically visual prominence cues are used in spoken communication. In this study, we explore the use and possible interaction of eyebrow (EB) and head (HB) beats with so-called focal pitch accents (FA) in a corpus of 31 brief news readings from Swedish television (four news anchors, 986 words in total), focusing on effects of position in text, information structure as well as speaker expressivity. Results reveal an inventory of four primary (combinations of) prominence markers in the corpus: FA+HB+EB, FA+HB, FA only (i.e., no gesture), and HB only, implying that eyebrow beats tend to occur only in combination with the other two markers. In addition, head beats occur significantly more frequently in the second than in the first part of a news reading. A functional analysis of the data suggests that the distribution of head beats might to some degree be governed by information structure, as the text-initial clause often defines a common ground or presents the theme of the news story. In the rheme part of the news story, FA, HB, and FA+HB are all common prominence markers. The choice between them is subject to variation which we suggest might represent a degree of freedom for the speaker to use the markers expressively. A second main observation concerns eyebrow beats, which seem to be used mainly as a kind of intensification marker for highlighting not only contrast, but also value, magnitude, or emotionally loaded words; it is applicable in any position in a text. We thus observe largely different patterns of occurrence and usage of head beats on the one hand and eyebrow beats on the other, suggesting that the two represent two separate modalities of visual prominence cuing.

Place, publisher, year, edition, pages
Elsevier B.V., 2017
Keywords
Degrees of freedom (mechanics), Common ground, Degree of freedom, Information structures, Multi-modal, Pitch accents, Swedish, Continuous speech recognition
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-227120 (URN), 10.1016/j.specom.2017.08.008 (DOI), 2-s2.0-85034707910 (Scopus ID)
Note

QC 20180508

Available from: 2018-05-08. Created: 2018-05-08. Last updated: 2018-05-08. Bibliographically approved.
Alexanderson, S., House, D. & Beskow, J. (2016). Automatic annotation of gestural units in spontaneous face-to-face interaction. In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction. Paper presented at the 2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (MA3HMI 2016), 12-16 November 2016 (pp. 15-19).
Automatic annotation of gestural units in spontaneous face-to-face interaction
2016 (English). In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, 2016, p. 15-19. Conference paper, Published paper (Refereed).
Abstract [en]

Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.
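
As a rough, hedged illustration of the segmentation approach described in this abstract: the sketch below fits a flat Gaussian HMM to a per-frame motion-feature stream and Viterbi-decodes a state label for each frame. It is not the authors' implementation: the paper trains a 2-level HHMM on labeled gesture units, while hmmlearn here fits a single-level model unsupervised. The function name segment_gestures, the four-state inventory, and the speed/acceleration features are illustrative assumptions.

    # Illustrative sketch only, not the paper's 2-level HHMM: a flat
    # Gaussian HMM over per-frame motion features, decoded with Viterbi.
    import numpy as np
    from hmmlearn import hmm  # pip install hmmlearn

    def segment_gestures(features, n_states=4):
        """features: (n_frames, n_dims) array of motion descriptors,
        e.g. hand speed and acceleration. Returns one state per frame."""
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50,
                                random_state=0)
        model.fit(features)             # unsupervised here; the paper
                                        # trains on labeled gesture units
        return model.predict(features)  # Viterbi decoding

    # toy usage: 1000 frames of 2-D features
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    states = segment_gestures(X)
    # contiguous runs of non-rest states would then be merged into
    # gesture units and phases (preparation / stroke / retraction)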

Keywords
Gesture recognition, Motion capture, Spontaneous dialogue, Hidden Markov models, Man machine systems, Markov processes, Online systems, 3D motion capture, Automatic annotation, Face-to-face interaction, Hierarchical hidden markov models, Multi-modal information, Multi-modal interfaces, Classification (of information)
National Category
Robotics
Identifiers
urn:nbn:se:kth:diva-202135 (URN), 10.1145/3011263.3011268 (DOI), 2-s2.0-85003571594 (Scopus ID), 9781450345620 (ISBN)
Conference
2016 Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (MA3HMI 2016), 12-16 November 2016
Funder
Swedish Research Council, 2010-4646
Note

Funding text: The work reported here was carried out within the projects "Timing of intonation and gestures in spoken communication" (P12-0634:1), funded by the Bank of Sweden Tercentenary Foundation, and "Large-scale massively multimodal modelling of non-verbal behaviour in spontaneous dialogue" (VR 2010-4646), funded by the Swedish Research Council.

Available from: 2017-03-13. Created: 2017-03-13. Last updated: 2017-11-24. Bibliographically approved.
Zellers, M., House, D. & Alexanderson, S. (2016). Prosody and hand gesture at turn boundaries in Swedish. In: Proceedings of the International Conference on Speech Prosody. Paper presented at the 8th International Conference on Speech Prosody (Speech Prosody 2016), 31 May-3 June 2016 (pp. 831-835). International Speech Communication Association.
Prosody and hand gesture at turn boundaries in Swedish
2016 (English). In: Proceedings of the International Conference on Speech Prosody, International Speech Communication Association, 2016, p. 831-835. Conference paper, Published paper (Refereed).
Abstract [en]

In order to ensure smooth turn-taking between conversational participants, interlocutors must have ways of providing information to one another about whether they have finished speaking or intend to continue. The current work investigates Swedish speakers’ use of hand gestures in conjunction with turn change or turn hold in unrestricted, spontaneous speech. As has been reported by other researchers, we find that speakers’ gestures end before the end of speech in cases of turn change, while they may extend well beyond the end of a given speech chunk in the case of turn hold. We investigate the degree to which prosodic cues and gesture cues to turn transition in Swedish face-to-face conversation are complementary versus functioning additively. The co-occurrence of acoustic prosodic features and gesture at potential turn boundaries gives strong support for considering hand gestures as part of the prosodic system, particularly in the context of discourse-level information such as maintaining smooth turn transition.

Place, publisher, year, edition, pages
International Speech Communication Association, 2016
Keywords
Gesture, Multimodal communication, Swedish, Turn transition, Co-occurrence, Face-to-face conversation, Multimodal communications, Prosodic features, Smooth turn-taking, Spontaneous speech, Speech
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-195492 (URN), 2-s2.0-84982980451 (Scopus ID)
Conference
8th International Conference on Speech Prosody (Speech Prosody 2016), 31 May-3 June 2016
Note

QC 20161125

Available from: 2016-11-25. Created: 2016-11-03. Last updated: 2018-01-13. Bibliographically approved.
Strömbergsson, S., Salvi, G. & House, D. (2015). Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech. Journal of the Acoustical Society of America, 137(6), 3422-3435
Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech
2015 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 137, no. 6, p. 3422-3435. Article in journal (Refereed). Published.
Abstract [en]

This investigation explores perceptual and acoustic characteristics of children's successful and unsuccessful productions of /t/ and /k/, with a specific aim of exploring perceptual sensitivity to phonetic detail, and the extent to which this sensitivity is reflected in the acoustic domain. Recordings were collected from 4- to 8-year-old children with a speech sound disorder (SSD) who misarticulated one of the target plosives, and compared to productions recorded from peers with typical speech development (TD). Perceptual responses were registered on a visual-analog scale ranging from "clear [t]" to "clear [k]." Statistical models of prototypical productions were built, based on spectral moments and discrete cosine transform features, and used in the scoring of SSD productions. In the perceptual evaluation, "clear substitutions" were rated as less prototypical than correct productions. Moreover, target-appropriate productions of /t/ and /k/ produced by children with SSD were rated as less prototypical than those produced by TD peers. The acoustical modeling could to a large extent discriminate between the gross categories /t/ and /k/, and scored the SSD utterances on a continuous scale that was largely consistent with the category of production. However, none of the methods exhibited the same sensitivity to phonetic detail as the human listeners.
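
For concreteness, here is a minimal sketch of the two feature families named in this abstract, spectral moments and discrete cosine transform (DCT) coefficients, computed from a windowed power spectrum around the stop burst. The function name burst_features, the Hann window, and the choice of 8 DCT coefficients are assumptions for illustration, not the paper's actual extraction settings.

    # Minimal sketch (assumptions, not the paper's code): spectral
    # moments and DCT features from a short window around the burst.
    import numpy as np
    from scipy.fft import dct

    def burst_features(frame, sr):
        """frame: 1-D samples around the stop burst; sr: sample rate (Hz).
        Returns the four spectral moments plus 8 DCT coefficients."""
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        p = spec / spec.sum()                # treat spectrum as a distribution
        centroid = np.sum(freqs * p)                            # 1st moment (Hz)
        spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))   # 2nd moment (Hz)
        skew = np.sum(((freqs - centroid) / spread) ** 3 * p)   # 3rd moment
        kurt = np.sum(((freqs - centroid) / spread) ** 4 * p)   # 4th moment
        dct_coeffs = dct(np.log(spec + 1e-10))[:8]  # compact spectral shape
        return np.concatenate(([centroid, spread, skew, kurt], dct_coeffs))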

National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-171155 (URN), 10.1121/1.4921033 (DOI), 000356622400057 (ISI), 26093431 (PubMedID), 2-s2.0-84935019965 (Scopus ID)
Note

QC 20150720

Available from: 2015-07-20. Created: 2015-07-20. Last updated: 2017-12-04. Bibliographically approved.
Artman, H., House, D. & Hulten, M. (2015). Designed by Engineers: An analysis of interactionaries with engineering students. Designs for Learning, 7(2), 28-56, Article ID 10.2478/dfl-2014-0062.
Designed by Engineers: An analysis of interactionaries with engineering students
2015 (English). In: Designs for Learning, ISSN 1654-7608, Vol. 7, no. 2, p. 28-56, article id 10.2478/dfl-2014-0062. Article in journal (Refereed). Published.
Abstract [en]

The aim of this study is to describe and analyze learning taking place in a collaborative design exercise involving engineering students. The students perform a time-constrained, open-ended, complex interaction design task, an “interactionary”. A multimodal learning perspective is used. We have performed detailed analyses of video recordings of the engineering students, including classifying aspects of interaction. Our results show that the engineering students carry out and articulate their design work using a technology-centred approach and focus more on the function of their designs than on aspects of interaction. The engineering students mainly make use of ephemeral communication strategies (gestures and speech) rather than sketching in physical materials. We conclude that the interactionary may be an educational format that can help engineering students learn the messiness of design work. We further identify several constraints to the engineering students’ design learning and propose useful interventions that a teacher could make during an interactionary. We especially emphasize interventions that help engineering students retain aspects of human-centered design throughout the design process. This study partially replicates a previous study which involved interaction design students.

Place, publisher, year, edition, pages
De Gruyter Open, 2015
Keywords
design, engineering education, interactionary, interaction design, learning design sequence, multimodal learning
National Category
Learning; Human Aspects of ICT; Communication Studies
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-164951 (URN), 10.2478/dfl-2014-0062 (DOI)
Note

QC 20150424

Available from: 2015-04-21. Created: 2015-04-21. Last updated: 2017-12-04. Bibliographically approved.
Ambrazaitis, G., Svensson Lundmark, M. & House, D. (2015). Head beats and eyebrow movements as a function of phonological prominence levels and word accents in Stockholm Swedish news broadcasts. In: The 3rd European Symposium on Multimodal Communication. Paper presented at the 3rd European Symposium on Multimodal Communication, Dublin, 17-18 September 2015. Dublin, Ireland.
Head beats and eyebrow movements as a function of phonological prominence levels and word accents in Stockholm Swedish news broadcasts
2015 (English). In: The 3rd European Symposium on Multimodal Communication, Dublin, Ireland, 2015. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Dublin, Ireland, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180421 (URN)
Conference
The 3rd European Symposium on Multimodal Communication, Dublin, 17-18 September 2015
Note

QC 20160216

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.
Ambrazaitis, G., Svensson Lundmark, M. & House, D. (2015). Head Movements, Eyebrows, and Phonological Prosodic Prominence Levels in Stockholm. In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015). Paper presented at the 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015) (p. 42). Vienna, Austria.
Head Movements, Eyebrows, and Phonological Prosodic Prominence Levels in Stockholm
2015 (English). In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015), Vienna, Austria, 2015, p. 42. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Vienna, Austria, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180420 (URN)
Conference
13th International Conference on Auditory-Visual Speech Processing (AVSP 2015)
Note

QC 20160216

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.
Ambrazaitis, G., Svensson Lundmark, M. & House, D. (2015). Multimodal levels of prominence: a preliminary analysis of head and eyebrow movements in Swedish news broadcasts. In: Svensson Lundmark, M.; Ambrazaitis, G.; van de Weijer, J. (Ed.), Proceedings of Fonetik 2015. Paper presented at Fonetik 2015, Lund (pp. 11-16). Lund.
Multimodal levels of prominence: a preliminary analysis of head and eyebrow movements in Swedish news broadcasts
2015 (English). In: Proceedings of Fonetik 2015 / [ed] Svensson Lundmark, M.; Ambrazaitis, G.; van de Weijer, J., Lund, 2015, p. 11-16. Conference paper, Published paper (Other academic).
Place, publisher, year, edition, pages
Lund, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180416 (URN)
Conference
Fonetik 2015, Lund
Note

QC 20160216

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.
House, D., Alexanderson, S. & Beskow, J. (2015). On the temporal domain of co-speech gestures: syllable, phrase or talk spurt? In: Svensson Lundmark, M.; Ambrazaitis, G.; van de Weijer, J. (Ed.), Proceedings of Fonetik 2015. Paper presented at Fonetik 2015, Lund (pp. 63-68).
On the temporal domain of co-speech gestures: syllable, phrase or talk spurt?
2015 (English). In: Proceedings of Fonetik 2015 / [ed] Svensson Lundmark, M.; Ambrazaitis, G.; van de Weijer, J., 2015, p. 63-68. Conference paper, Published paper (Other academic).
Abstract [en]

This study explores the use of automatic methods to detect and extract hand gesture movement co-occurring with speech. Two spontaneous dyadic dialogues were analyzed using 3D motion-capture techniques to track hand movement. Automatic speech/non-speech detection was performed on the dialogues, resulting in a series of connected talk spurts for each speaker. Temporal synchrony of onset and offset of gesture and speech was studied between the automatic hand gesture tracking and talk spurts, and compared to an earlier study of head nods and syllable synchronization. The results indicated onset synchronization between head nods and the syllable in the short temporal domain, and between the onset of longer gesture units and the talk spurt in a more extended temporal domain.
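
The onset-synchrony measurement can be pictured with a small sketch: given gesture units and talk spurts as (onset, offset) intervals in seconds, compute the signed lag from each gesture onset to the nearest talk-spurt onset. The function name onset_lags and the data layout are invented for illustration; this is not the study's analysis code.

    # Illustrative sketch: signed lag from each gesture onset to the
    # nearest talk-spurt onset; negative lag = gesture leads speech.
    import numpy as np

    def onset_lags(gestures, spurts):
        """gestures, spurts: lists of (onset, offset) times in seconds."""
        spurt_onsets = np.array([on for on, _ in spurts])
        lags = []
        for g_on, _ in gestures:
            diffs = g_on - spurt_onsets
            lags.append(float(diffs[np.argmin(np.abs(diffs))]))
        return lags

    # toy usage
    print(onset_lags([(1.2, 2.0), (5.8, 7.1)], [(1.5, 4.0), (6.0, 8.5)]))
    # about [-0.3, -0.2]: both gestures start just before the talk spurt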

National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180407 (URN)
Conference
Fonetik 2015, Lund
Note

QC 20160216

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.
Zellers, M. & House, D. (2015). Parallels between hand gestures and acoustic prosodic features in turn-taking. In: 14th International Pragmatics Conference. Paper presented at the 14th International Pragmatics Conference (pp. 454-455). Antwerp, Belgium.
Parallels between hand gestures and acoustic prosodic features in turn-taking
2015 (English). In: 14th International Pragmatics Conference, Antwerp, Belgium, 2015, p. 454-455. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Antwerp, Belgium, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180418 (URN)
Conference
14th International Pragmatics Conference
Note

QC 2016-02-18

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-4628-3769