D64: A corpus of richly recorded conversational interaction
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-8273-0132
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0001-9327-9482
2013 (English). In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 7, no. 1-2, pp. 19-28. Article in journal (Refereed). Published.
Abstract [en]

In recent years there has been substantial debate about the need for increasingly spontaneous, conversational corpora of spoken interaction that are not controlled or task-directed. In parallel, the need has arisen for multi-modal corpora that are not restricted to the audio domain alone. A corpus that fulfils both needs would make it possible to investigate natural coupling not only in turn-taking and voice, but also in the movement of participants. In this paper we describe the design and recording of such a corpus and provide some illustrative examples of how it might be exploited in the study of dynamic interaction. The D64 corpus is a multimodal corpus recorded over two successive days, each day resulting in approximately four hours of recordings. Five participants took part in the recordings, of whom two were female and three were male. Seven video cameras were used, with at least one trained on each participant, and the OptiTrack motion capture kit was used to enrich the recordings with motion data. The D64 corpus comprises annotations of conversational involvement, speech activity and pauses, as well as information on the average degree of change in participants' movement.
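
Illustrative note (not part of the record): the abstract mentions a measure of the average degree of change in participants' movement. Below is a minimal sketch of one plausible way to derive such a measure from motion-capture data; the function name, array layout, marker count and frame rate are assumptions, not details taken from the paper.

    import numpy as np

    def average_movement_change(frames, fps=100.0):
        """Mean frame-to-frame marker displacement, one value per frame transition.

        frames: array of shape (T, M, 3) -- M marker positions over T frames.
        Returns average displacement per second, averaged over markers.
        """
        # Euclidean displacement of every marker between consecutive frames: (T-1, M)
        displacement = np.linalg.norm(np.diff(frames, axis=0), axis=-1)
        # Average over markers to obtain one movement-change value per time step
        return displacement.mean(axis=1) * fps

    # Example on simulated data: 10 s of 5 markers sampled at 100 frames per second
    movement = average_movement_change(np.random.rand(1000, 5, 3), fps=100.0)

Averaging frame-to-frame displacement over markers gives a single movement-change value per time step, which could then be placed alongside the speech-activity and involvement annotations.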

Place, publisher, year, edition, pages
2013. Vol. 7, no. 1-2, pp. 19-28.
Keyword [en]
Multimodality corpus, Conversational involvement, Spontaneous speech
National Category
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-109373
DOI: 10.1007/s12193-012-0108-6
ISI: 000316062300003
Scopus ID: 2-s2.0-84874773796
OAI: oai:DiVA.org:kth-109373
DiVA: diva2:581743
Funder
Swedish Research Council, 2009-1766
Note

QC 20130415

Available from: 2013-01-02. Created: 2013-01-02. Last updated: 2017-12-06. Bibliographically approved.
In thesis
1. Modelling Engagement in Multi-Party Conversations: Data-Driven Approaches to Understanding Human-Human Communication Patterns for Use in Human-Robot Interactions
2016 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

The aim of this thesis is to study human-human interaction in order to provide virtual agents and robots with the capability to engage in multi-party conversations in a human-like manner. The focus lies on the modelling of conversational dynamics and the appropriate realization of multi-modal feedback behaviour. For such an undertaking, it is important to understand how human-human communication unfolds in varying contexts and constellations over time. To this end, multi-modal human-human corpora are designed, and annotation schemes to capture conversational dynamics are developed. Multi-modal analyses are carried out and models are built. Emphasis is put not on modelling speaker behaviour in general but on modelling listener behaviour in particular.

In this thesis, a bridge is built between multi-modal modelling of conversational dynamics on the one hand and multi-modal generation of listener behaviour in virtual agents and robots on the other. In order to build this bridge, a unit-selection multi-modal synthesis is carried out, as well as a statistical speech synthesis of feedback. The effect of variation in the prosody of feedback tokens on the perception of third-party observers is evaluated. Finally, the effect of a controlled variation of eye gaze is evaluated, as is the perception of user feedback in human-robot interaction.
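
Illustrative note (not part of the record): the summary above mentions a unit-selection synthesis of feedback. Below is a minimal sketch of the core selection step, assuming a small inventory of recorded feedback tokens described by simple prosodic features; the class, feature set and weights are hypothetical, and a real system would also consider context and join costs.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class FeedbackToken:
        audio_path: str   # recording of one feedback token, e.g. an "mhm"
        pitch: float      # mean f0 in Hz
        duration: float   # seconds
        intensity: float  # dB

    def select_token(inventory: List[FeedbackToken], target: FeedbackToken) -> FeedbackToken:
        """Return the stored token whose prosodic features are closest to the target."""
        def cost(token: FeedbackToken) -> float:
            # Weighted absolute feature differences; the weights are arbitrary placeholders
            return (abs(token.pitch - target.pitch) / 50.0
                    + abs(token.duration - target.duration)
                    + abs(token.intensity - target.intensity) / 10.0)
        return min(inventory, key=cost)

Selecting the stored token with the lowest feature-distance to a prosodic target is the simplest form of unit selection; varying the target prosody is one way such a setup could support the kind of perception experiments described above.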

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016. 87 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2017:05
National Category
Engineering and Technology
Research subject
Human-computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-198175; ISBN: 978-91-7729-237-1
Public defence
2017-01-20, F3, Lindstedtsvägen 26, Kungl Tekniska högskolan, Stockholm, 13:00 (English)
Note

QC 20161214

Available from: 2016-12-14. Created: 2016-12-13. Last updated: 2016-12-14. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text; Scopus

Authority records

Edlund, Jens

Search in DiVA

By author/editor
Oertel, Catharine; Edlund, Jens
By organisation
Speech Communication and Technology
In the same journal
Journal on Multimodal User Interfaces
Computer Science; Language Technology (Computational Linguistics)

Search outside of DiVA

Google; Google Scholar
