Publications (10 of 103)
Edlund, J., Tånnander, C. & Gustafson, J. (2015). Audience response system-based assessment for analysis-by-synthesis. In: Proc. of ICPhS 2015. Paper presented at ICPhS 2015. ICPhS.
2015 (English). In: Proc. of ICPhS 2015, ICPhS, 2015. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ICPhS, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180399 (URN)
Conference
ICPhS 2015
Note

QC 20160317

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.
Edlund, J., Heldner, M. & Wlodarczak, M. (2014). Catching wind of multiparty conversation. In: LREC 2014 - Ninth International Conference on Language Resources and Evaluation. Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland.
2014 (English). In: LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014. Conference paper, Published paper (Refereed)
Abstract [en]

The paper describes the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing in the interactive control of interaction. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in the cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual cues of breathing are recorded in parallel with the actual conversations. The corpus allows studying the respiratory mechanisms underlying the organisation of spontaneous communication, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and for speech technology applications.

Keyword
breathing, multiparty conversation, turn-taking, respiratory inductance plethysmography, physiological measurements
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-173466 (URN); 000355611001010 (); 978-2-9517408-8-4 (ISBN)
Conference
9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland
Note

QC 20150921

Available from: 2015-09-21. Created: 2015-09-11. Last updated: 2018-01-11. Bibliographically approved.
Edlund, J., Edelstam, F. & Gustafson, J. (2014). Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems. In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM). Paper presented at the EACL Satellite Workshop Dialogue In Motion (DIM-2014), Gothenburg, Sweden, April 26-30, 2014 (pp. 73-77). Gothenburg, Sweden.
2014 (English). In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM), Gothenburg, Sweden, 2014, pp. 73-77. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a first, largely qualitative analysis of a set of human-human dialogues recorded specifically to provide insights into how humans handle pauses and resumptions in situations where the speakers cannot see each other, but have to rely on the acoustic signal alone. The work presented is part of a larger effort to find unobtrusive human dialogue behaviours that can be mimicked and implemented in in-car spoken dialogue systems within the EU project Get Home Safe, a collaboration between KTH, DFKI, Nuance, IBM and Daimler aiming to find ways of driver interaction that minimise safety issues. The analysis reveals several human temporal, semantic/pragmatic, and structural behaviours that are good candidates for inclusion in spoken dialogue systems.

Place, publisher, year, edition, pages
Gothenburg, Sweden, 2014
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-158156 (URN)
Conference
the EACL Satellite Workshop Dialogue In Motion (DIM-2014), Gothenburg, Sweden, April 26-30 2014
Note

QC 20150218

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2018-01-11. Bibliographically approved.
Strömbergsson, S., Tånnander, C. & Edlund, J. (2014). Ranking severity of speech errors by their phonological impact in context. Paper presented at the 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages. Interspeech, 1568-1572.
2014 (English). In: Interspeech, ISSN 2308-457X, pp. 1568-1572. Article in journal (Refereed). Published
Abstract [en]

Children with speech disorders often present with systematic speech error patterns. In clinical assessments of speech disorders, evaluating the severity of the disorder is central. Current measures of severity have limited sensitivity to factors like the frequency of the target sounds in the child’s language and the degree of phonological diversity, which are factors that can be assumed to affect intelligibility. By constructing phonological filters to simulate eight speech error patterns often observed in children, and applying these filters to a phonologically transcribed corpus of 350K words, this study explores three quantitative measures of phonological impact: Percentage of Consonants Correct (PCC), edit distance, and degree of homonymy. These metrics were related to estimated ratings of severity collected from 34 practicing clinicians. The results show an expected high correlation between the PCC and edit distance metrics, but that none of the three metrics aligns with clinicians’ ratings. Although these results do not generate definite answers as to which phonological factors contribute most to (un)intelligibility, this study demonstrates a methodology that allows for large-scale investigations of the interplay between phonological errors and their impact on speech in context, within and across languages.

Keyword
speech disorders, intelligibility, child speech
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-168519 (URN); 2-s2.0-84910090287 (Scopus ID)
Conference
15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages
Note

QC 20150609

Available from: 2015-06-09. Created: 2015-06-04. Last updated: 2018-01-11. Bibliographically approved.
Al Moubayed, S., Edlund, J. & Gustafson, J. (2013). Analysis of gaze and speech patterns in three-party quiz game interaction. In: Interspeech 2013. Paper presented at the 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, ISCA 2013 (pp. 1126-1130).
2013 (English). In: Interspeech 2013, 2013, pp. 1126-1130. Conference paper, Published paper (Refereed)
Abstract [en]

In order to understand and model the dynamics between interaction phenomena such as gaze and speech in face-to-face multiparty interaction between humans, we need large quantities of reliable, objective data of such interactions. To date, this type of data is in short supply. We present a data collection setup using automated, objective techniques in which we capture the gaze and speech patterns of triads deeply engaged in a high-stakes quiz game. The resulting corpus consists of five one-hour recordings, and is unique in that it makes use of three state-of-the-art gaze trackers (one per subject) in combination with a state-of-the-art conical microphone array designed to capture roundtable meetings. Several video channels are also included. In this paper we present the obstacles we encountered and the possibilities afforded by a synchronised, reliable combination of large-scale multi-party speech and gaze data, and an overview of the first analyses of the data. Index Terms: multimodal corpus, multiparty dialogue, gaze patterns, multiparty gaze.

National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137388 (URN); 2-s2.0-84906231582 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. ISCA 2013
Note

QC 20140603

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved.
Edlund, J., Al Moubayed, S., Tånnander, C. & Gustafson, J. (2013). Audience response system based annotation of speech. In: Proceedings of Fonetik 2013. Paper presented at the XXVIth Annual Phonetics Meeting Fonetik 2013, Linköping, Sweden, 12–13 June, 2013 (pp. 13-16). Linköping: Linköping University.
2013 (English). In: Proceedings of Fonetik 2013, Linköping: Linköping University, 2013, pp. 13-16. Conference paper, Published paper (Other academic)
Abstract [en]

Manual annotators are often used to label speech, a task associated with high costs and great time consumption. We suggest increasing throughput while maintaining a high degree of experimental control by borrowing from the audience response systems used in the film and television industries, and demonstrate a cost-efficient setup for rapid, plenary annotation of phenomena occurring in recorded speech, together with some results from studies we have undertaken to quantify the temporal precision and reliability of such annotations.

Place, publisher, year, edition, pages
Linköping: Linköping University, 2013
Series
Studies in Language and Culture, ISSN 1403-2570 ; 21
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137389 (URN); 978-91-7519-582-7 (ISBN); 978-91-7519-579-7 (ISBN)
Conference
XXVIth Annual Phonetics Meeting Fonetik 2013; Linköping, Sweden, 12–13 June, 2013
Note

QC 20140219

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved.
Heldner, M., Hjalmarsson, A. & Edlund, J. (2013). Backchannel relevance spaces. In: Asu, Eva Liina & Lippus, Pärtel (Eds.), Prosody: Proceedings of the XIth Conference, Tartu 2012. Paper presented at Prosody: Proceedings of the XIth Conference (pp. 137-146). Peter Lang Publishing Group.
2013 (English). In: Prosody: Proceedings of the XIth Conference, Tartu 2012 / [ed] Asu, Eva Liina & Lippus, Pärtel, Peter Lang Publishing Group, 2013, pp. 137-146. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Peter Lang Publishing Group, 2013
National Category
Computer Sciences Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137398 (URN); 978-3631644270 (ISBN)
Conference
Prosody: Proceedings of XIth Conference
Note

QC 20140129

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved.
Edlund, J., Al Moubayed, S. & Beskow, J. (2013). Co-present or Not?: Embodiment, Situatedness and the Mona Lisa Gaze Effect. In: Nakano, Yukiko; Conati, Cristina; Bader, Thomas (Ed.), Eye gaze in intelligent user interfaces: gaze-based analyses, models and applications (pp. 185-203). London: Springer London.
2013 (English). In: Eye gaze in intelligent user interfaces: gaze-based analyses, models and applications / [ed] Nakano, Yukiko; Conati, Cristina; Bader, Thomas, London: Springer London, 2013, pp. 185-203. Chapter in book (Refereed)
Abstract [en]

The interest in embodying and situating computer programmes took off in the autonomous agents community in the 90s. Today, researchers and designers of programmes that interact with people on human terms endow their systems with humanoid physiognomies for a variety of reasons. In most cases, attempts at achieving this embodiment and situatedness have taken one of two directions: virtual characters and actual physical robots. In addition, a technique that is far from new is gaining ground rapidly: projection of animated faces on head-shaped 3D surfaces. In this chapter, we provide a history of this technique; an overview of its pros and cons; and an in-depth description of the cause and mechanics of the main drawback of 2D displays of 3D faces (and objects): the Mona Lisa gaze effect. We conclude with a description of an experimental paradigm that measures perceived directionality in general and the Mona Lisa gaze effect in particular.

Place, publisher, year, edition, pages
London: Springer London, 2013
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137382 (URN); 10.1007/978-1-4471-4784-8_10 (DOI); 978-1-4471-4783-1 (ISBN); 978-1-4471-4784-8 (ISBN)
Note

QC 20140219

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved.
Oertel, C., Cummins, F., Edlund, J., Wagner, P. & Campbell, N. (2013). D64: A corpus of richly recorded conversational interaction. Journal on Multimodal User Interfaces, 7(1-2), 19-28.
2013 (English). In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 7, no 1-2, pp. 19-28. Article in journal (Refereed). Published
Abstract [en]

In recent years there has been a substantial debate about the need for increasingly spontaneous, conversational corpora of spoken interaction that are not controlled or task directed. In parallel, the need has arisen for the recording of multimodal corpora that are not restricted to the audio domain alone. With a corpus that fulfils both needs, it would be possible to investigate the natural coupling, not only in turn-taking and voice, but also in the movement of participants. In the following paper we describe the design and recording of such a corpus and provide some illustrative examples of how it might be exploited in the study of dynamic interaction. The D64 corpus is a multimodal corpus recorded over two successive days, each of which resulted in approximately 4 hours of recordings. In total five participants took part in the recordings, of whom two were female and three were male. Seven video cameras were used, of which at least one was trained on each participant. The OptiTrack motion capture kit was used to enrich the recordings with movement information. The D64 corpus comprises annotations on conversational involvement, speech activity and pauses, as well as information on the average degree of change in the movement of participants.

Keyword
Multimodality corpus, Conversational involvement, Spontaneous speech
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-109373 (URN); 10.1007/s12193-012-0108-6 (DOI); 000316062300003 (); 2-s2.0-84874773796 (Scopus ID)
Funder
Swedish Research Council, 2009-1766
Note

QC 20130415

Available from: 2013-01-02. Created: 2013-01-02. Last updated: 2018-01-11. Bibliographically approved.
Edlund, J., Al Moubayed, S., Tånnander, C. & Gustafson, J. (2013). Temporal precision and reliability of audience response system based annotation. In: Proc. of Multimodal Corpora 2013. Paper presented at Multimodal Corpora (MMC2013), Beyond Audio and Video, IVA Workshop III, Edinburgh, U.K., 29-31 August, 2013.
2013 (English). In: Proc. of Multimodal Corpora 2013, 2013. Conference paper, Published paper (Refereed)
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137385 (URN)
Conference
Multimodal Corpora (MMC2013), Beyond Audio and Video, IVA Workshop III; Edinburgh, U.K., 29-31 August, 2013
Note

NQC 2014

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-01-11. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-9327-9482