Publications (10 of 107)
Fallgren, P., Malisz, Z. & Edlund, J. (2018). A tool for exploring large amounts of found audio data. In: CEUR Workshop Proceedings. Paper presented at 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018 (pp. 499-503). CEUR-WS
A tool for exploring large amounts of found audio data
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, p. 499-503. Conference paper, Published paper (Refereed)
Abstract [en]

We demonstrate a method and a set of open source tools (beta) for nonsequential browsing of large amounts of audio data. The demonstration will include early-stage versions of a set of functionalities and will provide good insight into how the method can be used to browse large quantities of audio data efficiently.

Place, publisher, year, edition, pages
CEUR-WS, 2018
Keywords
Found data, Machine learning, Speech processing, Visualization, Flow visualization, Learning systems, Audio data, Large amounts, Nonsequential, Open source tools, Data visualization
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227479 (URN); 2-s2.0-85045345183 (Scopus ID)
Conference
3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018
Note

Conference code: 135422; Export Date: 9 May 2018; Conference Paper; Funding details: 2013-02003, TRC, The Research Council; Funding text: The project described here is funded in full by Riksbankens Jubileumsfond (SAF16-0917: 1). Its results will be made more widely accessible through the infrastructure supported by SWE-CLARIN (Swedish Research Council 2013-02003). QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2018-05-16. Bibliographically approved
Borin, L., Forsberg, M., Edlund, J. & Domeij, R. (2018). Språkbanken 2018: Research resources for text, speech, & society. In: CEUR Workshop Proceedings. Paper presented at 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018 (pp. 504-506). CEUR-WS
Språkbanken 2018: Research resources for text, speech, & society
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, p. 504-506. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce an expanded version of the Swedish research resource Språkbanken (the Swedish Language Bank). In 2018, Språkbanken, which has supported national and international research for over four decades, adds two branches, one focusing on speech and one on societal aspects of language, to its existing organization, which targets text. 

Place, publisher, year, edition, pages
CEUR-WS, 2018
Keywords
Infrastructure, Society, Speech, Text, International research, Swedish
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227481 (URN); 2-s2.0-85045307620 (Scopus ID)
Conference
3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018
Note

Conference code: 135422; Export Date: 9 May 2018; Conference Paper. QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2018-05-16. Bibliographically approved
Strömbergsson, S., Edlund, J., Götze, J. & Björkenstam, K. N. (2017). Approximating phonotactic input in children's linguistic environments from orthographic transcripts. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. Paper presented at 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017 (pp. 2213-2217). International Speech Communication Association, 2017
Approximating phonotactic input in children's linguistic environments from orthographic transcripts
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association, 2017, Vol. 2017, p. 2213-2217. Conference paper, Published paper (Refereed)
Abstract [en]

Child-directed spoken data is the ideal source of support for claims about children's linguistic environments. However, phonological transcriptions of child-directed speech are scarce compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children's phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across modalities (spoken vs. written) and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these are predictably attributable to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children's language environments.

Place, publisher, year, edition, pages
International Speech Communication Association, 2017
Keywords
Grapheme-to-phoneme conversion, Language acquisition, Phonology
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-222093 (URN); 10.21437/Interspeech.2017-1634 (DOI); 2-s2.0-85039166025 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017
Note

QC 20180131

Available from: 2018-01-31 Created: 2018-01-31 Last updated: 2018-01-31. Bibliographically approved
Edlund, J. & Gustafson, J. (2016). Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives. In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. Paper presented at 10th International Conference on Language Resources and Evaluation, LREC 2016, 23 May 2016 through 28 May 2016 (pp. 4531-4534). European Language Resources Association (ELRA)
Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives
2016 (English). In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, European Language Resources Association (ELRA), 2016, p. 4531-4534. Conference paper, Published paper (Refereed)
Abstract [en]

In 2014, the Swedish government tasked a Swedish agency, the Swedish Post and Telecom Authority (PTS), with investigating how best to create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As part of this work, the Department of Speech, Music and Hearing at KTH Royal Institute of Technology has taken inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, we present key priorities, perspectives, and strategies that may be of general, rather than specifically Swedish, interest. We discuss, first, the broad types of potential spoken language resources available; second, to what extent these resources are free to use; and third, our main contribution: strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2016
Keywords
National archives, Oral history, Speech, Hidden resources, Public institution, Royal Institute of Technology, Speech technology, Spoken languages, Swedish government, Audition
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-222949 (URN); 2-s2.0-85037133240 (Scopus ID); 9782951740891 (ISBN)
Conference
10th International Conference on Language Resources and Evaluation, LREC 2016, 23 May 2016 through 28 May 2016
Note

QC 20180327

Available from: 2018-03-27 Created: 2018-03-27 Last updated: 2018-05-24. Bibliographically approved
Edlund, J., Tånnander, C. & Gustafson, J. (2015). Audience response system-based assessment for analysis-by-synthesis. In: Proc. of ICPhS 2015. Paper presented at ICPhS 2015. ICPhS
Audience response system-based assessment for analysis-by-synthesis
2015 (English). In: Proc. of ICPhS 2015, ICPhS, 2015. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ICPhS, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180399 (URN)
Conference
ICPhS 2015
Note

QC 20160317

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10. Bibliographically approved
Edlund, J., Heldner, M. & Wlodarczak, M. (2014). Catching wind of multiparty conversation. In: LREC 2014 - Ninth International Conference on Language Resources and Evaluation. Paper presented at 9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland.
Catching wind of multiparty conversation
2014 (English). In: LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014. Conference paper, Published paper (Refereed)
Abstract [en]

The paper describes the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing in the interactive control of conversation. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in the cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual cues of breathing are recorded in parallel with the actual conversations. The corpus allows the study of respiratory mechanisms underlying the organisation of spontaneous communication, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and for speech technology applications.

Keywords
breathing, multiparty conversation, turn-taking, respiratory inductance plethysmography, physiological measurements
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-173466 (URN); 000355611001010; 978-2-9517408-8-4 (ISBN)
Conference
9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland
Note

QC 20150921

Available from: 2015-09-21 Created: 2015-09-11 Last updated: 2018-01-11. Bibliographically approved
Edlund, J., Edelstam, F. & Gustafson, J. (2014). Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems. In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM). Paper presented at the EACL Satellite Workshop Dialogue In Motion (DIM-2014), Gothenburg, Sweden, April 26-30, 2014 (pp. 73-77). Gothenburg, Sweden
Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems
2014 (English). In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM), Gothenburg, Sweden, 2014, p. 73-77. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a first, largely qualitative analysis of a set of human-human dialogues recorded specifically to provide insights into how humans handle pauses and resumptions in situations where the speakers cannot see each other but have to rely on the acoustic signal alone. The work presented is part of a larger effort to find unobtrusive human dialogue behaviours that can be mimicked and implemented in in-car spoken dialogue systems within the EU project Get Home Safe, a collaboration between KTH, DFKI, Nuance, IBM and Daimler aiming to find ways of driver interaction that minimize safety issues. The analysis reveals several human temporal, semantic/pragmatic, and structural behaviours that are good candidates for inclusion in spoken dialogue systems.

Place, publisher, year, edition, pages
Gothenburg, Sweden, 2014
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-158156 (URN)
Conference
the EACL Satellite Workshop Dialogue In Motion (DIM-2014), Gothenburg, Sweden, April 26-30, 2014
Note

tmh_import_14_12_30, tmh_id_3926. QC 20150218

Available from: 2014-12-30 Created: 2014-12-30 Last updated: 2018-01-11. Bibliographically approved
Strömbergsson, S., Tånnander, C. & Edlund, J. (2014). Ranking severity of speech errors by their phonological impact in context. Paper presented at 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages. Interspeech, 1568-1572
Ranking severity of speech errors by their phonological impact in context
2014 (English). In: Interspeech, ISSN 2308-457X, p. 1568-1572. Article in journal (Refereed), Published
Abstract [en]

Children with speech disorders often present with systematic speech error patterns. In clinical assessments of speech disorders, evaluating the severity of the disorder is central. Current measures of severity have limited sensitivity to factors such as the frequency of the target sounds in the child's language and the degree of phonological diversity, factors that can be assumed to affect intelligibility. By constructing phonological filters that simulate eight speech error patterns often observed in children, and applying these filters to a phonologically transcribed corpus of 350K words, this study explores three quantitative measures of phonological impact: Percentage of Consonants Correct (PCC), edit distance, and degree of homonymy. These metrics were related to estimated ratings of severity collected from 34 practicing clinicians. The results show an expected high correlation between the PCC and edit distance metrics, but none of the three metrics aligns with the clinicians' ratings. Although these results do not give definitive answers as to which phonological factors contribute most to (un)intelligibility, this study demonstrates a methodology that allows for large-scale investigations of the interplay between phonological errors and their impact on speech in context, within and across languages.

Keywords
speech disorders, intelligibility, child speech
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-168519 (URN); 2-s2.0-84910090287 (Scopus ID)
Conference
15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages
Note

QC 20150609

Available from: 2015-06-09 Created: 2015-06-04 Last updated: 2018-01-11. Bibliographically approved
Al Moubayed, S., Edlund, J. & Gustafson, J. (2013). Analysis of gaze and speech patterns in three-party quiz game interaction. In: Interspeech 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. ISCA 2013 (pp. 1126-1130).
Analysis of gaze and speech patterns in three-party quiz game interaction
2013 (English). In: Interspeech 2013, 2013, p. 1126-1130. Conference paper, Published paper (Refereed)
Abstract [en]

In order to understand and model the dynamics between interaction phenomena such as gaze and speech in face-to-face multiparty interaction between humans, we need large quantities of reliable, objective data of such interactions. To date, this type of data is in short supply. We present a data collection setup using automated, objective techniques in which we capture the gaze and speech patterns of triads deeply engaged in a high-stakes quiz game. The resulting corpus consists of five one-hour recordings and is unique in that it makes use of three state-of-the-art gaze trackers (one per subject) in combination with a state-of-the-art conical microphone array designed to capture roundtable meetings. Several video channels are also included. In this paper we present the obstacles we encountered, the possibilities afforded by a synchronised, reliable combination of large-scale multiparty speech and gaze data, and an overview of the first analyses of the data.

Keywords
multimodal corpus, multiparty dialogue, gaze patterns, multiparty gaze

National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137388 (URN); 2-s2.0-84906231582 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. ISCA 2013
Note

QC 20140603

Available from: 2013-12-13 Created: 2013-12-13 Last updated: 2018-01-11. Bibliographically approved
Edlund, J., Al Moubayed, S., Tånnander, C. & Gustafson, J. (2013). Audience response system based annotation of speech. In: Proceedings of Fonetik 2013. Paper presented at XXVIth Annual Phonetics Meeting Fonetik 2013; Linköping, Sweden, 12–13 June, 2013 (pp. 13-16). Linköping: Linköping University
Audience response system based annotation of speech
2013 (English). In: Proceedings of Fonetik 2013, Linköping: Linköping University, 2013, p. 13-16. Conference paper, Published paper (Other academic)
Abstract [en]

Manual annotators are often used to label speech, a task associated with high costs and considerable time consumption. We propose to increase throughput while maintaining a high degree of experimental control by borrowing from the Audience Response Systems used in the film and television industries. We demonstrate a cost-efficient setup for rapid, plenary annotation of phenomena occurring in recorded speech, together with some results from studies we have undertaken to quantify the temporal precision and reliability of such annotations.

Place, publisher, year, edition, pages
Linköping: Linköping University, 2013
Series
Studies in Language and Culture, ISSN 1403-2570; 21
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-137389 (URN); 978-91-7519-582-7 (ISBN); 978-91-7519-579-7 (ISBN)
Conference
XXVIth Annual Phonetics Meeting Fonetik 2013; Linköping, Sweden, 12–13 June, 2013
Note

tmh_import_13_12_13, tmh_id_3856

QC 20140219

Available from: 2013-12-13 Created: 2013-12-13 Last updated: 2018-01-11. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-9327-9482
