Publications (10 of 109)
Fallgren, P., Malisz, Z. & Edlund, J. (2019). Bringing order to chaos: A non-sequential approach for browsing large sets of found audio data. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation. Paper presented at 11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center, Miyazaki, Japan, 7 May 2018 through 12 May 2018 (pp. 4307-4311). European Language Resources Association (ELRA)
2019 (English). In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2019, p. 4307-4311. Conference paper, Published paper (Refereed)
Abstract [en]

We present a novel and general approach for fast and efficient non-sequential browsing of sound in large archives that we know little or nothing about, e.g. so-called found data - data not recorded with the specific purpose of being analysed or used as training data. Our main motivation is to address some of the problems speech and speech technology researchers see when they try to capitalise on the huge quantities of speech data that reside in public archives. Our method is a combination of audio browsing through massively multi-object sound environments and a well-known unsupervised dimensionality reduction algorithm (SOM). We test the process chain on four data sets of different nature (speech, speech and music, farm animals, and farm animals mixed with farm sounds). The methods are shown to combine well, resulting in rapid and readily interpretable observations. Finally, our initial results are demonstrated in prototype software which is freely available.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2019
Keywords
Data visualisation, Found data, Speech archives
National Category
Media Engineering
Identifiers
urn:nbn:se:kth:diva-241799 (URN), 2-s2.0-85059880464 (Scopus ID), 9791095546009 (ISBN)
Conference
11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center, Miyazaki, Japan, 7 May 2018 through 12 May 2018
Note

QC 20190125

Available from: 2019-01-25 Created: 2019-01-25 Last updated: 2019-01-25 Bibliographically approved
Clark, L., Cowan, B. R., Edwards, J., Munteanu, C., Murad, C., Aylett, M., . . . Doyle, P. (2019). Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions. In: CHI EA '19 EXTENDED ABSTRACTS: EXTENDED ABSTRACTS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. Paper presented at 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019; Glasgow; United Kingdom; 4 May 2019 through 9 May 2019. ASSOC COMPUTING MACHINERY
2019 (English). In: CHI EA '19 EXTENDED ABSTRACTS: EXTENDED ABSTRACTS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, ASSOC COMPUTING MACHINERY, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The use of speech as an interaction modality has grown considerably through the integration of Intelligent Personal Assistants (IPAs, e.g. Siri, Google Assistant) into smartphones and voice-based devices (e.g. Amazon Echo). However, there remain significant gaps in using theoretical frameworks to understand user behaviours and choices and how they may be applied to specific speech interface interactions. This part-day multidisciplinary workshop aims to critically map out and evaluate theoretical frameworks and methodological approaches across a number of disciplines and establish directions for new paradigms in understanding speech interface user behaviour. In doing so, we will bring together participants from HCI and other speech-related domains to establish a cohesive, diverse and collaborative community of researchers from academia and industry with interest in exploring theoretical and methodological issues in the field.

Place, publisher, year, edition, pages
ASSOC COMPUTING MACHINERY, 2019
Keywords
speech interface, voice user interface, theory, method, design, intelligent personal assistants
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-260230 (URN), 10.1145/3290607.3299009 (DOI), 000482042103089 (ISI), 2-s2.0-85067309893 (Scopus ID)
Conference
2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019; Glasgow; United Kingdom; 4 May 2019 through 9 May 2019
Note

QC 20190927

Available from: 2019-09-27 Created: 2019-09-27 Last updated: 2019-10-16 Bibliographically approved
Fallgren, P., Malisz, Z. & Edlund, J. (2018). A tool for exploring large amounts of found audio data. In: CEUR Workshop Proceedings. Paper presented at 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018 (pp. 499-503). CEUR-WS
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, p. 499-503. Conference paper, Published paper (Refereed)
Abstract [en]

We demonstrate a method and a set of open source tools (beta) for non-sequential browsing of large amounts of audio data. The demonstration will contain early versions of a set of functionalities, and will provide good insight into how the method can be used to browse through large quantities of audio data efficiently.

Place, publisher, year, edition, pages
CEUR-WS, 2018
Keywords
Found data, Machine learning, Speech processing, Visualization, Flow visualization, Learning systems, Audio data, Large amounts, Nonsequential, Open source tools, Data visualization
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227479 (URN), 2-s2.0-85045345183 (Scopus ID)
Conference
3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018
Note

Conference code: 135422; Export Date: 9 May 2018; Conference Paper; Funding details: 2013-02003, TRC, The Research Council; Funding text: The project described here is funded in full by Riksbankens Jubileumsfond (SAF16-0917: 1). Its results will be made more widely accessible through the infrastructure supported by SWE-CLARIN (Swedish research Council 2013-02003). QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2018-10-19 Bibliographically approved
Borin, L., Forsberg, M., Edlund, J. & Domeij, R. (2018). Språkbanken 2018: Research resources for text, speech, & society. In: CEUR Workshop Proceedings. Paper presented at 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018 (pp. 504-506). CEUR-WS
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, p. 504-506. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce an expanded version of the Swedish research resource Språkbanken (the Swedish Language Bank). In 2018, Språkbanken, which has supported national and international research for over four decades, adds two branches, one focusing on speech and one on societal aspects of language, to its existing organization, which targets text. 

Place, publisher, year, edition, pages
CEUR-WS, 2018
Keywords
Infrastructure, Society, Speech, Text, International researches, Swedishs
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227481 (URN), 2-s2.0-85045307620 (Scopus ID)
Conference
3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018
Note

Conference code: 135422; Export Date: 9 May 2018; Conference Paper. QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2018-10-19 Bibliographically approved
Strömbergsson, S., Edlund, J., Götze, J. & Björkenstam, K. N. (2017). Approximating phonotactic input in children's linguistic environments from orthographic transcripts. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. Paper presented at 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017 (pp. 2213-2217). International Speech Communication Association, 2017
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association, 2017, Vol. 2017, p. 2213-2217. Conference paper, Published paper (Refereed)
Abstract [en]

Child-directed spoken data is the ideal source of support for claims about children's linguistic environments. However, phonological transcriptions of child-directed speech are scarce, compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children's phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across different modalities (spoken vs. written), and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these are predictably attributed to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children's language environments.

Place, publisher, year, edition, pages
International Speech Communication Association, 2017
Keywords
Grapheme-To-Phoneme Conversion, Language Acquisition, Phonology
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-222093 (URN), 10.21437/Interspeech.2017-1634 (DOI), 000457505000462 (ISI), 2-s2.0-85039166025 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017
Note

QC 20180131

Available from: 2018-01-31 Created: 2018-01-31 Last updated: 2019-09-24 Bibliographically approved
Edlund, J. & Gustafson, J. (2016). Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives. In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. Paper presented at 10th International Conference on Language Resources and Evaluation, LREC 2016, 23 May 2016 through 28 May 2016 (pp. 4531-4534). European Language Resources Association (ELRA)
2016 (English). In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, European Language Resources Association (ELRA), 2016, p. 4531-4534. Conference paper, Published paper (Refereed)
Abstract [en]

In 2014, the Swedish government tasked a Swedish agency, The Swedish Post and Telecom Authority (PTS), with investigating how to best create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As a part of this work, the department of Speech, Music and Hearing at KTH Royal Institute of Technology has taken inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, key priorities, perspectives, and strategies that may be of general, rather than Swedish, interest are presented. We discuss the broad types of potential spoken language resources available; to what extent these resources are free to use; and, as the main contribution, strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2016
Keywords
National archives, Oral history, Speech, Hidden resources, Public institution, Royal Institute of Technology, Speech technology, Spoken languages, Swedish government, Audition
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-222949 (URN), 2-s2.0-85037133240 (Scopus ID), 9782951740891 (ISBN)
Conference
10th International Conference on Language Resources and Evaluation, LREC 2016, 23 May 2016 through 28 May 2016
Note

QC 20180327

Available from: 2018-03-27 Created: 2018-03-27 Last updated: 2018-05-24 Bibliographically approved
Edlund, J., Tånnander, C. & Gustafson, J. (2015). Audience response system-based assessment for analysis-by-synthesis. In: Proc. of ICPhS 2015. Paper presented at ICPhS 2015. ICPhS
2015 (English). In: Proc. of ICPhS 2015, ICPhS, 2015. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ICPhS, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180399 (URN)
Conference
ICPhS 2015
Note

QC 20160317

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10 Bibliographically approved
Edlund, J., Heldner, M. & Wlodarczak, M. (2014). Catching wind of multiparty conversation. In: LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION. Paper presented at 9th International Conference on Language Resources and Evaluation (LREC), MAY 26-31, 2014, Reykjavik, ICELAND.
2014 (English). In: LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014. Conference paper, Published paper (Refereed)
Abstract [en]

The paper describes the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing for interactive control of interaction. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual cues of breathing are recorded in parallel to the actual conversations. The corpus allows studying respiratory mechanisms underlying organisation of spontaneous communication, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and speech technology applications.

Keywords
breathing, multiparty conversation, turn-taking, respiratory inductance plethysmography, physiological measurements
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-173466 (URN), 000355611001010 (ISI), 978-2-9517408-8-4 (ISBN)
Conference
9th International Conference on Language Resources and Evaluation (LREC), MAY 26-31, 2014, Reykjavik, ICELAND
Note

QC 20150921

Available from: 2015-09-21 Created: 2015-09-11 Last updated: 2018-01-11 Bibliographically approved
Edlund, J., Edelstam, F. & Gustafson, J. (2014). Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems. In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM). Paper presented at the EACL Satellite Workshop Dialogue In Motion (DIM-2014), Gothenburg, Sweden, April 26-30 2014 (pp. 73-77). Gothenburg, Sweden
2014 (English). In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM), Gothenburg, Sweden, 2014, p. 73-77. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a first, largely qualitative analysis of a set of human-human dialogues recorded specifically to provide insights into how humans handle pauses and resumptions in situations where the speakers cannot see each other, but have to rely on the acoustic signal alone. The work presented is part of a larger effort to find unobtrusive human dialogue behaviours that can be mimicked and implemented in in-car spoken dialogue systems within the EU project Get Home Safe, a collaboration between KTH, DFKI, Nuance, IBM and Daimler aiming to find ways of driver interaction that minimize safety issues. The analysis reveals several human temporal, semantic/pragmatic, and structural behaviours that are good candidates for inclusion in spoken dialogue systems.

Place, publisher, year, edition, pages
Gothenburg, Sweden, 2014
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-158156 (URN)
Conference
the EACL Satellite Workshop Dialogue In Motion (DIM-2014), Gothenburg, Sweden, April 26-30 2014
Note

tmh_import_14_12_30, tmh_id_3926. QC 20150218

Available from: 2014-12-30 Created: 2014-12-30 Last updated: 2018-01-11 Bibliographically approved
Strömbergsson, S., Tånnander, C. & Edlund, J. (2014). Ranking severity of speech errors by their phonological impact in context. Paper presented at Conference of 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages. Interspeech, 1568-1572
2014 (English). In: Interspeech, ISSN 2308-457X, p. 1568-1572. Article in journal (Refereed), Published
Abstract [en]

Children with speech disorders often present with systematic speech error patterns. In clinical assessments of speech disorders, evaluating the severity of the disorder is central. Current measures of severity have limited sensitivity to factors like the frequency of the target sounds in the child’s language and the degree of phonological diversity, which are factors that can be assumed to affect intelligibility. By constructing phonological filters to simulate eight speech error patterns often observed in children, and applying these filters to a phonologically transcribed corpus of 350K words, this study explores three quantitative measures of phonological impact: Percentage of Consonants Correct (PCC), edit distance, and degree of homonymy. These metrics were related to estimated ratings of severity collected from 34 practicing clinicians. The results show an expected high correlation between the PCC and edit distance metrics, but that none of the three metrics align with clinicians’ ratings. Although these results do not generate definite answers to what phonological factors contribute the most to (un)intelligibility, this study demonstrates a methodology that allows for large-scale investigations of the interplay between phonological errors and their impact on speech in context, within and across languages.

Keywords
speech disorders, intelligibility, child speech
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-168519 (URN), 2-s2.0-84910090287 (Scopus ID)
Conference
Conference of 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages
Note

QC 20150609

Available from: 2015-06-09 Created: 2015-06-04 Last updated: 2018-01-11 Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-9327-9482
