Publications (10 of 112)
Fallgren, P., Malisz, Z. & Edlund, J. (2019). Bringing order to chaos: A non-sequential approach for browsing large sets of found audio data. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation. Paper presented at 11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center, Miyazaki, Japan, 7 May 2018 through 12 May 2018 (pp. 4307-4311). European Language Resources Association (ELRA)
2019 (English). In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2019, p. 4307-4311. Conference paper, Published paper (Refereed)
Abstract [en]

We present a novel and general approach for fast and efficient non-sequential browsing of sound in large archives that we know little or nothing about, e.g. so-called found data: data not recorded with the specific purpose of being analysed or used as training data. Our main motivation is to address some of the problems speech and speech technology researchers see when they try to capitalise on the huge quantities of speech data that reside in public archives. Our method is a combination of audio browsing through massively multi-object sound environments and a well-known unsupervised dimensionality reduction algorithm (SOM). We test the process chain on four data sets of different nature (speech, speech and music, farm animals, and farm animals mixed with farm sounds). The methods are shown to combine well, resulting in rapid and readily interpretable observations. Finally, our initial results are demonstrated in prototype software which is freely available.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2019
Keywords
Data visualisation, Found data, Speech archives
National Category
Media Engineering
Identifiers
urn:nbn:se:kth:diva-241799 (URN); 2-s2.0-85059880464 (Scopus ID); 9791095546009 (ISBN)
Conference
11th International Conference on Language Resources and Evaluation, LREC 2018, Phoenix Seagaia Conference Center, Miyazaki, Japan, 7 May 2018 through 12 May 2018
Note

QC 20190125

Available from: 2019-01-25 Created: 2019-01-25 Last updated: 2019-01-25. Bibliographically approved
Fallgren, P., Malisz, Z. & Edlund, J. (2019). How to annotate 100 hours in 45 minutes. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Paper presented at Interspeech 2019 15-19 September 2019, Graz (pp. 341-345). ISCA
2019 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISCA, 2019, p. 341-345. Conference paper, Published paper (Refereed)
Abstract [en]

Speech data found in the wild hold many advantages over artificially constructed speech corpora in terms of ecological validity and cultural worth. Perhaps most importantly, there is a lot of it. However, the combination of great quantity, noisiness and variation poses a challenge for its access and processing. Generally speaking, automatic approaches to tackle the problem require good labels for training, while manual approaches require time. In this study, we provide further evidence for a semi-supervised, human-in-the-loop framework that previously has shown promising results for browsing and annotating large quantities of found audio data quickly. The findings of this study show that a 100-hour long subset of the Fearless Steps corpus can be annotated for speech activity in less than 45 minutes, a fraction of the time it would take traditional annotation methods, without a loss in performance.

Place, publisher, year, edition, pages
ISCA, 2019
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-268304 (URN); 10.21437/Interspeech.2019-1648 (DOI); 2-s2.0-85074718085 (Scopus ID)
Conference
Interspeech 2019 15-19 September 2019, Graz
Note

QC 20200310

Available from: 2020-03-10 Created: 2020-03-10 Last updated: 2020-03-10. Bibliographically approved
Clark, L., Cowan, B. R., Edwards, J., Munteanu, C., Murad, C., Aylett, M., . . . Doyle, P. (2019). Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions. In: CHI EA '19 EXTENDED ABSTRACTS: EXTENDED ABSTRACTS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. Paper presented at 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019; Glasgow; United Kingdom; 4 May 2019 through 9 May 2019. ASSOC COMPUTING MACHINERY
2019 (English). In: CHI EA '19 EXTENDED ABSTRACTS: EXTENDED ABSTRACTS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, ASSOC COMPUTING MACHINERY, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The use of speech as an interaction modality has grown considerably through the integration of Intelligent Personal Assistants (IPAs, e.g. Siri, Google Assistant) into smartphones and voice-based devices (e.g. Amazon Echo). However, there remain significant gaps in using theoretical frameworks to understand user behaviours and choices and how they may be applied to specific speech interface interactions. This part-day multidisciplinary workshop aims to critically map out and evaluate theoretical frameworks and methodological approaches across a number of disciplines and establish directions for new paradigms in understanding speech interface user behaviour. In doing so, we will bring together participants from HCI and other speech-related domains to establish a cohesive, diverse and collaborative community of researchers from academia and industry with interest in exploring theoretical and methodological issues in the field.

Place, publisher, year, edition, pages
ASSOC COMPUTING MACHINERY, 2019
Keywords
speech interface, voice user interface, theory, method, design, intelligent personal assistants
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-260230 (URN); 10.1145/3290607.3299009 (DOI); 000482042103089 (); 2-s2.0-85067309893 (Scopus ID)
Conference
2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019; Glasgow; United Kingdom; 4 May 2019 through 9 May 2019
Note

QC 20190927

Available from: 2019-09-27 Created: 2019-09-27 Last updated: 2020-03-13. Bibliographically approved
Wagner, P., Beskow, J., Betz, S., Edlund, J., Gustafson, J., Henter, G. E., . . . Tånnander, C. (2019). Speech Synthesis Evaluation—State-of-the-Art Assessment and Suggestion for a Novel Research Program. In: Proceedings of the 10th Speech Synthesis Workshop (SSW10): . Paper presented at 10th Speech Synthesis Workshop (SSW10).
2019 (English). In: Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019. Conference paper, Published paper (Refereed)
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-268347 (URN)
Conference
10th Speech Synthesis Workshop (SSW10)
Available from: 2020-02-18 Created: 2020-02-18 Last updated: 2020-02-18
Clark, L., Doyle, P., Garaialde, D., Gilmartin, E., Schloegl, S., Edlund, J., . . . Cowan, B. R. (2019). The State of Speech in HCI: Trends, Themes and Challenges. Interacting with computers, 31(4), 349-371
2019 (English). In: Interacting with computers, ISSN 0953-5438, E-ISSN 1873-7951, Vol. 31, no 4, p. 349-371. Article in journal (Refereed), Published
Abstract [en]

Speech interfaces are growing in popularity. Through a review of 99 research papers this work maps the trends, themes, findings and methods of empirical research on speech interfaces in the field of human-computer interaction (HCI). We find that studies are usability/theory-focused or explore wider system experiences, evaluating Wizard of Oz, prototypes or developed systems. Measuring task and interaction was common, as was using self-report questionnaires to measure concepts like usability and user attitudes. A thematic analysis of the research found that speech HCI work focuses on nine key topics: system speech production, design insight, modality comparison, experiences with interactive voice response systems, assistive technology and accessibility, user speech production, using speech technology for development, people's experiences with intelligent personal assistants and how user memory affects speech interface interaction. From these insights we identify gaps and challenges in speech research, notably taking into account technological advancements, the need to develop theories of speech interface interaction, grow critical mass in this domain, increase design work and expand research from single to multiple user interaction contexts so as to reflect current use contexts. We also highlight the need to improve measure reliability, validity and consistency, increase in-the-wild deployment and reduce barriers to building fully functional speech interfaces for research.

Place, publisher, year, edition, pages
OXFORD UNIV PRESS, 2019
Keywords
speech interfaces, speech HCI, review, speech technology, voice user interfaces, intelligent personal assistants
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-269509 (URN); 10.1093/iwc/iwz016 (DOI); 000515084300001 ()
Note

QC 20200309

Available from: 2020-03-09 Created: 2020-03-09 Last updated: 2020-03-09. Bibliographically approved
Fallgren, P., Malisz, Z. & Edlund, J. (2018). A tool for exploring large amounts of found audio data. In: CEUR Workshop Proceedings. Paper presented at 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018 (pp. 499-503). CEUR-WS
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, p. 499-503. Conference paper, Published paper (Refereed)
Abstract [en]

We demonstrate a method and a set of open source tools (beta) for non-sequential browsing of large amounts of audio data. The demonstration will contain early versions of a set of functionalities, and will provide good insight into how the method can be used to browse through large quantities of audio data efficiently.

Place, publisher, year, edition, pages
CEUR-WS, 2018
Keywords
Found data, Machine learning, Speech processing, Visualization, Flow visualization, Learning systems, Audio data, Large amounts, Nonsequential, Open source tools, Data visualization
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227479 (URN); 2-s2.0-85045345183 (Scopus ID)
Conference
3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018
Note

Conference code: 135422; Export Date: 9 May 2018; Conference Paper; Funding details: 2013-02003, TRC, The Research Council; Funding text: The project described here is funded in full by Riksbankens Jubileumsfond (SAF16-0917: 1). Its results will be made more widely accessible through the infrastructure supported by SWE-CLARIN (Swedish research Council 2013-02003). QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2018-10-19. Bibliographically approved
Borin, L., Forsberg, M., Edlund, J. & Domeij, R. (2018). Språkbanken 2018: Research resources for text, speech, & society. In: CEUR Workshop Proceedings. Paper presented at 3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018 (pp. 504-506). CEUR-WS
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, p. 504-506. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce an expanded version of the Swedish research resource Språkbanken (the Swedish Language Bank). In 2018, Språkbanken, which has supported national and international research for over four decades, adds two branches, one focusing on speech and one on societal aspects of language, to its existing organization, which targets text. 

Place, publisher, year, edition, pages
CEUR-WS, 2018
Keywords
Infrastructure, Society, Speech, Text, International researches, Swedishs
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-227481 (URN); 2-s2.0-85045307620 (Scopus ID)
Conference
3rd Conference on Digital Humanities in the Nordic Countries, DHN 2018, 7 March 2018 through 9 March 2018
Note

Conference code: 135422; Export Date: 9 May 2018; Conference Paper. QC 20180516

Available from: 2018-05-16 Created: 2018-05-16 Last updated: 2018-10-19. Bibliographically approved
Strömbergsson, S., Edlund, J., Götze, J. & Björkenstam, K. N. (2017). Approximating phonotactic input in children's linguistic environments from orthographic transcripts. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. Paper presented at 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017 (pp. 2213-2217). International Speech Communication Association, 2017
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association, 2017, Vol. 2017, p. 2213-2217. Conference paper, Published paper (Refereed)
Abstract [en]

Child-directed spoken data is the ideal source of support for claims about children's linguistic environments. However, phonological transcriptions of child-directed speech are scarce, compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children's phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across different modalities (spoken vs. written), and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these are predictably attributed to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children's language environments.

Place, publisher, year, edition, pages
International Speech Communication Association, 2017
Keywords
Grapheme-To-Phoneme Conversion, Language Acquisition, Phonology
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-222093 (URN); 10.21437/Interspeech.2017-1634 (DOI); 000457505000462 (); 2-s2.0-85039166025 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017
Note

QC 20180131

Available from: 2018-01-31 Created: 2018-01-31 Last updated: 2019-09-24. Bibliographically approved
Edlund, J. & Gustafson, J. (2016). Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives. In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. Paper presented at 10th International Conference on Language Resources and Evaluation, LREC 2016, 23 May 2016 through 28 May 2016 (pp. 4531-4534). European Language Resources Association (ELRA)
2016 (English). In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, European Language Resources Association (ELRA), 2016, p. 4531-4534. Conference paper, Published paper (Refereed)
Abstract [en]

In 2014, the Swedish government tasked a Swedish agency, The Swedish Post and Telecom Authority (PTS), with investigating how to best create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As a part of this work, the department of Speech, Music and Hearing at KTH Royal Institute of Technology have taken inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, key priorities, perspectives, and strategies that may be of general, rather than Swedish, interest are presented. We discuss broad types of potential spoken language resources available; to what extent these resources are free to use; and thirdly the main contribution: strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2016
Keywords
National archives, Oral history, Speech, Hidden resources, Public institution, Royal Institute of Technology, Speech technology, Spoken languages, Swedish government, Audition
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-222949 (URN); 2-s2.0-85037133240 (Scopus ID); 9782951740891 (ISBN)
Conference
10th International Conference on Language Resources and Evaluation, LREC 2016, 23 May 2016 through 28 May 2016
Note

QC 20180327

Available from: 2018-03-27 Created: 2018-03-27 Last updated: 2018-05-24. Bibliographically approved
Edlund, J., Tånnander, C. & Gustafson, J. (2015). Audience response system-based assessment for analysis-by-synthesis. In: Proc. of ICPhS 2015. Paper presented at ICPhS 2015. ICPhS
2015 (English). In: Proc. of ICPhS 2015, ICPhS, 2015. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ICPhS, 2015
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-180399 (URN)
Conference
ICPhS 2015
Note

QC 20160317

Available from: 2016-01-13 Created: 2016-01-13 Last updated: 2018-01-10. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-9327-9482
