A processing framework to access large quantities of whispered speech found in ASMR
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-6166-9061
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-1643-1054
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-5953-7310
2023 (English). In: ICASSP 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes, Greece: IEEE Signal Processing Society, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Whispering is a ubiquitous mode of communication that humans use daily. Despite this, whispered speech has been poorly served by existing speech technology due to a shortage of resources and processing methodology. To remedy this, this paper provides a processing framework that enables access to large and unique data of high-quality whispered speech. We obtain the data from recordings submitted to online platforms as part of the ASMR media-cultural phenomenon. We describe our processing pipeline and a method for improved whispered activity detection (WAD) in the ASMR data. To efficiently obtain labelled, clean whispered speech, we complement the automatic WAD by using Edyson, a bulk audio annotation tool with human-in-the-loop. We also tackle a problem particular to ASMR: separation of whisper from other acoustic triggers present in the genre. We show that the proposed WAD and the efficient labelling allow us to build extensively augmented data and train a classifier that extracts clean whisper segments from ASMR audio.

Our large and growing dataset enables whisper-capable, data-driven speech technology and linguistic analysis. It also opens opportunities in e.g. HCI as a resource that may elicit emotional, psychological and neuro-physiological responses in the listener.

Place, publisher, year, edition, pages
Rhodes, Greece: IEEE Signal Processing Society, 2023.
Keywords [en]
Whispered speech, WAD, human-in-the-loop, autonomous sensory meridian response
National Category
Signal Processing
Research subject
Information and Communication Technology; Human-computer Interaction; Computer Science; Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-328771
DOI: 10.1109/ICASSP49357.2023.10095965
Scopus ID: 2-s2.0-85177548955
OAI: oai:DiVA.org:kth-328771
DiVA, id: diva2:1777252
Conference
ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4-10 June 2023
Projects
Multimodal encoding of prosodic prominence in voiced and whispered speech
Funder
Swedish Research Council, 2017-02861
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20230630

Available from: 2023-06-29. Created: 2023-06-29. Last updated: 2023-11-29. Bibliographically approved.

Open Access in DiVA

fulltext (309 kB), 100 downloads
File information
File name: FULLTEXT02.pdf
File size: 309 kB
Checksum SHA-512: 1246754b22840b542b0855847a6bd700733abd3b46434ede1769eb108308665c8c6e8936b95ff5be272f0646dd488762751b7799c4606a20de0ac8e99bb06496
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text
Scopus
https://ieeexplore.ieee.org/document/10095965

Authority records

Pérez Zarazaga, Pablo; Henter, Gustav Eje; Malisz, Zofia
