Wavebender GAN: An architecture for phonetically meaningful speech manipulation
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-1643-1054
2022 (English). In: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE conference proceedings, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Deep learning has revolutionised synthetic speech quality. However, it has thus far delivered little value to the speech science community. The new methods do not meet the controllability demands of practitioners in this area, e.g., in listening tests with manipulated speech stimuli. Instead, control of different speech properties in such stimuli is achieved by using legacy signal-processing methods. This limits the range, accuracy, and speech quality of the manipulations. Audible artefacts also have a negative impact on the methodological validity of results in speech perception studies. This work introduces a system capable of manipulating speech properties through learning rather than design. The architecture learns to control arbitrary speech properties and leverages progress in neural vocoders to obtain realistic output. Experiments with copy synthesis and manipulation of a small set of core speech features (pitch, formants, and voice quality measures) illustrate the promise of the approach for producing speech stimuli that have accurate control and high perceptual quality.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2022.
Series
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-313455
DOI: 10.1109/ICASSP43922.2022.9747442
ISI: 000864187906095
Scopus ID: 2-s2.0-85131238464
OAI: oai:DiVA.org:kth-313455
DiVA, id: diva2:1663936
Conference
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 22-27, 2022, Singapore
Note

Part of proceedings: ISBN 978-1-6654-0540-9

QC 20220607

Available from: 2022-06-03. Created: 2022-06-03. Last updated: 2024-03-15. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

https://arxiv.org/abs/2202.10973

Authority records

Beck, Gustavo; Wennberg, Ulme; Malisz, Zofia; Henter, Gustav Eje
