Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-2926-6518
KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). ORCID iD: 0000-0002-9081-2170
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-3511-023X
2018 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 144, no 3, p. 1467-1483. Article in journal (Refereed). Published
Abstract [en]

Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, we have modeled the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The result with the best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8 % for phonation, 90.8 % for supraglottal myoelastic vibrations, and 89.0 % for turbulence using all the 84 developed features. A final feature reduction to 22 features yielded similar results.

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2018. Vol. 144, no 3, p. 1467-1483
Keywords [en]
vocal articulation, sound imitations, signal processing, auditory receptive fields, turbulence, phonation, supraglottal myoelastic vibration, partial least-square regression, support vector classification, ensemble learning
National Category
Signal Processing; Computer and Information Sciences
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-234295
DOI: 10.1121/1.5052438
ISI: 000457802200049
Scopus ID: 2-s2.0-85053873907
OAI: oai:DiVA.org:kth-234295
DiVA, id: diva2:1245861
Funder
EU, FP7, Seventh Framework Programme, 618067
Note

QC 20181003

Available from: 2018-09-06. Created: 2018-09-06. Last updated: 2019-02-22. Bibliographically approved

Other links

Publisher's full text; Scopus; https://doi.org/10.1121/1.5052438

Authors
Friberg, Anders; Lindeberg, Tony; Hellwagner, Martin; Helgason, Pétur; Salomão, Gláucia Laís; Elovsson, Anders; Lemaitre, Guillaume; Ternström, Sten