Mapping Phonation Types by Clustering of Multiple Metrics
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH (Music Acoustics). ORCID iD: 0000-0003-0700-7216
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH (Music Acoustics). ORCID iD: 0000-0002-3362-7518
2022 (English). In: Applied Sciences, ISSN 2076-3417, Vol. 12, no. 23, p. 12092. Article in journal (Refereed). Published.
Abstract [en]

For voice analysis, much work has been undertaken with a multitude of acoustic and electroglottographic (EGG) metrics. However, few of these have proven to be robustly correlated with physical and physiological phenomena. In particular, all metrics are affected by the fundamental frequency and sound level, making voice assessment sensitive to the recording protocol. It was investigated whether combinations of metrics, acquired over voice maps rather than with individual sustained vowels, can offer a more functional and comprehensive interpretation. For this descriptive, retrospective study, 13 men, 13 women, and 22 children were instructed to phonate on /a/ over their full voice range. Six acoustic and EGG signal features were obtained for every phonatory cycle. An unsupervised voice classification model created feature clusters, which were then displayed on voice maps. It was found that the feature clusters may be readily interpreted in terms of phonation types. For example, the typical intense voice has a high peak EGG derivative, a relatively high contact quotient, low EGG cycle-rate entropy, and a high cepstral peak prominence in the voice signal, all represented by one cluster centroid that is mapped to a given color. In a transition region between non-contact and contact of the vocal folds, the combination of metrics shows a low contact quotient and relatively high entropy, which can be mapped to a different color. Based on this data set, male phonation types could be clustered into up to six categories, and female and child types into four. Combining acoustic and EGG metrics resolved more categories than either kind alone. The inter- and intra-participant distributional features are discussed.
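The abstract does not specify which unsupervised model was used. As a minimal sketch only, assuming plain k-means on z-scored per-cycle metrics and a voice map gridded at 1 semitone by 1 dB (both assumptions, not necessarily the authors' implementation), the two steps described above (cluster every phonatory cycle by its metrics, then colour each voice-map cell by its dominant cluster) could look like this. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def standardize(rows):
    """Z-score each metric column so metrics in different units get equal weight."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # Fall back to 1.0 for a constant column to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def voice_map(cycles, labels):
    """Map each (fo, SPL) cell to the most frequent cluster among its cycles.

    cycles: one (fo in semitones, SPL in dB) pair per phonatory cycle.
    The 1 semitone x 1 dB cell size is a hypothetical choice.
    """
    cells = defaultdict(Counter)
    for (fo_st, spl_db), lab in zip(cycles, labels):
        cells[(round(fo_st), round(spl_db))][lab] += 1
    return {cell: cnt.most_common(1)[0][0] for cell, cnt in cells.items()}
```

Each cluster index would then be given a display colour, and k can be varied (the abstract reports up to six categories for men and four for women and children) to see how many clusters remain interpretable as phonation types. Standardizing first matters because the metrics (contact quotient, entropy, CPP, and so on) live on very different scales.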

Place, publisher, year, edition, pages
MDPI AG, 2022. Vol. 12, no. 23, p. 12092
Keywords [en]
voice analysis, voice range profile, clustering, phonation, phonation type
National Category
Medical Laboratory Technologies; Otorhinolaryngology
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-322053
DOI: 10.3390/app122312092
ISI: 000910824700001
Scopus ID: 2-s2.0-85142534999
OAI: oai:DiVA.org:kth-322053
DiVA, id: diva2:1714260
Funder
KTH Royal Institute of Technology, CSC-2020-2009
Note

QC 20230214

Available from: 2022-11-29. Created: 2022-11-29. Last updated: 2025-02-09. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text: https://www.mdpi.com/2076-3417/12/23/12092
Scopus

Authority records

Cai, Huanchen; Ternström, Sten
