Effect of vowel context in cepstral and entropy analysis of pathological voices
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
Department of Electronics and Telecommunications, Politecnico di Torino, Italy.
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
Department of Electronics and Telecommunications, Politecnico di Torino, Italy. ORCID iD: 0000-0002-3323-5311
2019 (English). In: Biomedical Signal Processing and Control, ISSN 1746-8094, E-ISSN 1746-8108, Vol. 47, p. 350-357. Article in journal (Refereed). Published
Abstract [en]

This study investigates the effect of vowel context (excerpted from speech versus sustained) on two voice quality measures: the cepstral peak prominence smoothed (CPPS) and sample entropy (SampEn). Thirty-one dysphonic subjects with different types of organic dysphonia and thirty-one controls read a phonetically balanced text and phonated sustained [a:] vowels in comfortable pitch and loudness. All the [a:] vowels of the read text were excerpted by automatic speech recognition and phonetic (forced) alignment. CPPS and SampEn were calculated for all excerpted vowels of each subject, forming one distribution of CPPS and SampEn values per subject. The sustained vowels were analyzed using a 41 ms window, forming another distribution of CPPS and SampEn values per subject. Two speech-language pathologists performed a perceptual evaluation of the dysphonic subjects’ voice quality from the recorded text. The power of discriminating the dysphonic group from the controls for SampEn and CPPS was assessed for the excerpted and sustained vowels with the Receiver-Operator Characteristic (ROC) analysis. The best discrimination in terms of Area Under Curve (AUC) for CPPS occurred using the mean of the excerpted vowel distributions (AUC=0.85) and for SampEn using the 95th percentile of the sustained vowel distributions (AUC=0.84). CPPS and SampEn were found to be negatively correlated, and the largest correlation was found between the corresponding 95th percentiles of their distributions (Pearson, r=−0.83, p < 10^−3). A strong correlation was also found between the 95th percentile of SampEn distributions and the perceptual quality of breathiness (Pearson, r=0.83, p < 10^−3). The results suggest that depending on the acoustic voice quality measure, sustained vowels can be more effective than excerpted vowels for detecting dysphonia. Additionally, when using CPPS or SampEn there is an advantage of using the measures’ distributions rather than their average values.
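
A minimal sketch of sample entropy, one of the two measures named in the abstract, may make the quantity more concrete. It is not the authors' implementation. The abstract does not state the embedding dimension or the tolerance, so m = 2 and r = 0.2 times the signal's standard deviation are assumed here as common defaults. The function name sample_entropy and the choice of counting a match when the distance is at most r are likewise assumptions.

```python
import math


def sample_entropy(signal, m=2, r_factor=0.2):
    """Sample entropy of a 1-D sequence (illustrative sketch only).

    m and r_factor are assumed defaults, not values taken from the article.
    """
    n = len(signal)
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen m")

    # Tolerance r is expressed relative to the (population) standard deviation.
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths so that
        # the two counts are directly comparable.
        templates = [signal[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches excluded
                # Chebyshev distance between the two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined ratio, reported here as infinity
    return -math.log(a / b)
```

In this sketch a strictly periodic sequence gives a value of zero, and a more irregular sequence gives a larger value. The pairwise loop is quadratic in the signal length, which is acceptable for short analysis windows but slow for long recordings.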

Place, publisher, year, edition, pages
Elsevier, 2019. Vol. 47, p. 350-357
Keywords [en]
dysphonia, voice analysis, cepstral peak prominence, sample entropy, vowel context
National Category
General Language Studies and Linguistics
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-221797
DOI: 10.1016/j.bspc.2018.08.021
ISI: 000449134500035
Scopus ID: 2-s2.0-85053219805
OAI: oai:DiVA.org:kth-221797
DiVA, id: diva2:1177559
Funder
Swedish Research Council, 2010-4565
Swedish Research Council, 2013-0632
Note

QC 20180129

Available from: 2018-01-25 Created: 2018-01-25 Last updated: 2019-01-25. Bibliographically approved
In thesis
1. Analyses of voice and glottographic signals in singing and speech
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Recent advances in machine learning and time series analysis techniques have brought new perspectives to a great number of scientific fields. This thesis contributes applications of such techniques to voice analysis, in an attempt to extract information on the vibration of the vocal folds as such, as well as on the radiated acoustic signal. The data analyzed in this work are acoustic recordings, electroglottographic (EGG) signals and transnasal high-speed videoendoscopic images. The data analysis techniques are primarily based on clustering, i.e., grouping of data based on similarity, and sample entropy analysis, i.e., quantifying the degree of irregularity in a given signal. The experiments were conducted so as to provide data for different types of vibratory behaviors (or vibratory states) of the vocal folds. Clustering was used to categorize these different vibratory states in an unsupervised fashion, based solely on the electroglottographic signal, or the glottal area waveform, or both. Sample entropy was utilized as an indicator of instabilities, when subjects produced voiced sounds using irregular vibratory patterns, such as register breaks, intermittent diplophonia, and other types of irregularities. The prominent role of sound pressure level and fundamental frequency motivated further study of the relationship between them and the shape of the electroglottographic waveform. Graphical representations were created to visualize the relationship between different vibratory behaviors with fundamental frequency and sound pressure level. The EGG waveform shape was seen to depend strongly on sound pressure level and somewhat less on fundamental frequency. In very soft phonation, the almost sinusoidal waveform of the EGG suggests that studying the EGG using clusters may give a better representation compared to conventional time-domain metrics.
The clustering paradigm was later applied to synchronous recordings of electroglottogram and glottal area waveforms in professional tenor singers. Different vibratory states were classified successfully using clustering, and the electroglottogram was seen to be as good as the glottal area waveform for such a classification task. The last part of this work concerns voices from subjects with organic dysphonia. A study investigated how vowel context (sustained versus excerpted from speech) can affect the power of quantitative acoustic measures to discriminate dysphonic subjects from controls. Two acoustic voice quality measures were used: the cepstral peak prominence (smoothed) and sample entropy. The cepstral peak prominence (smoothed) showed better discriminatory power with excerpted vowels, while sample entropy showed better discriminatory power with sustained vowels. Additionally, it was found that sample entropy was strongly correlated with cepstral peak prominence (smoothed) and with the perceptual quality of breathiness.
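
The discrimination analysis mentioned in the last part of the abstract can be illustrated with a short sketch. It reduces each subject's distribution of a measure to one summary value (here the 95th percentile) and then computes the area under the ROC curve as the probability that a dysphonic subject scores higher than a control. This is an illustration under stated assumptions, not the procedure used in the thesis: the function names are hypothetical, the percentile uses linear interpolation (one of several conventions), and the numbers are toy values, not study data.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def auc(positives, negatives):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy per-subject distributions of a measure (hypothetical values only).
dysphonic = [[0.4, 0.6, 0.9, 1.1], [0.5, 0.7, 0.8, 1.3]]
controls = [[0.2, 0.3, 0.4, 0.5], [0.3, 0.3, 0.6, 0.7]]

# One summary value per subject, then one AUC for the two groups.
dys_scores = [percentile(d, 95) for d in dysphonic]
ctl_scores = [percentile(c, 95) for c in controls]
print(auc(dys_scores, ctl_scores))
```

For a measure that decreases with dysphonia, the same function would return a value below 0.5, so the group order or the sign of the scores would need to be swapped. Replacing the percentile with the mean gives the other kind of summary statistic compared in the study.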

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2018. p. 55
Series
TRITA-EECS-AVL ; 2018:6
Keywords
voice ; singing ; electroglottography ; clustering ; dysphonia ; sample entropy
National Category
Other Natural Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-221825 (URN)
978-91-7729-668-3 (ISBN)
Public defence
2018-02-23, F3, Lindstedtsvägen 26, Stockholm, 13:30 (English)
Opponent
Supervisors
Projects
Phonatory dynamics and states
Funder
Swedish Research Council, 2010-4565
Swedish Research Council, 2013-0632
Note

QC 20180126

Available from: 2018-01-26 Created: 2018-01-25 Last updated: 2018-01-26. Bibliographically approved

Open Access in DiVA

fulltext (2038 kB), 64 downloads
File information
File name: FULLTEXT01.pdf
File size: 2038 kB
Checksum (SHA-512):
a12d04314ef8a8052f8242b0518dac93f2ddb8a2ff4870bdb967a40850d2796a4bf7db16ea48426df19e48df4141bd08641d20966882073dd50ae7a0927ca173
Type: fulltext
Mimetype: application/pdf

By author/editor
Selamtzis, Andreas; Salvi, Giampiero; Carullo, Alessio