Analyses of voice and glottographic signals in singing and speech
KTH, School of Electrical Engineering and Computer Science (EECS), (TMH)
2018 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Recent advances in machine learning and time series analysis techniques have brought new perspectives to a great number of scientific fields. This thesis contributes applications of such techniques to voice analysis, in an attempt to extract information both on the vibration of the vocal folds as such and on the radiated acoustic signal. The data analyzed in this work are acoustic recordings, electroglottographic (EGG) signals and transnasal high-speed videoendoscopic images. The data analysis techniques are primarily based on clustering, i.e., grouping of data based on similarity, and sample entropy analysis, i.e., quantifying the degree of irregularity in a given signal. The experiments were designed to provide data for different types of vibratory behaviors (or vibratory states) of the vocal folds. Clustering was used to categorize these different vibratory states in an unsupervised fashion, based solely on the electroglottographic signal, the glottal area waveform, or both. Sample entropy was used as an indicator of instabilities when subjects produced voiced sounds with irregular vibratory patterns, such as register breaks, intermittent diplophonia, and other types of irregularities. The prominent role of sound pressure level and fundamental frequency motivated a further study of how they relate to the shape of the electroglottographic waveform. Graphical representations were created to visualize how different vibratory behaviors relate to fundamental frequency and sound pressure level. The EGG waveform shape was seen to depend strongly on sound pressure level and somewhat less on fundamental frequency. In very soft phonation, the EGG waveform is almost sinusoidal, which suggests that studying the EGG using clusters may give a better representation than conventional time-domain metrics.
The clustering paradigm was later applied to synchronous recordings of electroglottogram and glottal area waveforms in professional tenor singers. Different vibratory states were classified successfully using clustering, and the electroglottogram was seen to be as good as the glottal area waveform for this classification task. The last part of this work concerns voices of subjects with organic dysphonia. One study investigated how vowel context (sustained versus excerpted from speech) affects the power of quantitative acoustic measures to discriminate dysphonic subjects from controls. Two acoustic voice quality measures were used: the smoothed cepstral peak prominence (CPPS) and sample entropy. The smoothed cepstral peak prominence showed better discriminatory power with excerpted vowels, whereas sample entropy did so with sustained vowels. Additionally, sample entropy was found to be strongly correlated with the smoothed cepstral peak prominence and with the perceptual quality of breathiness.
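Sample entropy, used throughout the thesis as a measure of irregularity, can be illustrated with a minimal pure-Python sketch of the usual SampEn(m, r) definition: count pairs of length-m templates that match within a tolerance r, count how many of those pairs still match at length m + 1, and take the negative logarithm of the ratio. The function name `sample_entropy`, the defaults m = 2 and r = 0.2, and the scaling of r by the signal's standard deviation are illustrative assumptions, not the parameter values used in the thesis.

```python
import math
import random
import statistics
from typing import Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Sample entropy SampEn(m, r) of a 1-D signal.

    r is an absolute tolerance. Setting r to 0.2 times the standard
    deviation of x is a common convention, assumed here for illustration.
    """
    n = len(x)
    if n <= m + 1:
        raise ValueError("signal too short for the chosen template length m")

    def count_matches(length: int) -> int:
        # Count template pairs (i < j) whose Chebyshev (max-abs) distance is <= r.
        # Both template lengths use the same n - m start indices so that the
        # two counts are comparable; self-matches are excluded.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)      # pairs similar over m samples
    a = count_matches(m + 1)  # pairs that stay similar over m + 1 samples
    if b == 0:
        raise ValueError("no matching templates of length m; increase r")
    if a == 0:
        return math.inf       # strictly undefined; reported as infinite here
    return -math.log(a / b)


if __name__ == "__main__":
    periodic = [0.0, 1.0] * 100
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(300)]
    print(sample_entropy(periodic, 2, 0.2))                          # regular signal
    print(sample_entropy(noise, 2, 0.2 * statistics.stdev(noise)))   # irregular signal
```

A perfectly periodic signal gives a SampEn of 0, while white noise gives a clearly positive value. The quadratic double loop keeps the definition visible but is slow for long recordings.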

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2018. 55 p.
Series
TRITA-EECS-AVL ; 2018:6
Keywords [en]
voice; singing; electroglottography; clustering; dysphonia; sample entropy
Research programme
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-221825
ISBN: 978-91-7729-668-3 (print)
OAI: oai:DiVA.org:kth-221825
DiVA, id: diva2:1177710
Public defence
2018-02-23, F3, Lindstedtsvägen 26, Stockholm, 13:30 (English)
Projects
Phonatory dynamics and states
Research funder
Swedish Research Council, 2010-4565
Swedish Research Council, 2013-0632
Note

QC 20180126

Available from: 2018-01-26. Created: 2018-01-25. Last updated: 2018-01-26. Bibliographically approved.
List of papers
1. Analysis of vibratory states in phonation using spectral features of the electroglottographic signal
2014 (English). In: The Journal of the Acoustical Society of America, ISSN 0001-4966, Vol. 136, no. 5, pp. 2773-2783. Article in journal (Refereed). Published.
Abstract [en]

The vocal folds can oscillate in several different ways, manifest to practitioners and clinicians as ‘registers’ or ‘mechanisms’, of which the two most commonly considered are modal voice and falsetto voice. Here these will be taken as instances of different ‘vibratory states’, i.e., distinct quasi-stationary patterns of vibration of the vocal folds. State transitions are common in biomechanical nonlinear oscillators, and they are often abrupt and impossible to predict exactly. Switching state is much like switching to a different voice. Therefore, vibratory states are a source of confounding variation, for instance, when acquiring a voice range profile (VRP). In the quest for a state-aware, non-invasive VRP, a semi-automatic method based on the short-term spectrum of the electroglottographic (EGG) signal was developed. The method identifies rapid vibratory state transitions, such as the modal-falsetto switch, and clusters the EGG data based on their similarities in the relative levels and phases of the lower frequency components. Productions of known modal and falsetto voice were accurately clustered by a Gaussian mixture model. When mapped into the VRP, this EGG-based clustering revealed connected regions of different vibratory sub-regimes in both modal and falsetto.
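The feature-and-clustering idea in this abstract can be sketched as follows, under stated assumptions. Each EGG cycle is assumed to be already segmented to exactly one period, so discrete Fourier transform bin k is harmonic k; the levels of harmonics 2 to 4 are expressed in dB relative to the fundamental, and the phases relative to the fundamental are encoded as cosine/sine pairs. The clustering step swaps in plain k-means (with a deterministic farthest-first initialization) for the Gaussian mixture model named in the abstract, to keep the sketch dependency-free. The names `cycle_features` and `kmeans`, the number of harmonics and the -40 dB floor are hypothetical choices.

```python
import cmath
import math
from typing import List, Sequence


def cycle_features(cycle: Sequence[float], n_harmonics: int = 4,
                   floor_db: float = -40.0) -> List[float]:
    """Levels and phases of harmonics 2..n_harmonics relative to the fundamental.

    Assumes `cycle` spans exactly one period (segmentation is not shown),
    so DFT bin k corresponds to harmonic k.
    """
    n = len(cycle)
    coeffs = [sum(cycle[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
              for k in range(1, n_harmonics + 1)]
    fundamental = coeffs[0]
    features: List[float] = []
    for k, coef in enumerate(coeffs[1:], start=2):
        # Level in dB relative to the fundamental, floored so weak harmonics stay finite
        level = 20 * math.log10(max(abs(coef) / abs(fundamental), 1e-12))
        features.append(max(level, floor_db))
        # Phase relative to the fundamental: independent of where the cycle starts.
        # Encoded as (cos, sin) so that phase wrap-around does not distort distances.
        rel_phase = cmath.phase(coef) - k * cmath.phase(fundamental)
        features += [math.cos(rel_phase), math.sin(rel_phase)]
    return features


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Plain k-means (Lloyd's algorithm); returns one cluster index per point."""
    def sqdist(p: Sequence[float], q: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Subtracting k times the fundamental's phase from the phase of harmonic k is what makes the features independent of the cycle's starting sample; without it, the same waveform segmented at two different points would land in different places in feature space.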

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2014
Keywords
voice function, phonation, vocal registers, electroglottography, vocal fold vibrations
Identifiers
urn:nbn:se:kth:diva-145677 (URN)
10.1121/1.4896466 (DOI)
000344989000046 (ISI)
2-s2.0-84908587626 (Scopus ID)
Projects
FonaDyn
Research funder
Swedish Research Council, 2010-4565
Note

Updated from submitted to published.

QC 20140815

Available from: 2014-05-26. Created: 2014-05-26. Last updated: 2018-01-25. Bibliographically approved.
2. Investigation of the relationship between electroglottogram waveform, fundamental frequency, and sound pressure level using clustering
2017 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 31, no. 4, pp. 393-400. Article in journal (Refereed). Published.
Abstract [en]

Although previous research (Orlikoff, 1991; Henrich et al., 2005; Kuang et al., 2014; Awan, 2015) has shown that there exists a relationship between the electroglottogram (EGG) waveform and the acoustic signal, this relationship is still not fully understood. To investigate it, the EGG and acoustic signals were measured for four male amateur choir singers who each produced eight consecutive tones of increasing and decreasing vocal intensity. The EGG signals were processed cycle-synchronously to obtain the discrete Fourier transform, and the data were used as input to a clustering algorithm. The acoustic signal was analyzed in terms of sound pressure level (dB SPL) and fundamental frequency of vibration (f_o), and the results of both the EGG and the acoustic analyses were depicted on a two-dimensional plane with f_o on the x-axis and SPL on the y-axis. All subjects were seen to have a weak, near-sinusoidal EGG waveform in their lowest SPL range, whereas an increase in SPL coincided with a progressive enrichment in the harmonic content of the EGG waveforms. The results of the clustering were additionally used to classify waveforms across subjects, enabling inter-subject comparisons and an assessment of individual strategies for exploring the f_o-SPL dimensions. In these male subjects, the EGG waveform shape appeared to vary with SPL and to remain essentially constant with f_o over one octave.
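Depicting per-cycle results on the fundamental frequency versus SPL plane amounts to binning each cycle by its frequency and level and summarizing the cluster labels found in each cell. The sketch below assumes calibrated pressure samples in pascals for the SPL computation (dB re 20 µPa), a MIDI semitone scale for frequency, and cells of 1 semitone by 1 dB; `spl_db` and `fo_spl_map` are hypothetical names, and the per-cycle frequency estimates and cluster labels are taken as given.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def spl_db(pressure_pa: Sequence[float]) -> float:
    """Sound pressure level in dB re 20 µPa from calibrated pressure samples (Pa)."""
    rms = math.sqrt(sum(p * p for p in pressure_pa) / len(pressure_pa))
    return 20 * math.log10(rms / 20e-6)


def fo_spl_map(fo_hz: Sequence[float], spl: Sequence[float],
               labels: Sequence[int]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Place per-cycle cluster labels on a 1-semitone x 1-dB grid.

    Returns {(semitone, dB): (most frequent cluster, number of cycles)}.
    Semitones use MIDI numbering (A4 = 440 Hz = 69); the grid resolution
    is an illustrative choice.
    """
    cells: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for f, level, lab in zip(fo_hz, spl, labels):
        semitone = round(69 + 12 * math.log2(f / 440.0))
        cells[(semitone, round(level))][lab] += 1
    return {cell: (counts.most_common(1)[0][0], sum(counts.values()))
            for cell, counts in cells.items()}
```

Keeping the cycle count alongside the majority cluster makes it possible to hide sparsely visited cells, which would otherwise give a noisy picture at the edges of a singer's range.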

Place, publisher, year, edition, pages
Elsevier, 2017
Identifiers
urn:nbn:se:kth:diva-211744 (URN)
10.1016/j.jvoice.2016.11.003 (DOI)
000406147000001 (ISI)
27939138 (PubMedID)
2-s2.0-85008154357 (Scopus ID)
Research funder
Swedish Research Council, 2010-4565, 2013-0642
Note

QC 20170815

Available from: 2017-08-15. Created: 2017-08-15. Last updated: 2018-01-25. Bibliographically approved.
3. A comparison of electroglottographic and glottal area waveforms for phonation type differentiation in male professional singers
2018 (English). Manuscript (preprint) (Other academic)
Abstract [en]

This study investigates the use of glottographic signals, the electroglottogram (EGG) and the glottal area waveform (GAW), to study phonation in different vibratory states as produced by professionally trained singers. Six Western classical tenors were asked to phonate pitch glides from modal to falsetto phonation, or from modal to their stage voice above the passaggio (SVaP). For each pitch glide, the sample entropy (SampEn) of the EGG signal was calculated to establish a “ground truth” for the performed phonation type: the cycles before the maximum SampEn peak were labeled as modal, and the cycles after the peak as falsetto or SVaP. Three classifications of vibratory state were performed using clustering: one based only on the EGG, one based only on the GAW, and one based on their combination. The classification error rate (clustering vs. ground truth) was on average smaller than 10% for each of the three settings, revealing no particular advantage of the GAW over the EGG, or vice versa. The EGG-based time-domain metric analysis revealed a larger contact quotient and a larger normalized EGG derivative peak ratio in modal than in SVaP and falsetto. Comparing the glottographic waveforms of SVaP with those of falsetto and modal suggests that SVaP resembles falsetto more than modal, though with a larger contact quotient.
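The ground-truth labeling and the error-rate comparison described in this abstract can be sketched as below. The sketch assumes that a sample-entropy value is already available for every EGG cycle of a glide (the windowing that produces it is not shown), excludes the peak cycle itself from both classes, and, because cluster indices are arbitrary, scores a clustering under the best one-to-one mapping of clusters onto classes. The function names are hypothetical.

```python
import itertools
from typing import List, Sequence


def label_by_entropy_peak(sampen: Sequence[float]) -> List[int]:
    """Ground-truth labels from a per-cycle sample-entropy track of one glide.

    Cycles before the maximum SampEn value -> 0 (modal), cycles after it -> 1
    (falsetto or SVaP). The peak cycle itself is marked -1 and excluded,
    an assumption made here for illustration.
    """
    peak = max(range(len(sampen)), key=lambda i: sampen[i])
    return [0 if i < peak else 1 if i > peak else -1 for i in range(len(sampen))]


def clustering_error_rate(truth: Sequence[int], clusters: Sequence[int]) -> float:
    """Fraction of cycles misclassified under the best cluster-to-class mapping.

    Cluster indices are arbitrary, so every one-to-one mapping of cluster
    indices onto classes is tried and the lowest error is kept.
    """
    pairs = [(t, c) for t, c in zip(truth, clusters) if t >= 0]
    classes = sorted({t for t, _ in pairs})
    ids = sorted({c for _, c in pairs})
    if len(ids) > len(classes):
        raise ValueError("more clusters than classes; no one-to-one mapping exists")
    best = len(pairs)
    for perm in itertools.permutations(classes, len(ids)):
        mapping = dict(zip(ids, perm))
        best = min(best, sum(1 for t, c in pairs if mapping[c] != t))
    return best / len(pairs)
```

The same `clustering_error_rate` call can be applied to the EGG-only, GAW-only and combined clusterings, so the three settings are scored against one shared ground truth.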

Keywords
classical singing; registers; clustering; electroglottography; glottal area waveform
Research programme
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-221795 (URN)
Research funder
Swedish Research Council, 2010-4565
Swedish Research Council, 2013-0642
Note

QC 20180129

Available from: 2018-01-25. Created: 2018-01-25. Last updated: 2018-01-29. Bibliographically approved.
4. Effect of vowel context in cepstral and entropy analysis of pathological voices
2019 (English). In: Biomedical Signal Processing and Control, ISSN 1746-8094, E-ISSN 1746-8108, Vol. 47, pp. 350-357. Article in journal (Refereed). Published.
Abstract [en]

This study investigates the effect of vowel context (excerpted from speech versus sustained) on two voice quality measures: the smoothed cepstral peak prominence (CPPS) and sample entropy (SampEn). Thirty-one subjects with different types of organic dysphonia and thirty-one controls read a phonetically balanced text and phonated sustained [a:] vowels at comfortable pitch and loudness. All the [a:] vowels of the read text were excerpted by automatic speech recognition and phonetic (forced) alignment. CPPS and SampEn were calculated for all excerpted vowels of each subject, forming one distribution of CPPS and SampEn values per subject. The sustained vowels were analyzed using a 41 ms window, forming another distribution of CPPS and SampEn values per subject. Two speech-language pathologists performed a perceptual evaluation of the dysphonic subjects’ voice quality from the recorded text. The power of SampEn and CPPS to discriminate the dysphonic group from the controls was assessed for the excerpted and sustained vowels with receiver operating characteristic (ROC) analysis. The best discrimination in terms of area under the curve (AUC) was obtained for CPPS using the mean of the excerpted vowel distributions (AUC = 0.85) and for SampEn using the 95th percentile of the sustained vowel distributions (AUC = 0.84). CPPS and SampEn were found to be negatively correlated, and the strongest correlation was found between the corresponding 95th percentiles of their distributions (Pearson, r = −0.83, p < 0.001). A strong correlation was also found between the 95th percentile of SampEn distributions and the perceptual quality of breathiness (Pearson, r = 0.83, p < 0.001). The results suggest that, depending on the acoustic voice quality measure, sustained vowels can be more effective than excerpted vowels for detecting dysphonia. Additionally, when using CPPS or SampEn there is an advantage in using the measures’ distributions rather than their average values.
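The group comparison in this abstract combines three standard computations: a per-subject summary of each measure's distribution (mean or 95th percentile), the area under the ROC curve, and Pearson's correlation. The sketch below computes the AUC through its equivalence with the Mann-Whitney statistic, i.e., the probability that a randomly chosen dysphonic subject scores higher than a randomly chosen control, with ties counted as one half. The function names and the linear-interpolation percentile convention are assumptions, not details taken from the study.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def roc_auc(patients: Sequence[float], controls: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Assumes larger values indicate dysphonia; for a measure where smaller
    values do, pass the negated values. Ties count one half.
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            wins += 1.0 if p > c else 0.5 if p == c else 0.0
    return wins / (len(patients) * len(controls))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, each subject's distribution of CPPS or SampEn values is first reduced to one number (for example `percentile(values, 95)` or the mean), and `roc_auc` is then applied to those per-subject numbers for the dysphonic and control groups. For CPPS, where lower values point to dysphonia, the per-subject values would be negated before the call.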

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
dysphonia, voice analysis, cepstral peak prominence, sample entropy, vowel context
Research programme
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-221797 (URN)
10.1016/j.bspc.2018.08.021 (DOI)
000449134500035 (ISI)
2-s2.0-85053219805 (Scopus ID)
Research funder
Swedish Research Council, 2010-4565
Swedish Research Council, 2013-0632
Note

QC 20180129

Available from: 2018-01-25. Created: 2018-01-25. Last updated: 2019-01-25. Bibliographically approved.

Open Access in DiVA

Andreas Selamtzis - 2018 - Analyses of voice and glottographic signals in singing and speech (6176 kB), 178 downloads
File information
File: FULLTEXT01.pdf; File size: 6176 kB; Checksum: SHA-512
4c97ae9deb9e7e8a4cae04d5e4daed23f328fac1a1089bacea6ca5e5bc8b16868e08ff782cbbe2a2af7f9f976098f6cee73a7a701e42f388c35d09c78d835790
Type: fulltext; MIME type: application/pdf
