Analysis of vibratory states in phonation using spectral features of the electroglottographic signal
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. (Sound and Music Computing)
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. (Sound and Music Computing). ORCID iD: 0000-0002-3362-7518
2014 (English). In: The Journal of the Acoustical Society of America, ISSN 0001-4966, Vol. 136, no 5, p. 2773-2783. Article in journal (Refereed), Published
Abstract [en]

The vocal folds can oscillate in several different ways, manifest to practitioners and clinicians as ‘registers’ or ‘mechanisms’, of which the two most commonly considered are modal voice and falsetto voice. Here these will be taken as instances of different ‘vibratory states’, i.e., distinct quasi-stationary patterns of vibration of the vocal folds. State transitions are common in biomechanical nonlinear oscillators; and they are often abrupt and impossible to predict exactly. Switching state is much like switching to a different voice. Therefore, vibratory states are a source of confounding variation, for instance, when acquiring a voice range profile (VRP). In the quest for a state-aware, non-invasive VRP, a semi-automatic method based on the short-term spectrum of the electroglottographic signal (EGG) was developed. The method identifies rapid vibratory state transitions, such as the modal-falsetto switch, and clusters the EGG data based on their similarities in the relative levels and phases of the lower frequency components. Productions of known modal and falsetto voice were accurately clustered by a Gaussian mixture model. When mapped into the VRP, this EGG-based clustering revealed connected regions of different vibratory sub-regimes in both modal and falsetto.
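
The feature extraction described in the abstract can be illustrated with a small sketch that computes, for a single EGG cycle, the levels and phases of the lower harmonics relative to the fundamental. This is not the authors' implementation. The function name `cycle_harmonics`, the default of four harmonics, the plain DFT over exactly one cycle, and the phase convention (phase of harmonic k minus k times the phase of the fundamental, which makes the result independent of where the cycle starts) are all assumptions made for this example.

```python
import cmath
import math


def wrap_phase(p):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (p + math.pi) % (2 * math.pi) - math.pi


def cycle_harmonics(cycle, n_harmonics=4):
    """Return (levels_db, phases) for harmonics 2..n_harmonics of one cycle.

    `cycle` is a list of samples spanning exactly one period (assumption).
    Levels are in dB relative to the fundamental; phases are in radians,
    relative to the fundamental as described in the lead-in.
    """
    n = len(cycle)
    # Direct DFT at bins 1..n_harmonics; bin k is harmonic k of the cycle.
    coeffs = [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(cycle))
        for k in range(1, n_harmonics + 1)
    ]
    a1 = abs(coeffs[0])
    if a1 == 0:
        raise ValueError("cycle has no fundamental component")
    p1 = cmath.phase(coeffs[0])
    floor = 1e-12 * a1  # avoids log10(0) for absent harmonics
    levels_db = [20 * math.log10(max(abs(c), floor) / a1) for c in coeffs[1:]]
    phases = [
        wrap_phase(cmath.phase(c) - k * p1)
        for k, c in enumerate(coeffs[1:], start=2)
    ]
    return levels_db, phases
```

Feature vectors of this kind, one per cycle, could then be passed to a clustering method such as the Gaussian mixture model mentioned in the abstract.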

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2014. Vol. 136, no 5, p. 2773-2783
Keywords [en]
voice function, phonation, vocal registers, electroglottography, vocal fold vibrations
National Category
Fluid Mechanics and Acoustics
Identifiers
URN: urn:nbn:se:kth:diva-145677
DOI: 10.1121/1.4896466
ISI: 000344989000046
Scopus ID: 2-s2.0-84908587626
OAI: oai:DiVA.org:kth-145677
DiVA, id: diva2:719660
Projects
FonaDyn
Funder
Swedish Research Council, 2010-4565
Note

Updated from submitted to published.

QC 20140815

Available from: 2014-05-26. Created: 2014-05-26. Last updated: 2018-01-25. Bibliographically approved
In thesis
1. Electroglottographic analysis of phonatory dynamics and states
2014 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The human voice is a product of an intricate biophysical system. The complexity of this system enables a rich variety of possible sounds, but at the same time poses great challenges for quantitative voice analysis. For example, the vocal folds can vibrate in several different ways, leading to variations in the acoustic output. Because the vocal folds are relatively inaccessible, such variations are often difficult to account for. This work proposes a novel method for non-invasively extracting information on the vibratory state of the human vocal folds. Such information is important for creating a more complete voice analysis scheme. Invasive methods are undesirable because they often disturb the subjects and/or the studied phenomena, and they are also impractical in terms of accessibility and cost. A useful frame of reference for voice analysis is the Voice Range Profile (VRP). The three-dimensional form of the VRP can be used to depict any phonatory metric over the two-dimensional plane defined by the fundamental frequency of phonation (x-axis) and the sound pressure level (y-axis). The primary goal of this work was to incorporate information on the vibratory state of the vocal folds into the VRP (e.g., as a color change). For this purpose, a novel method of analysis of the electroglottogram (EGG) was developed, using techniques from machine learning (clustering) and nonlinear time series analysis (sample entropy estimation). The analysis makes no prior assumptions about the nature of the EGG signal and does not rely on its absolute amplitude or frequency. Unlike time-domain methods, which typically define thresholds for quantifying EGG cycle metrics, the proposed method uses information from the entire EGG cycle.
The analysis was applied in a variety of experimental conditions (constant vowel with different vibratory states, constant vibratory state with different vowels, constant vowel and vibratory state with varying lung volume), and the magnitude of the effect on the EGG short-term spectrum was estimated for each of these conditions. It was found that the short-term spectrum of the EGG signal sufficed to discriminate between different phonatory configurations, such as modal and falsetto voice. It was also found that even supposedly purely articulatory changes could be traced in the spectrum of the EGG signal. Finally, possible pedagogical and clinical applications of the method are discussed.
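
The idea of showing the vibratory state in the VRP, e.g., as a color change, can be sketched as a simple binning step: each analyzed cycle is assigned to a cell of the frequency/level plane, and each cell takes the most frequent cluster label among its cycles. This is a minimal sketch under stated assumptions, not the thesis's implementation. The function names, the cell size of one semitone by one decibel, and the majority-vote rule are all choices made for this example.

```python
import math
from collections import Counter, defaultdict


def hz_to_midi(f_hz):
    """Convert frequency in Hz to a MIDI note number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(f_hz / 440.0)


def vrp_cluster_map(cycles):
    """Map (fo_hz, spl_db, cluster_label) tuples onto VRP cells.

    Returns a dict {(semitone, spl_db): majority_label}, with cells of
    1 semitone by 1 dB (an assumed resolution).
    """
    cells = defaultdict(Counter)
    for fo_hz, spl_db, label in cycles:
        cell = (round(hz_to_midi(fo_hz)), round(spl_db))
        cells[cell][label] += 1
    # Keep only the most frequent label in each cell.
    return {cell: counts.most_common(1)[0][0] for cell, counts in cells.items()}
```

A plotting routine could then color each occupied cell according to its label, with frequency on the x-axis and sound pressure level on the y-axis.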

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. p. vii, 31
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2014:09
Keywords
voice function, phonation, vocal fold vibration, vocal registers, electroglottography
National Category
Other Natural Sciences
Identifiers
urn:nbn:se:kth:diva-145692 (URN)
978-91-7595-189-8 (ISBN)
Presentation
2014-06-13, sal Fantum, Lindstedtsvägen 24, KTH, Stockholm, 15:15 (English)
Projects
FonaDyn
Note

QC 20140609

Available from: 2014-06-09. Created: 2014-05-26. Last updated: 2014-06-09. Bibliographically approved
2. Analyses of voice and glottographic signals in singing and speech
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Recent advances in machine learning and time series analysis techniques have brought new perspectives to a great number of scientific fields. This thesis contributes applications of such techniques to voice analysis, in an attempt to extract information on the vibration of the vocal folds as such, as well as on the radiated acoustic signal. The data analyzed in this work are acoustic recordings, electroglottographic (EGG) signals and transnasal high-speed videoendoscopic images. The data analysis techniques are primarily based on clustering, i.e., grouping of data based on similarity, and sample entropy analysis, i.e., quantifying the degree of irregularity in a given signal. The experiments were conducted so as to provide data for different types of vibratory behaviors (or vibratory states) of the vocal folds. Clustering was used to categorize these different vibratory states in an unsupervised fashion, based solely on the electroglottographic signal, or the glottal area waveform, or both. Sample entropy was used as an indicator of instabilities when subjects produced voiced sounds with irregular vibratory patterns, such as register breaks, intermittent diplophonia, and other types of irregularities. The prominent role of sound pressure level and fundamental frequency motivated further study of the relationship between them and the shape of the electroglottographic waveform. Graphical representations were created to visualize how the different vibratory behaviors relate to fundamental frequency and sound pressure level. The EGG waveform shape was seen to depend strongly on sound pressure level and somewhat less on fundamental frequency. In very soft phonation, the almost sinusoidal waveform of the EGG suggests that studying the EGG using clusters may give a better representation than conventional time-domain metrics.
The clustering paradigm was later applied to synchronous recordings of electroglottogram and glottal area waveforms from professional tenor singers. Different vibratory states were classified successfully using clustering, and the electroglottogram was seen to be as good as the glottal area waveform for such a classification task. The last part of this work concerns voices from subjects with organic dysphonia. A study was dedicated to investigating how vowel context (sustained versus excerpted from speech) can affect the power of quantitative acoustic measures to discriminate dysphonic subjects from controls. Two acoustic voice quality measures were used: the smoothed cepstral peak prominence and sample entropy. The smoothed cepstral peak prominence showed better discriminatory power with excerpted vowels, while sample entropy showed better discriminatory power with sustained vowels. Additionally, it was found that sample entropy was strongly correlated with the smoothed cepstral peak prominence and with the perceptual quality of breathiness.
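
Sample entropy, described above as quantifying the degree of irregularity in a signal, can be sketched as follows. This follows the commonly used definition (the negative logarithm of the ratio of template matches of length m+1 to matches of length m, with self-matches excluded) and is not taken from the thesis. The function name, the defaults m=2 and r=0.2 times the standard deviation, and the use of "less than or equal" for the tolerance test are assumptions made for this example.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy of sequence x, with tolerance r times the std of x.

    Returns math.inf when no matches are found at length m or m + 1.
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    tol = r * sd

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is <= tol.
        # The same n - m start positions are used for both lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= tol:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)
```

A perfectly periodic sequence gives a value of zero, while irregular sequences give larger values. The double loop is quadratic in the signal length, so this sketch is suited to short segments only.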

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2018. p. 55
Series
TRITA-EECS-AVL ; 2018:6
Keywords
voice, singing, electroglottography, clustering, dysphonia, sample entropy
National Category
Other Natural Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-221825 (URN)
978-91-7729-668-3 (ISBN)
Public defence
2018-02-23, F3, Lindstedtsvägen 26, Stockholm, 13:30 (English)
Projects
Phonatory dynamics and states
Funder
Swedish Research Council, 2010-4565
Swedish Research Council, 2013-0632
Note

QC 20180126

Available from: 2018-01-26. Created: 2018-01-25. Last updated: 2018-01-26. Bibliographically approved

Open Access in DiVA

No full text in DiVA

By author/editor
Selamtzis, Andreas; Ternström, Sten