1 - 11 of 11
  • 1. Castellana, Antonella
    Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Carullo, Alessio
    Astolfi, Arianna
    Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers
    2017. In: Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017 / [ed] ISCA, International Speech Communication Association, 2017, Vol. 2017, p. 1814-1818
    Conference paper (Refereed)
    Abstract [en]

    There is a growing interest in Cepstral and Entropy analyses of voice samples for defining a vocal health indicator, due to their reliability in investigating both regular and irregular voice signals. The purpose of this study is to determine whether the Cepstral Peak Prominence Smoothed (CPPS) and Sample Entropy (SampEn) could differentiate dysphonic speakers from normal speakers in vowels excerpted from readings and to compare their discrimination power. Results are reported for 33 patients and 31 controls, who read a standardized phonetically balanced passage while wearing a head mounted microphone. Vowels were excerpted from recordings using Automatic Speech Recognition and, after obtaining a measure for each vowel, individual distributions and their descriptive statistics were considered for CPPS and SampEn. The Receiver Operating Curve analysis revealed that the mean of the distributions was the parameter with the highest discrimination power for both CPPS and SampEn. CPPS showed a higher diagnostic precision than SampEn, exhibiting an Area Under Curve (AUC) of 0.85 compared to 0.72. A negative correlation between the parameters was found (Spearman; ρ = −0.61), with higher SampEn corresponding to lower CPPS. The automatic method used in this study could provide support to voice monitorings in clinic and during individual's daily activities.

  • 2. Echternach, Matthias
    Burk, Fabian
    Koeberlein, Marie
    Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Doellinger, Michael
    Burdumy, Michael
    Richter, Bernhard
    Herbst, Christian Thomas
    Laryngeal evidence for the first and second passaggio in professionally trained sopranos
    2017. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 12, no 5, article id e0175865
    Article in journal (Refereed)
    Abstract [en]

    Introduction: Due to a lack of empirical data, the current understanding of the laryngeal mechanics in the passaggio regions (i.e., the fundamental frequency ranges where vocal registration events usually occur) of the female singing voice is still limited. Material and methods: In this study the first and second passaggio regions of 10 professionally trained female classical soprano singers were analyzed. The sopranos performed pitch glides from A3 (f(o) = 220 Hz) to A4 (f(o) = 440 Hz) and from A4 (f(o) = 440 Hz) to A5 (f(o) = 880 Hz) on the vowel [i:]. Vocal fold vibration was assessed with trans-nasal high speed videoendoscopy at 20,000 fps, complemented by simultaneous electroglottographic (EGG) and acoustic recordings. Register breaks were perceptually rated by 12 voice experts. Voice stability was documented with the EGG-based sample entropy. Glottal opening and closing patterns during the passaggi were analyzed, supplemented with open quotient data extracted from the glottal area waveform. Results: In both the first and the second passaggio, variations of vocal fold vibration patterns were found. Four distinct patterns emerged: smooth transitions with either increasing or decreasing durations of glottal closure, abrupt register transitions, and intermediate loss of vocal fold contact. Audible register transitions (in both the first and second passaggi) generally coincided with higher sample entropy values and higher open quotient variance through the respective passaggi. Conclusions: Noteworthy vocal fold oscillatory registration events occur in both the first and the second passaggio even in professional sopranos. The respective transitions are hypothesized to be caused by either (a) a change of laryngeal biomechanical properties; or by (b) vocal tract resonance effects, constituting level 2 source-filter interactions.

  • 3. Selamtzis, Andreas
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Analyses of voice and glottographic signals in singing and speech
    2018
    Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Recent advances in machine learning and time series analysis techniques have brought new perspectives to a great number of scientific fields. This thesis contributes applications of such techniques to voice analysis, in an attempt to extract information on the vibration of the vocal folds as such, as well as on the radiated acoustic signal. The data that was analyzed in this work are acoustic recordings, electroglottographic (EGG) signals and transnasal high-speed videoendoscopic images. The data analysis techniques are primarily based on clustering, i.e., grouping of data based on similarity, and sample entropy analysis, i.e., quantifying the degree of irregularity in a given signal. The experiments were conducted so as to provide data for different types of vibratory behaviors (or vibratory states) of the vocal folds. Clustering was used in order to categorize in an unsupervised fashion these different vibratory states, based solely on the electroglottographic signal, or the glottal area waveform, or both. Sample entropy was utilized as an indicator of instabilities, when subjects produced voiced sounds using irregular vibratory patterns, such as register breaks, intermittent diplophonia, and other types of irregularities. The prominent role of sound pressure level and fundamental frequency motivated further study of the relationship between them and the shape of the electroglottographic waveform. Graphical representations were created to visualize the relationship between different vibratory behaviors with fundamental frequency and sound pressure level. The EGG waveform shape was seen to depend strongly on sound pressure level and somewhat less on fundamental frequency. In very soft phonation, the almost sinusoidal waveform of the EGG suggests that studying the EGG using clusters may give a better representation compared to conventional time-domain metrics.
The paradigm of the clustering was later applied in synchronous recordings of electroglottogram and glottal area waveforms in professional tenor singers. Different vibratory states were classified successfully using clustering, and the electroglottogram was seen to be as good as the glottal area waveform for such a classification task. The last part of this work concerns voices from subjects with organic dysphonia. A study was dedicated to investigate how vowel context (sustained versus excerpted from speech) can affect the power of quantitative acoustic measures to discriminate dysphonic subjects from controls. Two acoustic voice quality measures were used: the cepstral peak prominence (smoothed) and sample entropy. The cepstral peak prominence (smoothed) showed better discriminatory power with excerpted vowels, while sample entropy with sustained vowels. Additionally, it was found that sample entropy was strongly correlated with cepstral peak prominence (smoothed) and with the perceptual quality of breathiness. 
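
The abstract above describes sample entropy as a way of quantifying the degree of irregularity in a signal. As a rough illustration only, the following is a minimal sketch of the commonly used SampEn(m, r) definition; the parameter defaults (m = 2, r = 0.2 times the standard deviation) and the function name are assumptions for this example and are not taken from the thesis.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r): negative log of the ratio of (m+1)-point to m-point template matches.

    Assumptions for this sketch: tolerance is r * std(x), distance is the
    Chebyshev (max-abs) distance, and self-matches are excluded.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    return -np.log(a / b)
```

Under this definition a regular signal such as a sine wave should give a lower value than white noise of the same length, which is the sense in which a peak in sample entropy can flag an instability.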

  • 4. Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Electroglottographic analysis of phonatory dynamics and states
    2014
    Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The human voice is a product of an intricate biophysical system. The complexity of this system enables a rich variety of possible sounds, but at the same time poses great challenges for quantitative voice analysis. For example, the vocal folds can vibrate in several different ways, leading to variations in the acoustic output. Because the vocal folds are relatively inaccessible, such variations are often difficult to account for. This work proposes a novel method for extracting non-invasively information on the vibratory state of the human vocal folds. Such information is important for creating a more complete voice analysis scheme. Invasive methods are undesirable because they often disturb the subjects and/or the studied phenomena, and they are also impractical in terms of accessibility and cost. A useful frame of reference for voice analysis is the Voice Range Profile (VRP). The 3 dimensional form of the VRP can be used to depict any phonatory metric over the 2 dimensional plane defined by the fundamental frequency of phonation (x-axis) and the sound pressure level (y-axis). The primary goal of this work was to incorporate information on the vibratory state of the vocal folds into the Voice Range Profile (e.g., as a color change). For this purpose, a novel method of analysis of the electroglottogram (EGG) was developed, using techniques from machine learning (clustering) and nonlinear time series analysis (sample entropy estimation). The analysis makes no prior assumptions on the nature of the EGG signal and does not rely on its absolute amplitude or frequency. Unlike time-domain methods, which typically define thresholds for quantifying EGG cycle metrics, the proposed method uses information from the entire cycle of each period. 
The analysis was applied in a variety of experimental conditions (constant vowel with different vibratory states, constant vibratory state and different vowels, constant vowel and vibratory state with varying lung volume) and the magnitude of effect on the EGG short-term spectrum was estimated for each of these conditions. It was found that the short-term spectrum of the EGG signal sufficed to discriminate between different phonatory configurations, such as modal and falsetto voice. It was found also that even supposedly purely articulatory changes could be traced in the spectrum of the EGG signal. Finally, possible pedagogical and clinical applications of the method are discussed.
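
The abstract above describes the Voice Range Profile as a plane spanned by fundamental frequency (x-axis) and sound pressure level (y-axis), over which any phonatory metric can be depicted. One way this mapping could be sketched is shown below; the cell size (one semitone by one decibel), the use of MIDI note numbers for the frequency axis, and the function name `voice_map` are all assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict

def voice_map(f0_hz, spl_db, metric):
    """Average a per-cycle metric inside each (semitone, dB) cell of the f0-SPL plane."""
    cells = defaultdict(list)
    for f0, level, value in zip(f0_hz, spl_db, metric):
        # Assumed grid: MIDI note number on x, whole decibels on y.
        semitone = int(round(69 + 12 * np.log2(f0 / 440.0)))
        cells[(semitone, int(round(level)))].append(value)
    return {cell: float(np.mean(values)) for cell, values in cells.items()}
```

Each cell value could then be rendered as a color, which corresponds to the third dimension of the profile mentioned in the abstract.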

  • 5. Selamtzis, Andreas
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Castellana, Antonella
    Department of Electronics and Telecommunications, Politecnico di Torino, Italy.
    Salvi, Giampiero
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Carullo, Alessio
    Department of Electronics and Telecommunications, Politecnico di Torino, Italy.
    Astolfi, Arianna
    Department of Electronics and Telecommunications, Politecnico di Torino, Italy.
    Effect of vowel context in cepstral and entropy analysis of pathological voices
    2019. In: Biomedical Signal Processing and Control, ISSN 1746-8094, E-ISSN 1746-8108, Vol. 47, p. 350-357
    Article in journal (Refereed)
    Abstract [en]

    This study investigates the effect of vowel context (excerpted from speech versus sustained) on two voice quality measures: the cepstral peak prominence smoothed (CPPS) and sample entropy (SampEn). Thirty-one dysphonic subjects with different types of organic dysphonia and thirty-one controls read a phonetically balanced text and phonated sustained [a:] vowels in comfortable pitch and loudness. All the [a:] vowels of the read text were excerpted by automatic speech recognition and phonetic (forced) alignment. CPPS and SampEn were calculated for all excerpted vowels of each subject, forming one distribution of CPPS and SampEn values per subject. The sustained vowels were analyzed using a 41 ms window, forming another distribution of CPPS and SampEn values per subject. Two speech-language pathologists performed a perceptual evaluation of the dysphonic subjects’ voice quality from the recorded text. The power of discriminating the dysphonic group from the controls for SampEn and CPPS was assessed for the excerpted and sustained vowels with the Receiver-Operator Characteristic (ROC) analysis. The best discrimination in terms of Area Under Curve (AUC) for CPPS occurred using the mean of the excerpted vowel distributions (AUC=0.85) and for SampEn using the 95th percentile of the sustained vowel distributions (AUC=0.84). CPPS and SampEn were found to be negatively correlated, and the largest correlation was found between the corresponding 95th percentiles of their distributions (Pearson, r=−0.83, p < 10⁻³). A strong correlation was also found between the 95th percentile of SampEn distributions and the perceptual quality of breathiness (Pearson, r=0.83, p < 10⁻³). The results suggest that depending on the acoustic voice quality measure, sustained vowels can be more effective than excerpted vowels for detecting dysphonia. Additionally, when using CPPS or SampEn there is an advantage of using the measures’ distributions rather than their average values.
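
The abstract above reduces each subject's distribution of values to a single statistic (the mean or the 95th percentile) and then compares groups by the area under the ROC curve. A minimal sketch of that pipeline follows, assuming the rank-based (Mann-Whitney) formulation of AUC; the function names and the toy numbers are hypothetical and are not data from the study.

```python
import numpy as np

def summarize(per_subject_values, percentile=None):
    """Reduce each subject's distribution to its mean, or to a given percentile."""
    if percentile is None:
        return np.array([np.mean(v) for v in per_subject_values])
    return np.array([np.percentile(v, percentile) for v in per_subject_values])

def auc(patient_scores, control_scores):
    """Probability that a random patient scores above a random control (ties count 1/2)."""
    p = np.asarray(patient_scores, dtype=float)[:, None]
    c = np.asarray(control_scores, dtype=float)[None, :]
    return float(np.mean((p > c) + 0.5 * (p == c)))

# Hypothetical toy data: a few SampEn-like values per subject.
patients = [[0.9, 1.1, 1.4], [0.7, 0.8, 1.2]]
controls = [[0.4, 0.5, 0.6], [0.6, 0.7, 0.9]]
print(auc(summarize(patients, 95), summarize(controls, 95)))
```

For a measure that is expected to be lower in the patient group, as the abstract reports for CPPS relative to SampEn, the two arguments to `auc` would be swapped or the scores negated.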

  • 6. Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Ternström, Sten
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Analysis of vibratory states in phonation using spectral features of the electroglottographic signal
    2014. In: The Journal of the Acoustical Society of America, ISSN 0001-4966, Vol. 136, no 5, p. 2773-2783
    Article in journal (Refereed)
    Abstract [en]

    The vocal folds can oscillate in several different ways, manifest to practitioners and clinicians as ‘registers’ or ‘mechanisms’, of which the two most commonly considered are modal voice and falsetto voice. Here these will be taken as instances of different ‘vibratory states’, i.e., distinct quasi-stationary patterns of vibration of the vocal folds. State transitions are common in biomechanical nonlinear oscillators; and they are often abrupt and impossible to predict exactly. Switching state is much like switching to a different voice. Therefore, vibratory states are a source of confounding variation, for instance, when acquiring a voice range profile (VRP). In the quest for a state-aware, non-invasive VRP, a semi-automatic method based on the short-term spectrum of the electroglottographic signal (EGG) was developed. The method identifies rapid vibratory state transitions, such as the modal-falsetto switch, and clusters the EGG data based on their similarities in the relative levels and phases of the lower frequency components. Productions of known modal and falsetto voice were accurately clustered by a Gaussian mixture model. When mapped into the VRP, this EGG-based clustering revealed connected regions of different vibratory sub-regimes in both modal and falsetto.

  • 7. Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Ternström, Sten
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Investigation of the relationship between electroglottogram waveform, fundamental frequency, and sound pressure level using clustering
    2017. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 31, no 4, p. 393-400
    Article in journal (Refereed)
    Abstract [en]

    Although it has been shown in previous research (Orlikoff, 1991; Henrich et al, 2005; Kuang et al, 2014; Awan, 2015) that there exists a relationship between the electroglottogram (EGG) waveform and the acoustic signal, this relationship is still not fully understood. To investigate this relationship, the EGG and acoustic signals were measured for four male amateur choir singers who each produced eight consecutive tones of increasing and decreasing vocal intensity. The EGG signals were processed cycle-synchronously to obtain the discrete Fourier transform, and the data were used as an input to a clustering algorithm. The acoustic signal was analyzed in terms of sound pressure level (dB SPL) and fundamental frequency (f(o)) of vibration, and the results of both EGG and acoustic analysis were depicted on a two-dimensional plane with f(o) on the x-axis and SPL on the y-axis. All the subjects were seen to have a weak, near-sinusoidal EGG waveform in their lowest SPL range, whereas increase in SPL coincided with progressive enrichment in harmonic content of the EGG waveforms. The results of the clustering were additionally used to classify waveforms across subjects to enable inter-subject comparisons and assessment of individual strategies of exploring the f(o)-SPL dimensions. In these male subjects, the EGG waveform shape appeared to vary with SPL and to remain essentially constant with f(o) over one octave.
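
The abstract above says the EGG signals were processed cycle-synchronously to obtain the discrete Fourier transform, which then fed a clustering algorithm. The sketch below illustrates only the feature-extraction step, under the assumption that each cycle has already been segmented and that the features are harmonic levels and phases expressed relative to the fundamental; the exact descriptors used in the paper are not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def cycle_descriptors(cycle, n_harmonics=4):
    """Levels (dB re fundamental) and phases (re fundamental) of harmonics 2..n_harmonics.

    `cycle` is assumed to hold exactly one period, so DFT bin k is harmonic k.
    """
    spectrum = np.fft.rfft(np.asarray(cycle, dtype=float))
    fundamental = spectrum[1]
    harmonics = spectrum[2:n_harmonics + 1]
    # Small constant avoids log(0) for perfectly sinusoidal cycles.
    levels_db = 20 * np.log10(np.abs(harmonics) / np.abs(fundamental) + 1e-12)
    # Subtracting k times the fundamental phase makes the result independent
    # of where in the period the cycle was cut.
    k = np.arange(2, n_harmonics + 1)
    phases = np.angle(harmonics * np.exp(-1j * k * np.angle(fundamental)))
    return levels_db, phases
```

With such features, a near-sinusoidal cycle shows very low harmonic levels while a harmonically richer cycle shows higher ones, so feature vectors from soft and loud phonation would be expected to fall into different clusters.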

  • 8. Selamtzis, Andreas
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Ternström, Sten
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Richter, Bernhard
    Burk, Fabian
    Köberlein, Marie
    Echternach, Matthias
    A comparison of electroglottographic and glottal area waveforms for phonation type differentiation in male professional singers
    2018
    Manuscript (preprint) (Other academic)
    Abstract [en]

    This study investigates the use of glottographic signals (EGG and GAW) to study phonation in different vibratory states as produced by professionally trained singers. Six western classical tenors were asked to phonate pitch glides from modal to falsetto phonation, or modal to their stage voice above the passaggio (SVaP). For each pitch glide the sample entropy (SampEn) of the EGG signal was calculated to establish a “ground truth” for the performed phonation type; the cycles before the maximum SampEn peak were labeled as modal, and the cycles after the peak as falsetto, or SVaP. Three classifications of vibratory state were performed using clustering: one based only on the EGG, one based on the GAW, and one based on their combination. The classification error rate (clustering vs ground truth) was on average smaller than 10%, for any of the three settings, revealing no special advantage of the GAW over EGG, and vice versa. The EGG-based time domain metric analysis revealed a larger contact quotient and larger normalized EGG derivative peak ratio in modal, compared to SVaP and falsetto. The glottographic waveform comparison of SVaP with falsetto and modal suggests that SVaP resembles more falsetto than modal, though with a larger contact quotient.

  • 9. Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Ternström, Sten
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Richter, Bernhard
    Burk, Fabian
    Köberlein, Marie
    Echternach, Matthias
    A comparison of electroglottographic and glottal area waveforms for phonation type differentiation in male professional singers
    2018. In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 144, no 6, p. 3275-3288
    Article in journal (Refereed)
    Abstract [en]

    This study compares the use of electroglottograms (EGGs) and glottal area waveforms (GAWs) to study phonation in different vibratory states as produced by professionally trained singers. Six western classical tenors were asked to phonate pitch glides from modal to falsetto phonation, or from modal to their stage voice above the passaggio (SVaP). For each pitch glide the sample entropy (SampEn) of the EGG signal was calculated to detect the occurrence of phonatory instabilities and establish a “ground truth” for the performed phonation type. The cycles before the maximum SampEn were labeled as modal, and the cycles after the peak were labeled as either falsetto, or SVaP. Three automatic categorizations of vibratory state were performed using clustering: one based only on the EGG, one based on the GAW, and one based on their combination. The error rate (clustering vs ground truth) was, on average, lower than 10% for all of the three settings, revealing no special advantage of the GAW over EGG, and vice vers...

  • 10. Ternström, Sten
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    D'Amario, Sara
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. University of York.
    Selamtzis, Andreas
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Effects of the lung volume on the electroglottographic waveform in trained female singers
    2018. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588
    Article in journal (Refereed)
    Abstract [en]

    Objectives: To determine if in singing there is an effect of lung volume on the electroglottographic waveform, and if so, how it varies over the voice range. Study design: Eight trained female singers sang the tune “Frère Jacques” in 18 conditions: three phonetic contexts, three dynamic levels, and high or low lung volume. Conditions were randomized and replicated. Methods: The audio and EGG signals were recorded in synchrony with signals tracking respiration and vertical larynx position. The first 10 Fourier descriptors of every EGG cycle were computed. These spectral data were clustered statistically, and the clusters were mapped by color into a voice range profile display, thus visualizing the EGG waveform changes under the influence of fo and SPL. The rank correlations and effect sizes of the relationships between relative lung volume and several adduction-related EGG wave shape metrics were similarly rendered on a color scale, in voice range profile-style ‘voice maps.’ Results: In most subjects, EGG waveforms varied considerably over the voice range. Within subjects, reproducibility was high, not only across the replications, but also across the phonetic contexts. The EGG waveforms were quite individual, as was the nature of the EGG shape variation across the range. EGG metrics were significantly correlated to changes in lung volume, in parts of the range of the song, and in most subjects. However, the effect sizes of the relative lung volume were generally much smaller than the effects of fo and SPL, and the relationships always varied, even changing polarity from one part of the range to another. Conclusions: Most subjects exhibited small, reproducible effects of the relative lung volume on the EGG waveform. Some hypothesized influences of tracheal pull were seen, mostly at the lowest SPLs. The effects were however highly variable, both across the moderately wide fo-SPL range and across subjects.
Different singers may be applying different techniques and compensatory behaviors with changing lung volume. The outcomes emphasize the importance of making observations over a substantial part of the voice range, and not only of phonations sustained at a few fundamental frequencies and sound levels.

  • 11. Ternström, Sten
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Johansson, Dennis
    Selamtzis, Andreas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    FonaDyn - A system for real-time analysis of the electroglottogram, over the voice range
    2018. In: SoftwareX, ISSN 2352-7110, Vol. 7, p. 74-80
    Article in journal (Refereed)
    Abstract [en]

    From soft to loud and low to high, the mechanisms of human voice have many degrees of freedom, making it difficult to assess phonation from the acoustic signal alone. FonaDyn is a research tool that combines acoustics with electroglottography (EGG). It characterizes and visualizes in real time the dynamics of EGG waveforms, using statistical clustering of the cycle-synchronous EGG Fourier components, and their sample entropy. The prevalence and stability of different EGG waveshapes are mapped as colored regions into a so-called voice range profile, without needing pre-defined thresholds or categories. With appropriately ‘trained’ clusters, FonaDyn can classify and map voice regimes. This is of potential scientific, clinical and pedagogical interest.
