Publications (10 of 134)
Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025). Classification of voice quality using neck-surface acceleration: Comparison with glottal flow and radiated sound. Journal of Voice, 39(1), 10-24
2025 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 39, no 1, p. 10-24. Article in journal (Refereed). Published.
Abstract [en]

Objectives: The aim of the present study is to investigate the usefulness of features extracted from miniature accelerometers attached to the speaker's tracheal wall below the glottis for classification of phonation type. The performance of the accelerometer features is evaluated relative to features obtained from the inverse-filtered and the radiated sound. While the former is a good proxy for the voice source, obtaining robust voice source features from the latter is considered difficult since it also contains information about the vocal tract filter. By contrast, the accelerometer signal is largely unaffected by the vocal tract and, although it is shaped by subglottal resonances and the transfer properties of the neck tissue, these properties remain constant within a speaker. For this reason, we expect it to provide a better approximation of the voice source than the raw audio. We also investigate which aspects of the voice source are derivable from the accelerometer and microphone signals. Methods: Five trained singers (two females and three males) were recorded producing the syllable [pæ:] in three voice qualities (neutral, breathy and pressed) and at three pitch levels as determined by the participants’ personal preference. Features extracted from the three signals were used for classification of phonation type using a random forest classifier. In addition, the accelerometer and microphone features with the highest correlation with the voice source features were identified. Results: The three signals showed comparable classification error rates, with considerable differences across speakers both with respect to the overall performance and the importance of individual features. The speaker-specific differences notwithstanding, variation of phonation type had consistent effects on the voice source, accelerometer and audio signals. With regard to the voice source, AQ, NAQ, L1L2 and CQ all showed a monotonic variation along the breathy – neutral – pressed continuum. Several features were also found to vary systematically in the accelerometer and audio signals: HRF, L1L2 and CPPS (both the accelerometer and the audio), as well as the sound level (for the audio). The random forest analysis revealed that all of these features were also among the most important for the classification of voice quality. Conclusion: Both the accelerometer and the audio signals were found to discriminate between phonation types with an accuracy approaching that of the voice source. Thus, the accelerometer signal, which is largely uncontaminated by vocal tract resonances, offered no advantage over the signal collected with a normal microphone.

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
accelerometer, audio, phonation type classification, voice source
National Category
Signal Processing Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-335782 (URN), 10.1016/j.jvoice.2022.06.034 (DOI), 001414592600001 (), 36028369 (PubMedID), 2-s2.0-85136510333 (Scopus ID)
Note

QC 20250226

Available from: 2023-09-07. Created: 2023-09-07. Last updated: 2025-02-26. Bibliographically approved.
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. (2024). CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice. Journal of Voice, 38(3), 549-560
2024 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 38, no 3, p. 549-560. Article in journal (Refereed). Published.
Abstract [en]

Introduction: In recent years, cepstral analysis and specific cepstrum-based measures such as smoothed cepstral peak prominence (CPPS) have become increasingly researched and utilized in attempts to determine the extent of overall dysphonia in voice signals. Yet, few studies have extensively examined how specific voice-source parameters affect CPPS values. Objective: Using a range of synthesized tones, this exploratory study sought to systematically analyze the effect of fundamental frequency (fo), vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental on CPPS values. Materials and Methods: A series of scales were synthesised using the freeware Madde. Fundamental frequency, vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental were systematically and independently varied. The tones were analysed in PRAAT, and statistical analyses were conducted in SPSS. Results: CPPS was significantly affected by both fo and source-spectrum tilt, independently. A nonlinear association was seen between vibrato extent and CPPS, where CPPS values increased from 0 to 0.6 semitones (ST), then rapidly decreased approaching 1.0 ST. No relationship was seen between the amplitude of the voice-source fundamental and CPPS. Conclusion: The large effect of fo should be taken into account when analyzing the voice, particularly in singing-voice research, when comparing pre- and posttreatment data, and when comparing inter-subject CPPS data.
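To make the semitone (ST) scale of the vibrato-extent result concrete, the following minimal sketch generates an fo contour with sinusoidal vibrato and converts a frequency ratio to semitones. It is an illustration only, not the synthesis performed in Madde: the function names, the 5 Hz vibrato rate, the 440 Hz carrier, and the reading of "extent" as the peak deviation from the mean fo are all assumptions.

```python
import math


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def vibrato_contour(fo_hz, extent_st, rate_hz=5.0, duration_s=1.0, step_s=0.001):
    """fo contour with sinusoidal vibrato.

    extent_st is taken here as the peak deviation from the mean fo,
    in semitones (an assumption; definitions of extent vary).
    """
    n = int(round(duration_s / step_s))
    contour = []
    for i in range(n):
        t = i * step_s
        deviation_st = extent_st * math.sin(2.0 * math.pi * rate_hz * t)
        contour.append(fo_hz * 2.0 ** (deviation_st / 12.0))
    return contour


# Hypothetical example: 0.6 ST vibrato around 440 Hz.
contour = vibrato_contour(fo_hz=440.0, extent_st=0.6)
peak_extent_st = semitones_between(440.0, max(contour))
```

Working in semitones rather than hertz keeps the extent comparable across different fo values, which matters when fo itself is one of the varied parameters.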

Place, publisher, year, edition, pages
Elsevier BV, 2024
Keywords
Cepstral analysis, CPPS, Singing, Voice, Voice analysis
National Category
Gynaecology, Obstetrics and Reproductive Medicine
Identifiers
urn:nbn:se:kth:diva-318405 (URN), 10.1016/j.jvoice.2021.12.010 (DOI), 001235166500001 (), 35000836 (PubMedID), 2-s2.0-85122449473 (Scopus ID)
Note

QC 20240619

Available from: 2022-09-21. Created: 2022-09-21. Last updated: 2025-02-11. Bibliographically approved.
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024). Emotional expressivity in singing: Assessing physiological and acoustic indicators of two opera singers' voice characteristics. Journal of the Acoustical Society of America, 155(1), 18-28
2024 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 155, no 1, p. 18-28. Article in journal (Refereed). Published.
Abstract [en]

In an earlier study, we analyzed how audio signals obtained from three professional opera singers varied when they sang one-octave-wide eight-tone scales in ten different emotional colors. The results showed systematic variations in voice source and long-term-average spectrum (LTAS) parameters associated with major emotion “families”. For two of the singers, subglottal pressure (Psub) was also recorded, thus allowing analysis of an additional main physiological voice control parameter, glottal resistance (defined as the ratio between Psub and glottal flow), which is related to glottal adduction. In the present study, we analyze voice source and LTAS parameters derived from the audio signal and their correlation with Psub and glottal resistance. The measured parameters showed a systematic relationship with the four emotion families observed in our previous study. They also varied systematically with values of the ten emotions along the valence, power, and arousal dimensions; valence showed a significant correlation with the ratio between acoustic voice source energy and subglottal pressure, while power varied significantly with sound level and two measures related to the spectral dominance of the lowest spectrum partial, the fundamental.
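The definition of glottal resistance given above, the ratio between Psub and glottal flow, can be written as a one-line computation. The sketch below is only an illustration of that ratio: the function name, the units (cm H2O and L/s), and the numeric values are assumptions and are not taken from the study.

```python
def glottal_resistance(psub_cm_h2o: float, flow_l_per_s: float) -> float:
    """Glottal resistance as subglottal pressure divided by glottal flow.

    Units are an assumption for this sketch: cm H2O per (L/s).
    """
    if flow_l_per_s <= 0:
        raise ValueError("glottal flow must be positive")
    return psub_cm_h2o / flow_l_per_s


# Hypothetical values, for illustration only.
example = glottal_resistance(psub_cm_h2o=10.0, flow_l_per_s=0.25)
```

For a fixed pressure, a lower flow yields a higher resistance, which is why the measure is treated as related to glottal adduction.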

Place, publisher, year, edition, pages
Acoustical Society of America, 2024
National Category
Natural Language Processing Music Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-342389 (URN), 10.1121/10.0023938 (DOI), 001135659200002 (), 38169520 (PubMedID), 2-s2.0-85181588072 (Scopus ID)
Note

QC 20240118

Available from: 2024-01-17. Created: 2024-01-17. Last updated: 2025-02-21. Bibliographically approved.
Rosenberg, S., Sundberg, J. & Lã, F. (2024). Kulning: Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition. Journal of Voice, 38(3), 585-594
2024 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 38, no 3, p. 585-594. Article in journal (Refereed). Published.
Abstract [en]

Kulning, a loud, high-pitched vocal calling technique pertaining to the Scandinavian herding system, has attracted several researchers' attention, mainly focusing on cultural, phonatory and musical aspects. Less attention has been paid to the spectral and physiological properties that characterize Kulning tones, and to whether there is a physiologically optimum pitch range. We analyzed tones produced by ten participants with varying experience in Kulning. They performed a phrase, pitch range G5 to C6 (784 to 1046 Hz), in three different conditions: starting (1) on pitch A5, (2) on the participant's preferred pitch, and (3) after the deepest possible inhalation, also on the participant's preferred pitch. Subglottal pressure (Psub) was measured as the oral pressure during /p/-occlusion. The quality of the Kulning was rated by a group of experts. The highest-rated tones all had a sound pressure level (SPL) at 0.3 m exceeding 115 dB and a pitch higher than 1010 Hz, while the SPL of the lowest-rated tones was less than 108 dB at a pitch below 900 Hz. A multiple regression analysis was performed to evaluate the relationship between the ratings and Psub, SPL, level of the fundamental and the frequency at which a spectrum envelope dip occurred. Highly rated tones were started at maximum lung volumes, and on participants’ preferred pitches. They all shared a high frequency of the spectrum envelope dip and a high level of the fundamental. In decreasing order of ratings, Condition 3 showed the highest values, followed by Condition 2 and Condition 1. Each singer seemed to perform best within an individual Psub and pitch range. The relevance of the results to voice pedagogy, artistic, and compositional work is discussed.
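The pitch names and frequencies quoted above (G5 to C6, 784 to 1046 Hz) can be checked with a small note-to-frequency conversion. The sketch assumes twelve-tone equal temperament with A4 = 440 Hz, a tuning reference the abstract does not state; the helper name and the note-name parsing are hypothetical.

```python
A4_HZ = 440.0  # assumed tuning reference; not stated in the abstract
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name: str) -> float:
    """Convert a note name such as 'G5' or 'C#6' to Hz (equal temperament)."""
    letter, accidental, octave = name[0], name[1:-1], int(name[-1])
    semitone = NOTE_OFFSETS[letter] + accidental.count("#") - accidental.count("b")
    midi_number = 12 * (octave + 1) + semitone  # A4 corresponds to 69
    return A4_HZ * 2.0 ** ((midi_number - 69) / 12.0)


g5_hz = note_to_hz("G5")
a5_hz = note_to_hz("A5")
c6_hz = note_to_hz("C6")
```

Under these assumptions G5 and C6 come out within about half a hertz of the values given in the abstract, and the 1010 Hz threshold for the highest-rated tones falls between B5 and C6.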

Place, publisher, year, edition, pages
Elsevier BV, 2024
Keywords
Kulning, Sound pressure level, Spectrum characteristics, Subglottal pressure, Tone quality
National Category
Music General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-319614 (URN), 10.1016/j.jvoice.2021.11.016 (DOI), 001236655100001 (), 34991935 (PubMedID), 2-s2.0-85123279942 (Scopus ID)
Note

QC 20240619

Available from: 2022-10-05. Created: 2022-10-05. Last updated: 2025-02-21. Bibliographically approved.
Havel, M., Sundberg, J., Traser, L., Burdumy, M. & Echternach, M. (2023). Effects of Nasalization on Vocal Tract Response Curve. Journal of Voice, 37(3), 339-347
2023 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 37, no 3, p. 339-347. Article in journal (Refereed). Published.
Abstract [en]

Background: Earlier studies have shown that nasalization affects the radiated spectrum by modifying the vocal tract transfer function in a complex manner. Methods: Here we study this phenomenon by measuring the sine-sweep response of 3-D models of the vowels /u, a, ᴂ, i/, derived from volumetric MR imaging, coupled by means of tubes of different lengths and diameters to a 3-D model of a nasal tract. Results: The coupling introduced a dip into the vocal tract transfer function. The dip frequency was close to the main resonance of the nasal tract, a result in agreement with the Fujimura & Lindqvist in vivo sweep tone measurements [Fujimura & Lindqvist, 1972]. With increasing size of the coupling tube, the depth of the dip increased and the first formant peak either changed in frequency or was split by the dip. Only marginal effects of the paranasal sinuses were observed. For certain coupling tube sizes, the spectrum balance was changed, boosting the formant peaks in the 2 – 4 kHz range. Conclusion: A velopharyngeal opening introduces a dip in the transfer function at the main resonance of the nasal tract. Its depth increases with the area of the opening and its frequency rises in some vowels.

Place, publisher, year, edition, pages
Elsevier BV, 2023
Keywords
Vocal tract, Nasal tract, Velopharyngeal opening, Transfer function, Sine sweep excitation, Spectrum balance, article, controlled study, excitation, in vivo study, nuclear magnetic resonance imaging, paranasal sinus, vowel
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-307431 (URN), 10.1016/j.jvoice.2021.02.013 (DOI), 000990753200001 (), 33773895 (PubMedID), 2-s2.0-85103075122 (Scopus ID)
Note

QC 20250402

Available from: 2022-01-25. Created: 2022-01-25. Last updated: 2025-04-02. Bibliographically approved.
Sundberg, J., La, F. & Granqvist, S. (2023). Fundamental frequency disturbances in female and male singers' pitch glides through long tube with varied resistances. Journal of the Acoustical Society of America, 154(2), 801-807
2023 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 154, no 2, p. 801-807. Article in journal (Refereed). Published.
Abstract [en]

Source-filter interaction can disturb vocal fold vibration frequency. Resonance frequency/bandwidth ratios (Q-values) may affect such interaction. Occurrences of fundamental frequency (f(o)) disturbances were measured in ascending pitch glides produced by four female and five male singers phonating into a 70 cm long tube. Pitch glides were produced with varied resonance Q-values of the vocal tract + tube compound (VT+tube): (i) tube end open, (ii) tube end open with nasalization, and (iii) with a piece of cotton wool in the tube end (conditions Op, Ns, and Ct, respectively). Disturbances of f(o) were identified by calculating the derivative of the low-pass filtered f(o) curve. Resonance frequencies of the compound VT+tube system were determined from ringings and glottal aspiration noise observed in narrowband spectrograms. Disturbances of f(o) tended to occur when a partial was close to a resonance of the compound VT+tube system. The number of such disturbances was significantly lower when the resonance Q-values were reduced (conditions Ns and Ct), particularly for the males. In some participants, resonance Q-values seemed less influential, suggesting little effect of source-filter interaction. The study sheds light on factors affecting source-filter interaction and f(o) control and is, therefore, relevant to voice pedagogy and theory of voice production.
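The two quantities named above, the Q-value and the derivative of a low-pass filtered f(o) curve, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the moving-average smoother, the semitone scale, the median baseline, the 10 ST/s threshold, and all function names are assumptions, since the abstract does not specify the filter or the detection criterion.

```python
import math
from statistics import median


def q_value(resonance_hz: float, bandwidth_hz: float) -> float:
    """Q-value as the resonance frequency/bandwidth ratio."""
    return resonance_hz / bandwidth_hz


def moving_average(values, width):
    """Simple low-pass filter; the window shrinks at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def fo_disturbances(fo_hz, step_s, width=5, threshold_st_per_s=10.0):
    """Indices where the slope of the smoothed fo curve departs from
    the glide's median slope by more than the threshold (ST/s)."""
    semitones = [12.0 * math.log2(f / fo_hz[0]) for f in fo_hz]
    smoothed = moving_average(semitones, width)
    slope = [(smoothed[i + 1] - smoothed[i]) / step_s
             for i in range(len(smoothed) - 1)]
    baseline = median(slope)
    return [i for i, s in enumerate(slope)
            if abs(s - baseline) > threshold_st_per_s]


# Hypothetical glide: 6 ST/s ascent from 220 Hz, sampled every 10 ms,
# with an abrupt 1 ST jump inserted at sample 100.
glide = [220.0 * 2.0 ** (6.0 * i * 0.01 / 12.0) for i in range(200)]
jumped = [f * 2.0 ** (1.0 / 12.0) if i >= 100 else f
          for i, f in enumerate(glide)]
flagged = fo_disturbances(jumped, step_s=0.01)
```

Comparing the slope against the glide's own median keeps the steady ascent from being flagged, so only abrupt departures from it count as disturbances in this sketch.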

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2023
National Category
Music
Identifiers
urn:nbn:se:kth:diva-334706 (URN), 10.1121/10.0020569 (DOI), 001045013800006 (), 37556565 (PubMedID), 2-s2.0-85167533243 (Scopus ID)
Note

QC 20230824

Available from: 2023-08-24. Created: 2023-08-24. Last updated: 2025-02-21. Bibliographically approved.
Ekström, A. G., Moran, S., Sundberg, J. & Lameira, A. R. (2023). PREQUEL: SUPERVISED PHONETIC APPROACHES TO ANALYSES OF GREAT APE QUASI-VOWELS. In: ICPhS 2023. Paper presented at ICPhS 2023, August 7-11, Prague, Czech Republic.
2023 (English). In: ICPhS 2023, 2023. Conference paper, Published paper (Refereed).
Abstract [en]

There is renewed interest in potential vowel production by nonhuman primates, but there are no agreed-upon methodologies for its estimation from real-life vocalizations. Here, we present a set of supervised approaches for estimating primate vowel-like articulation, with reference to orangutan long call pulses (N=36). We summarize our approach as a cohesive framework, the Primate Quasi-Vowel (PREQUEL) protocol. We (1) estimated f0 from correlograms and (2) vocal tract resonances (formants) from spectrograms; (3) the results were then compared against synthesized vowels for those frequency values and (4) presented to uninformed listeners (N=16), who largely agreed on the categorization of vowel-like qualities for vocalizations (Cronbach’s alpha=.701). We also provide descriptions of methods that are seemingly inadequate for formant estimation in great ape calls. We argue that a combination of phonetic methods is required to develop a science of nonhuman primate articulation.
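Listener agreement is reported above as Cronbach's alpha. The standard formula is alpha = k/(k-1) * (1 - sum of per-item variances / variance of the summed scores). The sketch below applies it with each listener treated as an item and each vocalization as an observation; that arrangement, the numeric coding of the ratings, and the toy data are assumptions, as the abstract does not describe how alpha was computed.

```python
from statistics import variance


def cronbach_alpha(ratings_by_listener):
    """Cronbach's alpha for a list of per-listener rating lists.

    Each inner list holds one listener's numeric ratings of the same
    stimuli, in the same order (assumed layout for this sketch).
    """
    k = len(ratings_by_listener)
    n_stimuli = len(ratings_by_listener[0])
    item_variances = [variance(r) for r in ratings_by_listener]
    totals = [sum(r[j] for r in ratings_by_listener) for j in range(n_stimuli)]
    return (k / (k - 1)) * (1.0 - sum(item_variances) / variance(totals))


# Toy data: three hypothetical listeners rating four stimuli.
toy_ratings = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
toy_alpha = cronbach_alpha(toy_ratings)
```

The function is undefined when the summed scores have zero variance, so real data would need a guard for that case.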

National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-351247 (URN)
Conference
ICPhS 2023, August 7-11, Prague, Czech Republic
Note

QC 20240805

Available from: 2024-08-04. Created: 2024-08-04. Last updated: 2024-08-05. Bibliographically approved.
Sundberg, J., Lindblom, B. & Hefele, A.-M. (2023). Voice source, formant frequencies and vocal tract shape in overtone singing. A case study. Logopedics, Phoniatrics, Vocology, 48(2), 75-87
2023 (English). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 48, no 2, p. 75-87. Article in journal (Refereed). Published.
Abstract [en]

Purpose: In overtone singing a singer produces two pitches simultaneously, a low-pitched, continuous drone plus a melody played on the higher, flutelike and strongly enhanced overtones of the drone. The purpose of this study was to analyse underlying acoustical, phonatory and articulatory phenomena. Methods: The voice source was analyzed by inverse filtering the sound, the articulation from a dynamic MRI video of the vocal tract profile, and the lip opening from a frontal-view video recording. Vocal tract cross-distances were measured in the MR recording and converted to area functions, the formant frequencies of which were computed. Results: Inverse filtering revealed that the overtone enhancement resulted from a close clustering of formants 2 and 3. The MRI material showed that for low enhanced overtone frequencies (FE) the tongue tip was raised and strongly retracted, while for high FE the tongue tip was less retracted but formed a longer constriction. Thus, the tongue configuration changed from an apical/anterior to a dorsal/posterior articulation. The formant frequencies derived from the area functions matched almost perfectly those used for the inverse filtering. Further, analyses of the area functions revealed that the second formant frequency was strongly dependent on the back cavity, and the third on the front cavity, which acted like a Helmholtz resonator, tuned by the tongue tip position and lip opening. Conclusions: This type of overtone singing can be fully explained by the well-established source-filter theory of voice production, as recently found by Bergevin et al. [1] for another type of overtone singing.
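The front cavity is described above as acting like a Helmholtz resonator tuned by tongue tip position and lip opening. The textbook lumped-element formula for such a resonator is f = (c / 2π) · sqrt(A / (V · L)), with neck area A, neck length L and cavity volume V. The sketch below evaluates it; it is not the area-function computation used in the study, it ignores end corrections, and the sound speed and all dimensions are hypothetical values chosen only for illustration.

```python
import math


def helmholtz_frequency_hz(neck_area_m2, neck_length_m, cavity_volume_m3,
                           sound_speed_m_s=350.0):
    """Resonance frequency of an ideal Helmholtz resonator.

    The default sound speed is an assumed value for warm, humid air;
    end corrections to the neck length are ignored.
    """
    return (sound_speed_m_s / (2.0 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m))


# Hypothetical dimensions: 1 cm^2 opening, 1 cm long, 10 cm^3 cavity.
f_front = helmholtz_frequency_hz(neck_area_m2=1e-4,
                                 neck_length_m=0.01,
                                 cavity_volume_m3=1e-5)

# Doubling the cavity volume lowers the resonance by a factor of sqrt(2).
f_larger_cavity = helmholtz_frequency_hz(1e-4, 0.01, 2e-5)
```

In this model a smaller opening or a larger cavity lowers the resonance, which is the sense in which lip opening and tongue tip position can tune it.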

Place, publisher, year, edition, pages
Informa UK Limited, 2023
Keywords
area function, Formant clustering, front cavity, Helmholtz resonator, inverse filtering, tongue shape, tongue tip
National Category
Otorhinolaryngology Musicology
Identifiers
urn:nbn:se:kth:diva-313862 (URN), 10.1080/14015439.2021.1998607 (DOI), 000725990300001 (), 34860148 (PubMedID), 2-s2.0-85121026665 (Scopus ID)
Note

QC 20250321

Available from: 2022-06-13. Created: 2022-06-13. Last updated: 2025-03-21. Bibliographically approved.
Lã, F., Sundberg, J. & Granqvist, S. (2022). Augmented visual-feedback of airflow: Immediate effects on voice-source characteristics of students of singing. Psychology of Music, 50(3), 933-944
2022 (English). In: Psychology of Music, ISSN 0305-7356, E-ISSN 1741-3087, Vol. 50, no 3, p. 933-944. Article in journal (Refereed). Published.
Abstract [en]

Glottal adduction is a crucial aspect in voice education and vocal performance: it has major effects on phonatory airflow and, consequently, on voice timbre. As the voice is a non-visible musical instrument, controlling it could be facilitated by providing real-time visual feedback of phonatory airflow. Here, we test the usefulness of a flow ball (FB) training device, visualizing, in terms of the height of a polystyrene ball placed in a plastic basket, phonatory airflow during phonation. Audio and electroglottographic recordings of five postgraduate, classically trained singer students were made under three subsequent conditions: before, during, and after phonating into the FB. The calibrated audio signal was inverse-filtered, using an electroglottograph signal to guide the manual tuning of the inverse filters. Mean phonatory airflow, peak-to-peak pulse amplitude, and normalized amplitude quotient were extracted from the resulting flow glottograms. After the FB condition, increases of mean flow and peak-to-peak pulse amplitude were observed in four singers. In addition, the singers’ mean normalized amplitude quotient increased significantly. The findings, although exploratory, suggest that reduction of glottal adduction can be observed immediately after FB phonation. 
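The normalized amplitude quotient mentioned above is commonly defined as NAQ = f_AC / (d_peak · T), where f_AC is the peak-to-peak amplitude of the glottal flow pulse, d_peak the magnitude of the negative peak of the flow derivative, and T the period. The sketch below computes it for one period of a sampled flow pulse. The raised-cosine pulse, the 5 ms period, and the function name are assumptions for illustration; they are not the flow glottograms or the analysis software used in the study.

```python
import math


def naq(flow, step_s, period_s):
    """Normalized amplitude quotient of one sampled glottal flow period."""
    f_ac = max(flow) - min(flow)  # peak-to-peak pulse amplitude
    derivative = [(flow[i + 1] - flow[i]) / step_s
                  for i in range(len(flow) - 1)]
    d_peak = abs(min(derivative))  # negative peak of the flow derivative
    return f_ac / (d_peak * period_s)


# Synthetic raised-cosine pulse with no closed phase (hypothetical data).
period_s = 0.005
n_samples = 1000
step_s = period_s / n_samples
pulse = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / n_samples))
         for i in range(n_samples + 1)]
naq_value = naq(pulse, step_s, period_s)
```

For this smooth pulse the value is close to 1/π. A pulse with a sharper closing phase has a larger d_peak and therefore a lower NAQ, consistent with the abstract's reading of an increase in NAQ as a reduction of glottal adduction.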

Place, publisher, year, edition, pages
SAGE Publications, 2022
Keywords
classical singing, flow phonation, glottal adduction, phonatory airflow, real-time visual feedback, voice training
National Category
Otorhinolaryngology Music
Identifiers
urn:nbn:se:kth:diva-310388 (URN), 10.1177/03057356211026735 (DOI), 000673661100001 (), 2-s2.0-85110038944 (Scopus ID)
Note

QC 20250507

Available from: 2022-04-04. Created: 2022-04-04. Last updated: 2025-05-07. Bibliographically approved.
Baker, C. P., Sundberg, J., Purdy, S. C. & Rakena, T. O. (2022). Female adolescent singing voice characteristics: an exploratory study using LTAS and inverse filtering. Logopedics, Phoniatrics, Vocology, 1-13
2022 (English). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, p. 1-13. Article in journal (Refereed). Published.
Abstract [en]

Background and Aim: To date, little research is available that objectively quantifies female adolescent singing-voice characteristics in light of the physiological and functional developments that occur from puberty to adulthood. This exploratory study sought to augment the pool of data available that offers objective voice analysis of female singers in late adolescence. Methods: Using long-term average spectra (LTAS) and inverse filtering techniques, dynamic range and voice-source characteristics were determined in a cohort of vocally healthy cis-gender female adolescent singers (17 to 19 years) from high-school choirs in Aotearoa New Zealand. Non-parametric statistics were used to determine associations and significant differences. Results: Wide intersubject variation was seen in dynamic range, spectral measures of harmonic organisation (formant cluster prominence, FCP), noise components in the spectrum (high-frequency energy ratio, HFER), and the normalised amplitude quotient (NAQ), suggesting great variability in the ability to control phonatory mechanisms such as subglottal pressure (Psub), glottal configuration and adduction, and vocal tract shaping. A strong association between the HFER and NAQ suggests that these non-invasive measures may offer complementary insights into vocal function, specifically with regard to glottal adduction and turbulent noise in the voice signal. Conclusion: Knowledge of the range of variation within healthy adolescent singers is necessary for the development of effective and inclusive pedagogical practices, and for vocal-health professionals working with singers of this age. LTAS and inverse filtering are useful non-invasive tools for determining such characteristics.

Place, publisher, year, edition, pages
Informa UK Limited, 2022
Keywords
breathiness, glottal adduction, normalised amplitude quotient, Singing voice analysis, voice pedagogy, adduction, adolescent, adult, article, choir (singing), cohort analysis, controlled study, exploratory research, female, filtration, gender, glottis, high school, human, male, New Zealand, noise, nonparametric test, pedagogics, voice, voice analysis
National Category
Music
Identifiers
urn:nbn:se:kth:diva-328933 (URN), 10.1080/14015439.2022.2140455 (DOI), 000878102700001 (), 36322641 (PubMedID), 2-s2.0-85141360070 (Scopus ID)
Note

QC 20230613

Available from: 2023-06-13. Created: 2023-06-13. Last updated: 2025-02-21. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-7234-7551