Computer methods for voice analysis
KTH, Superseded Departments, Speech, Music and Hearing. ORCID iD: 0000-0003-4129-9793
2003 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis consists of five articles and a summary. The thesis deals with methods for measuring properties of the voice. The methods are all computer-based, but utilise different approaches for measuring different aspects of the voice.

Paper I introduces the Visual Sort and Rate (VSR) method for perceptual rating of voice quality. The method is based on the Visual Analogue Scale (VAS), but simultaneously shows all stimuli as icons along the VAS on the computer screen. As the listener places similar-sounding stimuli close to each other during the rating process, comparing stimuli becomes easier.

Paper II introduces the correlogram. Fundamental frequency F0 sometimes cannot be strictly defined, particularly for perturbed voice signals. The method displays multiple consecutive correlation functions in a grey scale image. Thus, the correlogram avoids selecting a single F0 value. Rather, it presents an unbiased image of periodicity, allowing the investigator to select among several candidates, if appropriate.

Paper III introduces a method for detection of phonation to be utilised in voice accumulators. The method uses two microphones attached near the subject’s ears. Phase and amplitude relations of the microphone signals are used to form a phonation detector. The output of the method can be used to measure phonation time, speaking time and fundamental frequency of the subject, as well as sound pressure level of both the subject’s voicing and the ambient sounds.

Paper IV introduces a method for Fourier analysis of high-speed laryngoscopic imaging. The data from the consecutive images are re-arranged to form time series that reflect the time-variation of light intensity in each pixel. Each of these time series is then analysed by means of Fourier transformation, such that a spectrum for each pixel is obtained. Several ways of displaying these spectra are demonstrated.

Paper V examines a test set-up for simultaneous recording of airflow, intra-oral pressure, electro-glottography, audio and high-speed imaging. Data are analysed with particular focus on synchronisation between glottal area and inverse filtered airflow. Several methodological aspects are also examined, such as the difficulties in synchronising high-speed imaging data with the other signals.

Place, publisher, year, edition, pages
Stockholm: KTH, 2003. 22 p.
Series
Trita-TMH, 2003:2
Keyword [en]
voice analysis, perceptual analysis, fundamental frequency, correlogram, aperiodicity, Fourier analysis, high-speed imaging, laryngoscopy, vocal fold vibration, voice accumulation.
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-3485
ISBN: 91-7283-461-7 (print)
OAI: oai:DiVA.org:kth-3485
DiVA: diva2:9289
Public defence
2003-03-28, 00:00
Note
QC 20100609
Available from: 2003-03-21 Created: 2003-03-21 Last updated: 2010-06-09
Bibliographically approved
List of papers
1. The visual sort and rate method for perceptual evaluation in listening tests
2003 (English) In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 28, no 3, 109-116 p. Article in journal (Refereed) Published
Abstract [en]

This paper introduces the Visual Sort and Rate (VSR) method which can be utilized for perceptual rating of sound stimuli. The method facilitates comparing similar stimuli, thus making the rank ordering of the stimuli easier. To examine the potential benefits of the method, it was compared with two other methods for perceptual rating of audio stimuli. The first method was a straightforward computer-based implementation of a visual analogue scale (VAS) allowing multiple playbacks and re-play of previously heard stimuli (C-VAS). The second method utilized a VAS where the responses were given on paper (P-VAS). The three methods were compared by using two sets of stimuli. The first set was a synthetically generated series of stimuli mimicking the vowel /a/ with different spectral tilts. In this test, a single parameter was rated. The second set of stimuli was a naturally spoken voice. For this set of stimuli three parameters were rated. Results show that the VSR method gave better reliability of the subjects' ratings in the single-parameter tests: Pearson and Spearman correlation coefficients were significantly higher for the VSR method than for the other methods. For the multi-parameter, intra-subject test, significantly higher Pearson correlation coefficients were found for the VSR method than for the VAS on paper.
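The reliability comparison above rests on Pearson and Spearman correlation coefficients. The following is a minimal sketch of both coefficients, assuming that reliability is judged by correlating two rating passes over the same stimuli; the abstract does not state the exact pairing, and the function names are illustrative only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With two lists of VAS positions for the same stimuli (hypothetical data), `pearson(first_pass, second_pass)` measures agreement of the rated values, while `spearman` measures agreement of the rank ordering only.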

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13260 (URN)
10.1080/14015430310015255 (DOI)
Note
QC 20100609
Available from: 2010-06-09 Created: 2010-06-09 Last updated: 2017-12-12
Bibliographically approved
2. The correlogram: A visual display of periodicity
2003 (English) In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 114, no 5, 2934-2945 p. Article in journal (Refereed) Published
Abstract [en]

Fundamental frequency (F0) extraction is often used in voice quality analysis. In pathological voices with a high degree of instability in F0, it is common for F0 extraction algorithms to fail. In such cases, the faulty F0 values might spoil the possibilities for further data analysis. This paper presents the correlogram, a new method of displaying periodicity. The correlogram is based on the waveform-matching techniques often used in F0 extraction programs, but with no mechanism to select an actual F0 value. Instead, several candidates for F0 are shown as dark bands. The result is presented as a 3D plot with time on the x axis, correlation delay inverted to frequency on the y axis, and correlation on the z axis. The z axis is represented in a gray scale as in a spectrogram. Delays corresponding to integer multiples of the period time will receive high correlation, thus resulting in candidates at F0, F0/2, F0/3, etc. While the correlogram adds little to F0 analysis of normal voices, it is useful for analysis of pathological voices since it illustrates the full complexity of the periodicity in the voice signal. Also, in combination with manual tracing, the correlogram can be used for semimanual F0 extraction. If so, F0 extraction can be performed on many voices that cause problems for conventional F0 extractors. To demonstrate the properties of the method it is applied to synthetic and natural voices, among them six pathological voices, which are characterized by roughness, vocal fry, gratings/scrape, hypofunctional breathiness and voice breaks, or combinations of these.
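The display described above can be sketched as a matrix of normalised correlations, one row per time frame and one column per candidate delay. This is an illustrative sketch only, not the paper's implementation: the frame length, hop size, normalisation and the function name `correlogram` are assumptions.

```python
import math

def correlogram(signal, fs, frame_len, hop, f_min, f_max):
    """Return (times, freqs, rows) where rows[i][j] is the normalised
    correlation between the frame starting at times[i] and the same-length
    segment delayed by fs / freqs[j] seconds. No F0 value is selected."""
    lag_min = int(fs / f_max)              # shortest delay, highest frequency
    lag_max = int(fs / f_min)              # longest delay, lowest frequency
    lags = list(range(lag_min, lag_max + 1))
    freqs = [fs / lag for lag in lags]     # delay inverted to frequency (y axis)
    times, rows = [], []
    start = 0
    while start + frame_len + lag_max <= len(signal):
        frame = signal[start:start + frame_len]
        energy_frame = sum(v * v for v in frame)
        row = []
        for lag in lags:
            delayed = signal[start + lag:start + lag + frame_len]
            energy_delayed = sum(v * v for v in delayed)
            product = sum(p * q for p, q in zip(frame, delayed))
            denom = math.sqrt(energy_frame * energy_delayed)
            row.append(product / denom if denom > 0 else 0.0)
        times.append(start / fs)           # time (x axis)
        rows.append(row)                   # correlation (z axis, grey level)
        start += hop
    return times, freqs, rows
```

For a strictly periodic signal, the columns at F0, F0/2, F0/3 and so on all reach a correlation near 1, which is what produces the parallel dark bands when the rows are rendered in grey scale.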

Keyword
VOICE QUALITY, ACOUSTIC CHARACTERISTICS, PATHOLOGICAL VOICE, ROUGH VOICE
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13261 (URN)
10.1121/1.1590972 (DOI)
000186489100038 ()
Note
QC 20100609
Available from: 2010-06-09 Created: 2010-06-09 Last updated: 2017-12-12
Bibliographically approved
3. The self-to-other ratio applied as a phonation detector for voice accumulation
2003 (English) In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 28, no 2, 71-80 p. Article in journal (Refereed) Published
Abstract [en]

A new method for phonation detection is presented. The method utilises two microphones attached near the subject's ears. Simplified, phonation is assumed to occur when the signals appear mainly in-phase and at equal amplitude. Several signal processing steps are added in order to improve the phonation detection, and finally the original signal is sorted in separate channels corresponding to the phonated and non-phonated instances. The method is tested in a laboratory setting to demonstrate the need for some of the stages of the signal processing and to examine the processing speed. The resulting sound file allows for measurement of phonation time, speaking time and fundamental frequency of the subject and sound pressure level of the subject's voice and the environmental sounds separately. The present implementation gives great freedom for adjustment of analysis parameters, since the microphone signals are recorded on DAT tape and the processing is performed off-line on a PC. In future versions, a voice accumulator based on this principle could be designed in order to shorten analysis time and thus make the method more appropriate for clinical use.
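The core idea, that the subject's own voice reaches both ear microphones in phase and at equal amplitude, can be sketched per frame as follows. This is a simplified sketch under stated assumptions: comparing the energy of the sum and the difference of the two channels is one possible way to test for in-phase, equal-amplitude signals and is not taken from the paper; the threshold value and the name `detect_phonation` are hypothetical, and the additional processing steps mentioned in the abstract are omitted.

```python
import math

def detect_phonation(left, right, frame_len, threshold_db=10.0):
    """Flag each frame as phonated (True) when the two microphone signals
    are mainly in phase and of similar amplitude.

    Assumption: in-phase, equal-amplitude signals nearly cancel in the
    difference channel, so the sum-to-difference energy ratio is large."""
    eps = 1e-12                            # avoids division by zero in silence
    n_samples = min(len(left), len(right))
    flags = []
    for start in range(0, n_samples - frame_len + 1, frame_len):
        l = left[start:start + frame_len]
        r = right[start:start + frame_len]
        energy_sum = sum((a + b) ** 2 for a, b in zip(l, r))
        energy_diff = sum((a - b) ** 2 for a, b in zip(l, r))
        ratio_db = 10 * math.log10((energy_sum + eps) / (energy_diff + eps))
        flags.append(ratio_db > threshold_db)
    return flags
```

Counting the `True` frames and multiplying by the frame duration gives a rough phonation time; the flags could likewise be used to route samples into separate phonated and non-phonated channels.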

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13263 (URN)
10.1080/14015430310011772 (DOI)
Note
QC 20100609
Available from: 2010-06-09 Created: 2010-06-09 Last updated: 2017-12-12
Bibliographically approved
4. A method of applying Fourier analysis to high-speed laryngoscopy
2001 (English) In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 110, no 6, 3193-3197 p. Article in journal (Refereed) Published
Abstract [en]

A new method for analysis of digital high-speed recordings of vocal-fold vibrations is presented. The method is based on the extraction of light-intensity time sequences from consecutive images, which in turn are Fourier transformed. The spectra thus acquired can be displayed in four different modes, each having its own benefits. When applied to the larynx, the method visualizes oscillations in the entire laryngeal area, not merely the glottal region. The method was applied to two laryngoscopic high-speed image sequences. Among these examples, covibrations in the ventricular folds and in the mucosa covering the arytenoid cartilages were found. In some cases the covibrations occurred at other frequencies than those of the glottis.
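The re-arrangement of an image sequence into one light-intensity time series per pixel, followed by a Fourier transform of each series, can be sketched as below. This is a minimal illustration, not the published implementation: the plain DFT, the removal of the mean intensity before transforming, and all function names are assumptions, and none of the four display modes is reproduced.

```python
import cmath
import math

def dft_magnitudes(series):
    """Magnitude spectrum (bins 0 .. n/2) of one pixel's intensity series.
    The mean is removed first so that static brightness does not dominate."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        magnitudes.append(abs(total))
    return magnitudes

def pixel_spectra(frames):
    """frames[t][y][x] is the light intensity of pixel (x, y) in image t.
    Returns spectra[y][x], the magnitude spectrum of that pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[dft_magnitudes([frame[y][x] for frame in frames])
             for x in range(width)]
            for y in range(height)]

def dominant_frequency(magnitudes, frame_rate, n_frames):
    """Frequency in Hz of the strongest spectral bin of one pixel."""
    k = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
    return k * frame_rate / n_frames
```

Mapping `dominant_frequency` over all pixels yields an image in which regions vibrating at a frequency other than that of the glottis stand out, which is one way such covibrations could be located.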

Keyword
VOCAL FOLD VIBRATIONS, MATHEMATICAL-MODEL, KYMOGRAPHY, CORDS
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13262 (URN)
000172731600037 ()
Note
QC 20100609
Available from: 2010-06-09 Created: 2010-06-09 Last updated: 2017-12-12
Bibliographically approved
5. Simultaneous analysis of vocal fold vibration and transglottal airflow: exploring a new experimental setup
2003 (English) In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 17, 319-330 p. Article in journal (Refereed) Published
Abstract [en]

Summary: The purpose of this study was to develop an analysis system for studying the relationship between vocal fold vibration and the associated transglottal airflow. Recordings of airflow, electroglottography (EGG), oral air pressure, and acoustic signals were performed simultaneously with high-speed imaging at a rate of approximately 1900 frames/s. Inverse filtered airflow is compared with the simultaneous glottal area extracted from the high-speed image sequence. The accuracy of the synchronization between the camera images and the foot pedal synchronization pulse was examined, showing that potential synchronization errors increase with time distance to the synchronization pulse. Therefore, analysis was limited to material near the synchronization pulse. Results corroborate previous predictions that air flow lags behind area, but also they reveal that relationships between these two entities may be complex and apparently varying with phonation mode.
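The statement that airflow lags behind glottal area can be illustrated with a simple delay estimate. The abstract does not say how the lag was quantified, so the cross-correlation approach below is a swapped-in, generic technique rather than the study's method. It assumes both signals have already been resampled to a common rate and aligned at the synchronisation pulse; the function names are hypothetical.

```python
import math

def normalised_correlation(x, y):
    """Mean-removed, normalised correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                    sum((b - mean_y) ** 2 for b in y))
    return num / den if den > 0 else 0.0

def flow_lag(area, flow, max_lag):
    """Delay in samples (positive when flow trails area) that maximises
    the correlation between area[n] and flow[n + lag]."""
    def score(lag):
        xs, ys = [], []
        for n in range(len(area)):
            m = n + lag
            if 0 <= m < len(flow):
                xs.append(area[n])
                ys.append(flow[m])
        return normalised_correlation(xs, ys)
    return max(range(-max_lag, max_lag + 1), key=score)
```

Dividing the returned lag by the common sample rate gives the delay in seconds. For periodic signals, `max_lag` should stay below one period so that the estimate is unambiguous.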

Keyword
High-speed video, Lissajous figures, Glottal airflow, Glottal area, Inverse filtering, Pixel-based Fourier analysis
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-13264 (URN)
10.1067/S0892-1997(03)00070-5 (DOI)
Note
QC 20100609
Available from: 2010-06-09 Created: 2010-06-09 Last updated: 2017-12-12
Bibliographically approved

Open Access in DiVA

fulltext (345 kB), 648 downloads
File information
File name: FULLTEXT01.pdf
File size: 345 kB
Checksum (MD5): 425fb3f652faa016b5e3e95e5e63221f9596b764bf78d5f8bcf3f9bcfbed9f098ef60232
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Granqvist, Svante

Total: 648 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
