The Voice Source in Speech Communication - Production and Perception Experiments Involving Inverse Filtering and Synthesis
KTH, Superseded Departments, Speech Transmission and Music Acoustics.
2003 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

This thesis explores, through a number of production and perception studies, the nature of the voice source signal and how it varies in spoken communication. Research is also presented that deals with the techniques and methodologies for analysing and synthesising the voice source. The main analytic technique involves interactive inverse filtering for obtaining the source signal, which is then parameterised to permit the quantification of source characteristics. The parameterisation is carried out by means of model matching, using the four-parameter LF model of differentiated glottal flow.

The first three analytic studies focus on segmental and suprasegmental determinants of source variation. As part of the prosodic variation of utterances, focal stress shows for the glottal excitation an enhancement between the stressed vowel and the surrounding consonants. At a segmental level, the voice source characteristics of a vowel show potentially major differences as a function of the voiced/voiceless nature of an adjacent stop. Cross-language differences in the extent and directionality of the observed effects suggest different underlying control strategies in terms of the timing of the laryngeal and supralaryngeal gestures, as well as in the laryngeal tension settings. Different classes of voiced consonants also show differences in source characteristics: here the differences are likely to be passive consequences of the aerodynamic conditions that are inherent to the consonants. Two further analytic studies present voice source correlates for six different voice qualities as defined by Laver's classification system. Data from stressed and unstressed contexts clearly show that the transformation from one voice quality to another does not simply involve global changes of the source parameters. As well as providing insights into these aspects of speech production, the analytic studies provide quantitative measures useful in technology applications, particularly in speech synthesis.

The perceptual experiments use the LF source implementation in the KLSYN88 synthesiser to test some of the analytic results and to harness them to explore the paralinguistic dimension of speech communication. A study of the perceptual salience of different parameters associated with breathy voice indicates that the source spectral slope is critically important and that, surprisingly, aspiration noise contributes relatively little. Further perceptual tests using stimuli with different voice qualities explore the mapping between voice quality and its paralinguistic function of expressing emotion, mood and attitude. The results of these studies highlight the crucial role of voice quality in expressing affect as well as providing pointers to how it combines with f0 for this purpose.

The last section of the thesis focuses on the techniques used for the analysis and synthesis of the source. A semi-automatic method for inverse filtering is presented, which is novel in that it optimises the inverse filter by exploiting the knowledge that is typically used by the experimenter when carrying out manual interactive inverse filtering. A further study looks at the properties of the modified LF model in the KLSYN88 synthesiser: it highlights how it differs from the standard LF model and discusses the implications for synthesising the glottal source signal from LF model data. Effective and robust source parameterisation for the analysis of voice quality is the topic of the final paper: the effectiveness of global, amplitude-based source parameters is examined across speech tokens with large differences in f0. Additional amplitude-based parameters are proposed to enable a more detailed characterisation of the glottal pulse.

Keywords: Voice source dynamics, glottal source parameters, source-filter interaction, voice quality, phonation, perception, affect, emotion, mood, attitude, paralinguistic, inverse filtering, knowledge-based, formant synthesis, LF model, fundamental frequency, f0.

Place, publisher, year, edition, pages
Institutionen för talöverföring och musikakustik, 2003. xi, 80 p.
Trita-TMH, 03:13
URN: urn:nbn:se:kth:diva-3665
ISBN: 91-7283-630-X
OAI: diva2:9500
Public defence
NR 20140805
Available from: 2003-12-05 Created: 2003-12-05 Bibliographically approved
