Modelling Phone-Level Pronunciation in Discourse Context
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
2006 (English). Doctoral thesis, monograph (Other scientific)
Abstract [en]

Analytic knowledge about the systematic variation in a language has an important place in the description of that language. Such knowledge is of interest, e.g., in language teaching, as a background for various types of linguistic studies, and in the development of more dynamic speech technology applications. In previous studies, the effects of single variables, or of relatively small groups of related variables, on the pronunciation of words have been studied separately. The work described in this thesis takes a holistic perspective on pronunciation variation and focuses on a method for creating general descriptions of phone-level pronunciation in discourse context. The discourse context is defined by a large set of linguistic attributes, ranging from high-level variables such as speaking style down to the articulatory feature level. Models of phone-level pronunciation in discourse context have been created for the central standard Swedish language variety. The models are represented as decision trees, which are readable by both machines and humans. A data-driven approach was taken to the pronunciation modelling task, and the work involved annotating recorded speech with linguistic and related information. The decision tree models were induced from this annotation. An important part of the pronunciation modelling work was also the development of a pronunciation lexicon for Swedish. In a cross-validation experiment, several sets of pronunciation models were created, each with access to a different subset of the attributes in the annotation. The prediction accuracy of the pronunciation models could be improved by 42.2% by making information from layers above the phoneme level accessible during model training. Optimal models were obtained when attributes from all layers of annotation were used.
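The abstract does not name the induction algorithm, so the following is only a minimal sketch of how a decision tree can be induced from attribute-annotated data, assuming an ID3-style learner that splits on the attribute with the highest information gain. The attribute names (`phoneme`, `next_phoneme`, `style`), the target `realised` and the four records are hypothetical toy values invented for illustration; they are not taken from the thesis annotation. Restricting the `attributes` argument mimics training models with access to only some annotation layers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(records, attribute, target):
    """Reduction in target entropy obtained by splitting records on attribute."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def best_attribute(records, attributes, target):
    """Pick the attribute with the highest information gain (ID3-style)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


def build_tree(records, attributes, target):
    """Recursively induce a tree as nested dicts {attribute: {value: subtree}}."""
    labels = [r[target] for r in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the node is pure, or no attributes are left to split on.
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(records, attributes, target)
    # Leaf: no attribute reduces uncertainty about the realised phone.
    if information_gain(records, attr, target) <= 0:
        return majority
    remaining = [a for a in attributes if a != attr]
    return {
        attr: {
            value: build_tree(
                [r for r in records if r[attr] == value], remaining, target
            )
            for value in sorted({r[attr] for r in records})
        }
    }


# Hypothetical toy records: is a /t/ realised as [t] or deleted ("-")?
records = [
    {"phoneme": "t", "next_phoneme": "s", "style": "casual", "realised": "-"},
    {"phoneme": "t", "next_phoneme": "s", "style": "careful", "realised": "t"},
    {"phoneme": "t", "next_phoneme": "a", "style": "casual", "realised": "-"},
    {"phoneme": "t", "next_phoneme": "a", "style": "careful", "realised": "t"},
]
```

On this toy data, a tree trained on the phoneme-level attributes alone finds no informative split and collapses to a single leaf, while adding the hypothetical speaking-style attribute yields `{'style': {'careful': 't', 'casual': '-'}}`. This only illustrates the mechanism; it is not evidence for the results reported above. The nested-dictionary output is one simple way to keep an induced tree human-readable.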
The goal was for the models to produce pronunciation representations that are representative of the language variety, and not necessarily of the individual speakers whose speech the models were trained on. In the cross-validation experiment, model-produced phone strings were compared to key phonetic transcripts of actual speech, and the phone error rate was defined as the share of discrepancies between the respective phone strings. The phone error rate is thus the sum of actual errors and of discrepancies resulting from desired adaptations from a speaker-specific pronunciation to a pronunciation reflecting general traits of the language variety. The optimal models gave an average phone error rate of 8.2%.
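The abstract defines the phone error rate as the share of discrepancies between model-produced phone strings and key transcripts. A common way to count such discrepancies, assumed here, is the minimum number of phone substitutions, insertions and deletions (Levenshtein distance) needed to turn the key transcript into the model output, pooled over utterances and divided by the total number of key-transcript phones. The thesis may align and normalise differently; the function names and phone symbols below are hypothetical.

```python
def phone_edit_distance(reference, hypothesis):
    """Minimum substitutions, insertions and deletions turning reference into hypothesis."""
    m = len(hypothesis)
    # prev[j] holds the distance between the reference prefix processed so far
    # and hypothesis[:j]; start from the empty reference prefix.
    prev = list(range(m + 1))
    for i in range(1, len(reference) + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[m]


def phone_error_rate(pairs):
    """Pooled error rate over (key_transcript, model_output) phone-list pairs."""
    pairs = list(pairs)
    total_errors = sum(phone_edit_distance(ref, hyp) for ref, hyp in pairs)
    total_ref_phones = sum(len(ref) for ref, _ in pairs)
    if total_ref_phones == 0:
        raise ValueError("key transcripts contain no phones")
    return total_errors / total_ref_phones
```

For example, a hypothetical key transcript `["h", "a:", "r", "d"]` against model output `["h", "a:", "d"]` gives one deletion, so `phone_error_rate([(["h", "a:", "r", "d"], ["h", "a:", "d"])])` is 0.25. Under the definition above, such a discrepancy may be a genuine error or a desired adaptation towards the general language variety; the metric itself cannot tell them apart.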

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. ix, 250 p.
Trita-CSC-A, ISSN 1653-5723; 2006:25
Keyword [en]
Pronunciation modelling, Pronunciation variation, Discourse-context, Phone-level variation, Central standard Swedish, Spoken language annotation, Data-driven methods, Machine learning, Decision trees, Pronunciation lexicon development, Machine-readable lexicon, Phonology, Discourse, Lexicon
National Category
Language Technology (Computational Linguistics)
URN: urn:nbn:se:kth:diva-4202
ISBN: 91-7178-490-X
OAI: diva2:11212
Public defence
2006-12-11, F3, Lindstedtsvägen 26, Stockholm, 13:00
QC 20100901
Available from: 2006-12-04. Created: 2006-12-04. Last updated: 2010-09-01.
Bibliographically approved

Author
Jande, Per-Anders