Inducing Decision Tree Pronunciation Variation Models from Annotated Speech Data
2005 (English). In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, 1945-1948 p. Conference paper (Refereed)
A model of pronunciation of words in discourse context has been induced from the annotation of a spoken language corpus. The information included in the annotation is a set of variables hypothesised to be important for the pronunciation of words in discourse context. The annotation is connected to segmentally defined units on tiers corresponding to linguistically relevant units: the discourse, the utterance, the phrase, the word, the syllable and the phoneme. The model is represented as a tree structure, making it transparent for analysis and easy to use in a speech synthesis system. Using phonemic canonical pronunciation representations to estimate the segmental string of the annotated data gives a 22.1% phone error rate. Decision tree pronunciation variation models generated in a tenfold cross validation procedure showed an average phone error rate of 9.9%. Using multiple context variables for modelling pronunciation variation could thus reduce the error rate by 55%, compared to a baseline using canonical pronunciation representations.
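The 55% figure in the abstract is a relative reduction: the drop from 22.1% to 9.9%, divided by the 22.1% baseline. The sketch below reproduces that arithmetic and shows one common way a phone error rate can be computed. The abstract does not say how the error rate was scored, so the edit-distance definition used here is an assumption, and the function names and the toy phone strings are hypothetical illustrations, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (assumed scoring method)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # phone in ref missing from hyp
                            curr[j - 1] + 1,      # extra phone in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalised by the length of the reference transcription."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved model."""
    return (baseline - improved) / baseline


# Figures reported in the abstract: 22.1% baseline, 9.9% with decision trees.
print(f"{relative_reduction(22.1, 9.9):.1%}")  # 55.2%

# Invented toy sequences, only to exercise the functions.
toy_ref = list("abcd")
toy_hyp = list("abd")
print(phone_error_rate(toy_ref, toy_hyp))  # 0.25
```

Rounded to whole percent, the first result matches the 55% stated in the abstract.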
Computer Science Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-51873
ScopusID: 2-s2.0-33745220964
OAI: oai:DiVA.org:kth-51873
DiVA: diva2:465167
9th European Conference on Speech Communication and Technology; Lisbon; 4 September 2005 through 8 September 2005