Spoken language annotation and data-driven modelling of phone-level pronunciation in discourse context
2008 (English). In: Speech Communication, ISSN 0167-6393, Vol. 50, no. 2, pp. 126-141. Article in journal (Refereed), Published.
A detailed description of a word's discourse context can be used to predict the word's pronunciation in that context, and it also enables studies of how various types of information interact in shaping, e.g., phone-level pronunciation. The work presented in this paper aims to model the systematic variation in the phone-level realisation of words that is inherent to a language variety. A data-driven approach based on detailed discourse context descriptions is used. These descriptions are constructed by annotating spoken language in multiple layers with a large variety of linguistic and related variables. Decision tree pronunciation models are induced from the annotation, and the effects of using different types and different amounts of information for model induction are explored. In a tenfold cross-validation experiment, models trained on all available information have an average phone error rate of 8.2%, whereas models trained on phoneme-level information only have an average phone error rate of 14.2%. Thus, including information above the phoneme level in the context description can reduce the phone error rate by 42.2% relative to the phoneme-level baseline.
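The abstract describes inducing decision-tree pronunciation models from annotated context, evaluating them with tenfold cross-validation, and comparing phone error rates for a phoneme-only feature set against a richer one. As a rough illustration only, the Python sketch below assumes an ID3-style tree split by information gain (the paper's exact induction algorithm, feature set and error metric are not given here). It treats the phone error rate simply as the fraction of phoneme slots whose predicted realisation differs from the annotated one, and it uses an invented toy corpus with hypothetical features (`phoneme`, `next`, `stress`, `rate`) and hypothetical helper names throughout.

```python
# Hypothetical sketch: decision-tree pronunciation models evaluated with
# k-fold cross-validation. The algorithm (ID3-style), feature names and toy
# data are illustrative assumptions, not the paper's method or corpus.
import random
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of phone labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def majority(labels):
    """Most frequent label; on ties, the first one encountered wins."""
    return Counter(labels).most_common(1)[0][0]

def induce_tree(examples, features):
    """Recursively split on the feature with the highest information gain.
    examples: list of (context: dict[str, str], realised_phone: str).
    Returns a leaf label or a node (feature, branches, majority_label).
    """
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features:
        return majority(labels)

    def gain(feature):
        groups = {}
        for x, y in examples:
            groups.setdefault(x[feature], []).append(y)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 1e-12:  # no informative split left
        return majority(labels)
    rest = [f for f in features if f != best]
    branches = {
        value: induce_tree([(x, y) for x, y in examples if x[best] == value], rest)
        for value in {x[best] for x, _ in examples}
    }
    return (best, branches, majority(labels))

def predict(tree, context):
    """Walk the tree; back off to a node's majority label for unseen values."""
    while isinstance(tree, tuple):
        feature, branches, default = tree
        if context.get(feature) not in branches:
            return default
        tree = branches[context[feature]]
    return tree

def phone_error_rate(tree, examples):
    """Fraction of phoneme slots whose predicted realisation is wrong."""
    return sum(predict(tree, x) != y for x, y in examples) / len(examples)

def cross_validate(examples, features, k=10, seed=0):
    """Mean phone error rate over k folds (tenfold by default)."""
    if len(examples) < k:
        raise ValueError("need at least k examples")
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    rates = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        rates.append(phone_error_rate(induce_tree(train, features), test))
    return sum(rates) / k

def relative_reduction(baseline_rate, new_rate):
    """Relative error reduction, e.g. (14.2 - 8.2) / 14.2."""
    return (baseline_rate - new_rate) / baseline_rate

def toy_corpus():
    """Invented data: /t/ deleted before /s/ in fast speech; unstressed /e/ -> [@]."""
    data = []
    for phoneme in ("t", "e"):
        for nxt in ("s", "a"):
            for stress in ("yes", "no"):
                for rate in ("fast", "slow"):
                    x = {"phoneme": phoneme, "next": nxt, "stress": stress, "rate": rate}
                    if phoneme == "t":
                        y = "-" if (nxt == "s" and rate == "fast") else "t"
                    else:
                        y = "e" if stress == "yes" else "@"
                    data.extend([(x, y)] * 3)
    return data

if __name__ == "__main__":
    data = toy_corpus()
    base = cross_validate(data, ["phoneme"])
    full = cross_validate(data, ["phoneme", "next", "stress", "rate"])
    print(f"phoneme-only PER: {base:.3f}, all-features PER: {full:.3f}")
    print(f"relative reduction: {relative_reduction(base, full):.1%}")
```

On the toy data, the phoneme-only model cannot capture the context-dependent deletion and vowel reduction, so adding the higher-level features lowers its cross-validated error, mirroring the kind of comparison reported in the abstract. With the rounded averages quoted there, `relative_reduction(14.2, 8.2)` gives about 42.3%; the reported 42.2% presumably reflects unrounded error rates.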
Keywords: spoken language annotation, pronunciation variation, pronunciation modelling, decision trees
Identifiers
URN: urn:nbn:se:kth:diva-33384
DOI: 10.1016/j.specom.2007.07.004
ISI: 000253328000004
Scopus ID: 2-s2.0-37649014913
OAI: oai:DiVA.org:kth-33384
DiVA: diva2:414904