Vocal tract length compensation in the signal and model domains in child speech recognition
2007 (English). In: Proceedings of Fonetik: TMH-QPSR, 2007, pp. 41-44. Conference paper (Other academic)
In a newly started project, KOBRA, we study methods to reduce the amount of training data required for speech recognition by combining the conventional data-driven training approach with available partial knowledge of speech production, implemented as transformation functions in the acoustic, articulatory and speaker-characteristic domains. Initially, we investigate one well-known dependence: the inversely proportional relation between vocal tract length and formant frequencies. In this report, we have replaced the conventional technique of vocal tract length normalisation (VTLN), which frequency-warps the unknown input utterance, by transforming the training data instead. This enables phoneme-dependent warping to be performed. In another experiment, we expanded the available training data by duplicating each training utterance into a number of differently warped instances. Training on this expanded corpus yields models that each represent the whole range of vocal tract length variation. This technique allows every frame of the utterance to be warped differently. Compared to conventional VTLN, the computational load is reduced by an order of magnitude without noticeable decrease in performance on the task of recognising children's speech using models trained on adult speech.
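The two ideas in the abstract, the inverse relation between vocal tract length and formant frequencies and the expansion of training data into differently warped copies, can be illustrated with a small sketch. This record does not state the warping function or the range of warp factors used in the paper, so the code below assumes a simple linear frequency warp applied to magnitude-spectrum frames with linear interpolation, and uses hypothetical helper names (`formant_scale`, `warp_spectrum`, `augment_with_warps`) and hypothetical warp factors. It is an illustration of the general technique, not the authors' implementation.

```python
from typing import List, Sequence

Frame = List[float]


def formant_scale(formants_hz: Sequence[float], length_ratio: float) -> List[float]:
    """Uniform-tube approximation: formants scale inversely with tract length.

    length_ratio = L_target / L_reference (e.g. child / adult, < 1 for a child).
    A shorter tract (ratio < 1) gives proportionally higher formants.
    """
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return [f / length_ratio for f in formants_hz]


def warp_spectrum(spectrum: Sequence[float], alpha: float) -> Frame:
    """Linear frequency warp of one magnitude-spectrum frame (assumed warp form).

    Output bin k takes the input value at fractional bin k / alpha, so a spectral
    peak at input bin j moves to bin alpha * j. alpha > 1 shifts content upward
    (shorter, child-like tract); alpha < 1 shifts it downward. Values between bins
    are linearly interpolated; beyond the last input bin the edge value is held.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(spectrum)
    out: Frame = []
    for k in range(n):
        src = k / alpha
        if src >= n - 1:
            out.append(float(spectrum[-1]))
        else:
            i = int(src)
            frac = src - i
            out.append((1.0 - frac) * spectrum[i] + frac * spectrum[i + 1])
    return out


def augment_with_warps(
    utterance: Sequence[Sequence[float]], alphas: Sequence[float]
) -> List[List[Frame]]:
    """Duplicate one utterance (a list of frames) into one warped copy per alpha.

    Training on all copies exposes the models to a range of simulated
    vocal tract lengths instead of warping the test input at recognition time.
    """
    return [[warp_spectrum(frame, a) for frame in utterance] for a in alphas]
```

As a usage example, `formant_scale([500, 1500, 2500], 0.5)` doubles each formant, and `augment_with_warps(frames, [0.9, 1.0, 1.1, 1.2])` returns four warped versions of `frames`, where the copy with `alpha=1.0` is identical to the input. Warping the training data offline, rather than searching over warp factors for each test utterance, is what moves the cost out of recognition time in this kind of scheme.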
Place, publisher, year, edition, pages: 2007, pp. 41-44.
Language Technology (Computational Linguistics)
Identifiers: URN: urn:nbn:se:kth:diva-12333; OAI: oai:DiVA.org:kth-12333; DiVA: diva2:309765
Proceedings of Fonetik, 2007, 30 May to 1 June, Sal F2, Lindstedtsvägen 26, KTH, Stockholm