Units for Dynamic Vocal Tract Length Normalization
(English) Manuscript (preprint) (Other academic)
A novel method to account for dynamic speaker-characteristic properties in a speech recognition system is presented. The estimated trajectory of a property can be constrained to be constant or to have a limited rate of change within a phone or a sub-phone state, or it can be allowed to change between individual speech frames. The constraints are implemented by extending each state in the HMM with a number of property-specific sub-states transformed from the original model. The connections in the transition matrix of the extended model define the possible slopes of the trajectory. Constraints on the dynamic range of the trajectory during an utterance are implemented by decomposing it into a static and a dynamic component. Results are presented on vocal tract length normalization in connected-digit recognition of children's speech using models trained on male adult speech. The word error rate was reduced by 10% relative compared with the conventional utterance-specific warping factor.
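As an illustration of the state-extension idea described in the abstract, the following minimal sketch builds the transition matrix of an extended model in which every HMM state is replicated once per candidate warping factor. It is not the authors' implementation: the function name `expand_transitions`, the parameters `n_warps`, `max_step` and `within_state`, and the uniform weighting over allowed warp changes are all assumptions made for the example.

```python
import numpy as np


def expand_transitions(A, n_warps, max_step, within_state=True):
    """Replicate each HMM state once per warp index (hypothetical helper).

    A            : (n, n) row-stochastic transition matrix of the base HMM.
    n_warps      : number of candidate warping factors (sub-states per state).
    max_step     : largest allowed change of the warp index per transition,
                   i.e. the steepest permitted slope of the trajectory.
    within_state : if False, the warp index may change only when the base
                   state changes (constant warp inside each state).

    Extended state (i, k) is stored at index i * n_warps + k.
    """
    A = np.asarray(A, dtype=float)
    idx = np.arange(n_warps)
    # allowed[k, l] is 1 when the warp index may move from k to l.
    allowed = (np.abs(idx[:, None] - idx[None, :]) <= max_step).astype(float)
    # Assumption: allowed warp moves are equally likely.
    W = allowed / allowed.sum(axis=1, keepdims=True)
    if within_state:
        # Warp may change on every frame, including self-transitions.
        return np.kron(A, W)
    # Self-transitions keep the warp index; other transitions may change it.
    self_loops = np.diag(np.diag(A))
    return np.kron(self_loops, np.eye(n_warps)) + np.kron(A - self_loops, W)
```

Under these assumptions, `max_step=0` forbids any change of warp index and so corresponds to a single utterance-specific warping factor, while larger values permit steeper trajectories. The decomposition into static and dynamic components mentioned in the abstract is not covered by this sketch.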
speech recognition, VTLN, dynamic modelling
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-12242
OAI: oai:DiVA.org:kth-12242
DiVA: diva2:306690
QC 20110502
2010-04-08
2010-03-30
2011-05-02
Bibliographically approved