Training production parameters of context-dependent phones for speech recognition
1994 (English). In: STL-QPSR, Vol. 35, no 1, 59-90 p. Article in journal (Refereed), Published
A representation of the acoustic information in a trained phone library is described, at both the production-parameter level and the spectral level. The phones are trained in the parametric domain and are transformed to the spectral domain by means of a synthesis procedure. With this twofold description, potentially more powerful procedures for speaker adaptation and generation of unseen triphones can be explored, while the more robust spectral representation can be used for recognition. Context-dependent phones are represented by control parameters to a cascade formant synthesiser. During training, the parameters are extracted using an analysis-by-synthesis technique and the trajectories are approximated by piecewise linear segments. For recognition, the parameter tracks are transformed to a sequence of spectral subphone states, similar to a hidden Markov model. Recognition is performed by Viterbi search in a finite-state network. Recognition experiments have been performed on Swedish connected-digit strings pronounced by seven male speakers. In one experiment, unseen triphones were created by concatenating monophones and diphones and interpolating the parameter trajectories between line endpoints. In another, speaker adaptation was based on generalising the differences between observed triphones and the phone library. With optimum weighting of duration information, the results for cross-speaker recognition, speaker adaptation, and multi-speaker training were 98.5%, 98.9% and 99.1% correct digit recognition, respectively. Preliminary experiments with created unseen triphones show no improvement. In informal listening tests of digit strings resynthesised by concatenating trained triphones, the speech was judged intelligible but far from natural.
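The piecewise linear trajectory representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a parameter track is stored as a list of (time, value) breakpoints and that each subphone state is sampled at its centre. The function names, the four-state split and the F1 values are hypothetical, chosen only for illustration, and the synthesis step that turns parameters into spectra is not reproduced here.

```python
from bisect import bisect_right


def eval_track(breakpoints, t):
    """Linearly interpolate a piecewise linear track at time t.

    breakpoints: list of (time, value) pairs sorted by time.
    Times outside the track are clamped to the end values.
    """
    times = [p[0] for p in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t)  # index of the first breakpoint after t
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def subphone_values(breakpoints, n_states):
    """Sample the track at the centre of n_states equal-length states."""
    start, end = breakpoints[0][0], breakpoints[-1][0]
    width = (end - start) / n_states
    return [
        eval_track(breakpoints, start + (k + 0.5) * width)
        for k in range(n_states)
    ]


# Hypothetical F1 track as (ms, Hz) breakpoints; not data from the paper.
f1_track = [(0.0, 300.0), (40.0, 700.0), (100.0, 400.0)]
states = subphone_values(f1_track, 4)
print(states)
```

In a fuller system each sampled parameter vector would be passed through the synthesiser to obtain the spectrum of the corresponding subphone state; here only the trajectory sampling is shown.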
Place, publisher, year, edition, pages
1994. Vol. 35, no 1, 59-90 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-91237
OAI: oai:DiVA.org:kth-91237
DiVA: diva2:508934
NR 20140805
2012-03-11, 2012-03-11, Bibliographically approved