Creating unseen triphones by phone concatenation in the spectral, cepstral and formant domains
1997 (English) Conference paper (Refereed)
A technique for predicting triphones by concatenation of diphone or monophone models is studied. The models are connected using linear interpolation between endpoints of piecewise linear parameter trajectories. Three types of spectral representation are compared: formants, filter amplitudes and cepstrum coefficients. The proposed technique lowers the spectral distortion of the phones for all three representations when different speakers are used for training and evaluation. The average error of the created triphones is lower in the filter and cepstrum domains than for formants. This is attributed to limitations in the Analysis-by-Synthesis formant tracking algorithm. A small improvement with the proposed technique is achieved for all representations in the task of reordering N-best sentence recognition candidate lists.
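The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the breakpoint representation, the function names (lerp, concatenate, sample), the gap parameter and the example values are all assumptions made for illustration. Each unit is taken to be a piecewise linear trajectory stored as a list of (time, parameter vector) breakpoints with strictly increasing times, and two units are joined by interpolating linearly between the last breakpoint of the left unit and the first breakpoint of the right unit.

```python
from typing import List, Tuple

# Hypothetical representation: (time in seconds, parameter vector).
# The vector could hold formant frequencies, filter amplitudes or
# cepstrum coefficients; the sketch does not depend on which.
Breakpoint = Tuple[float, List[float]]


def lerp(a: List[float], b: List[float], w: float) -> List[float]:
    """Linear interpolation between vectors a and b with weight w in [0, 1]."""
    return [x + w * (y - x) for x, y in zip(a, b)]


def concatenate(left: List[Breakpoint], right: List[Breakpoint],
                gap: float) -> List[Breakpoint]:
    """Join two trajectories, shifting `right` so that it starts `gap`
    seconds after `left` ends. The span between the two endpoints is
    then covered by a single linear segment when the result is sampled."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    offset = left[-1][0] + gap - right[0][0]
    return left + [(t + offset, v) for t, v in right]


def sample(trajectory: List[Breakpoint], t: float) -> List[float]:
    """Evaluate a piecewise linear trajectory at time t.
    Values are held constant outside the first and last breakpoints."""
    if t <= trajectory[0][0]:
        return list(trajectory[0][1])
    for (t0, v0), (t1, v1) in zip(trajectory, trajectory[1:]):
        if t0 <= t <= t1:
            return lerp(v0, v1, (t - t0) / (t1 - t0))
    return list(trajectory[-1][1])


if __name__ == "__main__":
    # Made-up two-parameter trajectories (e.g. two formant frequencies in Hz).
    left_unit = [(0.00, [500.0, 1500.0]), (0.05, [600.0, 1400.0])]
    right_unit = [(0.00, [700.0, 1200.0]), (0.04, [650.0, 1100.0])]
    joined = concatenate(left_unit, right_unit, gap=0.02)
    print(sample(joined, 0.06))  # value halfway through the connecting segment
```

Because the joined result is itself a piecewise linear trajectory, the same sampling routine serves both the original units and the created one. How the paper chooses the transition duration and the breakpoints is not stated in the abstract, so gap is left as a free parameter here.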
Place, publisher, year, edition, pages
1997. 41-44 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-91231
OAI: oai:DiVA.org:kth-91231
DiVA: diva2:508924
Proc of Fonetik -97, Dept of Phonetics, Umeå Univ
NR 20140805 2012-03-11 2012-03-11 Bibliographically approved