Training speech synthesis parameters of allophones for speech recognition
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
1994 (English) Conference paper (Refereed)
Abstract [en]

A technique for training a speech recognition system at a production parametric level is described. The approach offers potential advantages in the form of small training corpora and fast speaker adaptation. Triphones that have not occurred in the training data can be generated by concatenation and parametric interpolation of diphones or context-free phones. The triphones are represented by a piece-wise linear approximation of the production parameters. For recognition, these are converted to subphone spectral state sequences. A 97.6% connected-digit recognition rate has been achieved when training the system on one male speaker and performing recognition on 6 other male speakers. In preliminary experiments with generation of unseen triphones, the performance is still slightly lower than when using seen diphones and context-free phones. Experiments with fast speaker adaptation are also in progress. Resynthesis of speech by concatenating triphones has been used to verify the quality of the triphone library.

Place, publisher, year, edition, pages
1994. 18-21 p.
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-91238
OAI: diva2:508935
Working papers from the 8th Swedish Phonetics Conference
NR 20140805
Available from: 2012-03-11 Created: 2012-03-11 Bibliographically approved

Author
Blomberg, Mats