A common phone model representation for speech recognition and synthesis
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
1994 (English) Conference paper (Refereed)
Abstract [en]

A combined representation of context-dependent phones at the production-parametric and spectral levels is described. The phones are trained in the production domain using analysis-by-synthesis and piece-wise linear approximation of parameter trajectories. For recognition, this representation is transformed to spectral subphones using a cascade formant synthesis procedure. In a connected-digit recognition task, an average correct digit rate of 99.1% was achieved for a group of seven male speakers when, for each test speaker, training was done on the other six speakers. Simple rules for male-to-female transformation of the male phone library increased the performance for six female speakers from 88.9% without transformation to 96.3%. In informal listening tests of resynthesised digit strings, the speech was judged to be intelligible but far from natural.


Place, publisher, year, edition, pages
1994. 1875-1878 p.
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-91236
OAI: diva2:508933
Third International Conference on Spoken Language Processing (ICSLP 94)
NR 20140805
Available from: 2012-03-11 Created: 2012-03-11 Bibliographically approved

By author/editor
Blomberg, Mats