Signal processing representations of speech
2003 (English) In: IEICE transactions on information and systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E86D, no 3, 359-376 p. Article, review/survey (Refereed) Published
Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.
Keywords: speech, features, representation, warped frequency scale, linear prediction, word recognition, cepstral coefficients, voiced speech, spectrum, enhancement, sounds, noise
Identifiers: URN: urn:nbn:se:kth:diva-22303; ISI: 000181421800002; OAI: oai:DiVA.org:kth-22303; DiVA: diva2:341001
QC 20100525; 2010-08-10; 2010-08-10; Bibliographically approved