Selecting static and dynamic features using an advanced auditory model for speech recognition
2010 (English). In: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, 2010, 4590-4593 p. Conference paper (Refereed).
We describe a method to select features for speech recognition that is based on a quantitative model of the human auditory periphery. The method maximizes the similarity of the geometry of the space spanned by the subset of features and the geometry of the space spanned by the auditory model output. The selection method uses a spectro-temporal auditory model that captures both frequency- and time-domain masking. The selection method is blind to the meaning of speech and does not require annotated speech data. We apply the method to the selection of a subset of features from a conventional set consisting of mel cepstra and their first-order and second-order time derivatives. Although our method uses only knowledge of the human auditory periphery, the experimental results show that it performs significantly better than feature-reduction algorithms based on linear and heteroscedastic discriminant analysis that require training with annotated speech data.
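To illustrate the idea of matching feature-space geometry to auditory-model geometry, the sketch below shows one possible (hypothetical) realization: greedy forward selection of feature columns that maximizes the correlation between pairwise frame distances in the candidate subspace and in the auditory-model output space. The distance-based notion of "geometry", the correlation similarity score, the function names (pairwise_distances, select_features), and the random stand-in data are illustrative assumptions, not details taken from the paper.

# Hypothetical sketch of geometry-matching feature selection (not the paper's implementation).
import numpy as np

def pairwise_distances(X):
    """Condensed vector of pairwise Euclidean distances between the rows (frames) of X."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(X), k=1)
    return D[iu]

def geometry_similarity(features, reference_dists):
    """Correlation between the candidate-subspace geometry and the reference geometry."""
    return np.corrcoef(pairwise_distances(features), reference_dists)[0, 1]

def select_features(candidates, auditory_out, n_select):
    """Greedily pick feature columns whose pairwise-distance geometry best matches
    the geometry of the auditory-model output space."""
    ref = pairwise_distances(auditory_out)
    chosen, remaining = [], list(range(candidates.shape[1]))
    for _ in range(n_select):
        scores = [(geometry_similarity(candidates[:, chosen + [j]], ref), j)
                  for j in remaining]
        _, best_j = max(scores)
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = 200
    mfcc_deltas = rng.standard_normal((frames, 39))  # stand-in for mel cepstra + deltas
    auditory = rng.standard_normal((frames, 60))     # stand-in for auditory model output
    print("Selected feature indices:", select_features(mfcc_deltas, auditory, n_select=13))

In the setting described in the abstract, the reference space would be the output of the spectro-temporal auditory model on the same speech frames, and the candidate set would be the mel cepstra with their first- and second-order time derivatives; no annotated speech data enters the selection criterion.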
Place, publisher, year, edition, pages
IEEE, 2010. 4590-4593 p.
Series: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ISSN 1520-6149
Keywords: feature selection, dimension reduction, auditory model, perception, sensitivity analysis, distortion, speech recognition
Identifiers
URN: urn:nbn:se:kth:diva-11469
DOI: 10.1109/ICASSP.2010.5495648
ISI: 000287096004068
Scopus ID: 2-s2.0-78049406665
ISBN: 978-1-4244-4296-6
OAI: oai:DiVA.org:kth-11469
DiVA: diva2:276953
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, March 14–19, 2010, Dallas, Texas, U.S.A.
QC 20110415; Winner of the Best Student Paper Award (1st place). Available from: 2009-11-13, Created: 2009-11-13, Last updated: 2012-09-14. Bibliographically approved.