Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Auditory Model-Based Design and Optimization of Feature Vectors for Automatic Speech Recognition
KTH, School of Electrical Engineering (EES), Communication Theory. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.ORCID iD: 0000-0003-2638-6047
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2011 (English)In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 19, no 6, 1813-1825 p.Article in journal (Refereed) Published
Abstract [en]

Using spectral and spectro-temporal auditory models along with perturbation-based analysis, we develop a new framework to optimize a feature vector such that it emulates the behavior of the human auditory system. The optimization is carried out in an offline manner based on the conjecture that the local geometries of the feature vector domain and the perceptual auditory domain should be similar. Using this principle along with a static spectral auditory model, we modify and optimize the static spectral mel frequency cepstral coefficients (MFCCs) without considering any feedback from the speech recognition system. We then extend the work to include spectro-temporal auditory properties into designing a new dynamic spectro-temporal feature vector. Using a spectro-temporal auditory model, we design and optimize the dynamic feature vector to incorporate the behavior of human auditory response across time and frequency. We show that a significant improvement in automatic speech recognition (ASR) performance is obtained for any environmental condition, clean as well as noisy.

Place, publisher, year, edition, pages
2011. Vol. 19, no 6, 1813-1825 p.
Keyword [en]
Auditory models, mel frequency cepstral coefficient (MFCC), speech recognition
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-38976DOI: 10.1109/TASL.2010.2101597ISI: 000293702300030Scopus ID: 2-s2.0-85008045118OAI: oai:DiVA.org:kth-38976DiVA: diva2:438758
Funder
ICT - The Next Generation
Available from: 2011-09-05 Created: 2011-09-05 Last updated: 2017-12-08Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Authority records BETA

Chatterjee, Saikat

Search in DiVA

By author/editor
Chatterjee, SaikatKleijn, W. Bastiaan
By organisation
Communication TheoryACCESS Linnaeus CentreSound and Image Processing
In the same journal
IEEE Transactions on Audio, Speech, and Language Processing
Electrical Engineering, Electronic Engineering, Information Engineering

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 80 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf