A critical examination of deep learning approaches to automated speech recognition
KTH, School of Computer Science and Communication (CSC).
2014 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Deep learning techniques have recently been applied successfully to automatic speech recognition (ASR). Most current speech recognition systems use Hidden Markov Models (HMMs) to handle the temporal variability of speech, with Gaussian mixture models (GMMs) modeling the HMM emission probabilities. Recently, however, Deep Neural Networks (DNNs) and Deep Belief Networks (DBNs) have been shown to outperform GMMs at modeling these emission probabilities. Deep architectures such as DBNs, with many hidden layers, learn multilevel feature representations, building a distributed representation of a given input at each level. These networks are first pre-trained in unsupervised mode, without using any discriminative information, as a multi-layer generative model of a window of feature vectors. Once generative pre-training is complete, discriminative fine-tuning adjusts the model parameters to improve their predictive accuracy. Our aim is to study the different levels of representation of speech acoustic features produced by the hidden layers of DBNs. To this end, we estimate the phoneme recognition error and use the classification accuracy of Support Vector Machines (SVMs) as a measure of separability between the DBN representations of 61 phoneme classes. In addition, we use correlation analysis to investigate the relations between different subgroups/categories of phonemes at various representation levels. The tests were performed on the TIMIT database, and the simulations were developed to run on a graphics processing unit (GPU) cluster at PDC/KTH.
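The generative pre-training stage described above can be illustrated with a minimal sketch. It assumes Bernoulli-Bernoulli restricted Boltzmann machines (RBMs) trained with one-step contrastive divergence (CD-1), stacked greedily layer by layer, with input features scaled to [0, 1]; the abstract does not state these details, and speech DBNs often use a Gaussian-Bernoulli RBM for the first layer instead. The names stack_frames, RBM and pretrain_dbn, the layer sizes, the learning rate and the random placeholder data are all hypothetical. Discriminative fine-tuning and the SVM separability measure are not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def stack_frames(features, context=5):
    """Build a window of feature vectors: each frame plus `context` neighbours per side.

    Hypothetical helper; edge frames are padded by repeating the first/last frame.
    """
    n, _ = features.shape
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1.

    Assumption: Bernoulli visible units, so inputs should lie in [0, 1].
    """

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr):
        # Positive phase: hidden probabilities driven by the data, then a binary sample.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step gives a reconstruction of the input.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        # CD-1 update: data correlations minus reconstruction correlations.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error, for monitoring only


def pretrain_dbn(data, layer_sizes, epochs=10, batch_size=64, lr=0.05, seed=0):
    """Greedy layer-wise, unsupervised pre-training (no labels are used).

    Each RBM is trained on the previous layer's output. Returns the trained RBMs
    and the representation of `data` produced at every hidden layer.
    """
    rng = np.random.default_rng(seed)
    rbms, reps = [], []
    x = data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(x))
            for start in range(0, len(x), batch_size):
                rbm.cd1_step(x[order[start:start + batch_size]], lr)
        x = rbm.hidden_probs(x)  # deterministic up-pass feeds the next layer
        rbms.append(rbm)
        reps.append(x)
    return rbms, reps


# Usage with random placeholder data in [0, 1], not real speech features.
rng = np.random.default_rng(1)
frames = rng.random((200, 13))
windows = stack_frames(frames, context=5)  # 11 frames x 13 dims = 143 inputs
rbms, reps = pretrain_dbn(windows, [64, 32], epochs=2)
print([r.shape for r in reps])  # [(200, 64), (200, 32)]
```

Each entry of reps holds one hidden layer's representation of every input window; representations of this kind, taken after fine-tuning, are what a separability measure such as SVM classification accuracy would be computed on.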

National Category: Computer Science
URN: urn:nbn:se:kth:diva-153681
OAI: diva2:753291
Available from: 2014-11-25
Created: 2014-10-07
Last updated: 2014-11-25
Bibliographically approved

Open Access in DiVA
Full text: FULLTEXT01.pdf (5130 kB, application/pdf)
