An Analysis of Shallow and Deep Representations of Speech Based on Unsupervised Classification of Isolated Words
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-3323-5311
2016 (English). In: Recent Advances in Nonlinear Speech Processing, Springer, 2016, Vol. 48, p. 151-157. Conference paper, Published paper (Refereed)
Abstract [en]

We analyse the properties of shallow and deep representations of speech. Mel frequency cepstral coefficients (MFCC) are compared to representations learned by a four-layer Deep Belief Network (DBN) in terms of discriminative power and invariance to irrelevant factors such as speaker identity or gender. To avoid the influence of supervised statistical modelling, an unsupervised isolated word classification task is used for the comparison. The deep representations are also obtained with unsupervised training (no back-propagation pass is performed). The results show that DBN features provide a more concise clustering and higher match between clusters and word categories in terms of adjusted Rand score. Some of the confusions present with the MFCC features are, however, retained even with the DBN features.
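
The adjusted Rand score mentioned in the abstract measures how well a clustering agrees with a reference labelling, corrected for chance agreement. The following is a minimal sketch of the standard formula in plain Python; it is not taken from the paper, and the function name `adjusted_rand` and the toy word labels are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical

    # Contingency counts: joint cells, row marginals, column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that fall together in each grouping.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy example (made-up labels): word categories vs. cluster ids.
words = ["one", "one", "two", "two"]
clusters = [1, 1, 0, 0]  # same grouping, different label names
score = adjusted_rand(words, clusters)
```

A score of 1 indicates that clusters and word categories coincide up to renaming, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.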

Place, publisher, year, edition, pages
Springer, 2016. Vol. 48, p. 151-157
Series
Smart Innovation Systems and Technologies, ISSN 2190-3018 ; 48
Keyword [en]
Deep learning, Representations, Hierarchical clustering
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-180414
DOI: 10.1007/978-3-319-28109-4_15
ISI: 000417253600015
Scopus ID: 2-s2.0-84955471729
ISBN: 978-3-319-28109-4 (print)
ISBN: 978-3-319-28107-0 (print)
OAI: oai:DiVA.org:kth-180414
DiVA, id: diva2:893723
Conference
7th International Workshop on Nonlinear Speech Processing (NOLISP), May 18-20, 2015, Vietri sul Mare, Italy
Note

QC 20160615

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2018-01-10. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Salvi, Giampiero
