Learning from images and speech with non-negative matrix factorization enhanced by input space scaling
KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
2010 (English). In: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, IEEE, 2010, 1-6 p. Conference paper, Published paper (Refereed)
Abstract [en]

Computational learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). To this end, the different modalities of the input are converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representation is that only a subset of these data features actually aids the learning. In this paper, we first describe a simple NMF-based recognition framework operating on speech and image data. We then propose and demonstrate a novel algorithm that scales the inputs of this framework in order to optimize its recognition performance.
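
As an illustration of the general idea only, the following minimal sketch stacks speech and image features into one non-negative matrix, applies a per-feature scaling, and factorizes the result with NMF. It is not the algorithm from the paper: the feature matrices are random placeholders, the scaling weights are hand-picked rather than optimized for recognition performance, and the Frobenius-norm multiplicative updates are an assumed choice, since the abstract does not state which NMF objective is used. The names `nmf` and `scale_inputs` are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (features x samples) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumption; the abstract does not name the objective).
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


def scale_inputs(V, weights):
    """Multiply each feature (row) of V by its non-negative weight."""
    return V * weights[:, None]


# Placeholder data: 20 speech features and 10 image features, 50 samples.
rng = np.random.default_rng(1)
speech = rng.random((20, 50))
image = rng.random((10, 50))
V = np.vstack([speech, image])

# Hand-picked weights (illustrative only): down-weight the image features.
weights = np.concatenate([np.full(20, 1.0), np.full(10, 0.5)])

W, H, errors = nmf(scale_inputs(V, weights), rank=5)
print(W.shape, H.shape)
```

In this sketch the scaling changes how strongly each feature contributes to the reconstruction error, which is one way input scaling can influence what the factorization learns.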

Place, publisher, year, edition, pages
IEEE, 2010. 1-6 p.
Keyword [en]
Feature selection, Image recognition, Machine learning, Multi-modal learning, Vocabulary acquisition
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-150006
DOI: 10.1109/SLT.2010.5700813
Scopus ID: 2-s2.0-79951807213
ISBN: 978-142447903-0 (print)
OAI: oai:DiVA.org:kth-150006
DiVA: diva2:743994
Conference
2010 IEEE Workshop on Spoken Language Technology, SLT 2010, 12 December 2010 through 15 December 2010, Berkeley, CA, United States
Note

QC 20140905

Available from: 2014-09-05. Created: 2014-08-29. Last updated: 2014-09-05. Bibliographically approved.

Open Access in DiVA

No full text

By author/editor
Kleijn, W. Bastiaan