Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation
Norwegian University of Science and Technology Trondheim, Norway; Monash University Melbourne, Australia.
Monash University Melbourne, Australia.
Norwegian University of Science and Technology Trondheim, Norway.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. Norwegian University of Science and Technology Trondheim, Norway. ORCID iD: 0000-0002-3323-5311
2022 (English). In: BMVC 2022 - 33rd British Machine Vision Conference Proceedings, British Machine Vision Association (BMVA), 2022. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data. By utilizing a novel objective function, each layer in HR-VQVAE learns a discrete representation of the residual from previous layers through a vector quantized encoder. Furthermore, the representations at each layer are hierarchically linked to those at previous layers. We evaluate our method on the tasks of image reconstruction and generation. Experimental results demonstrate that the discrete representations learned by HR-VQVAE enable the decoder to reconstruct high-quality images with less distortion than the baseline methods, namely VQVAE and VQVAE-2. HR-VQVAE can also generate high-quality and diverse images, outperforming state-of-the-art generative models and providing further verification of the efficiency of the learned representations. The hierarchical nature of HR-VQVAE i) reduces the decoding search time, making the method particularly suitable for high-load tasks, and ii) allows the codebook size to be increased without incurring the codebook collapse problem.
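
The layer-wise residual quantization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the codebooks are random rather than learned, there is no encoder, decoder, or training objective, and the function names and the codebook layout (one child codebook per path of parent indices) are hypothetical choices, one plausible reading of "hierarchically linked". It only shows how each layer quantizes what earlier layers left unexplained, and why a linked hierarchy needs to compare against only a few codes per layer.

```python
import numpy as np


def nearest_code(x, codebook):
    """Return (index, vector) of the codebook row closest to x (squared Euclidean)."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]


def residual_quantize(x, codebooks):
    """Quantize x layer by layer, each layer encoding the remaining residual.

    Assumed (hypothetical) layout: codebooks[i] has shape (m**i, m, d), i.e.
    every path of parent indices owns its own child codebook of m codes.
    """
    residual = x.astype(float)          # astype returns a copy, safe to modify
    recon = np.zeros_like(residual)
    indices = []
    parent = 0                          # flattened index of the path so far
    for layer in codebooks:
        # Only the m codes linked to the chosen parent path are searched.
        idx, code = nearest_code(residual, layer[parent])
        indices.append(idx)
        recon += code                   # reconstruction is the sum of chosen codes
        residual -= code                # next layer sees what is still unexplained
        parent = parent * layer.shape[1] + idx
    return indices, recon


# Toy usage with random (untrained) codebooks: n = 3 layers, m = 3 codes, d = 4.
rng = np.random.default_rng(0)
m, d = 3, 4
codebooks = [rng.normal(scale=0.5 ** i, size=(m ** i, m, d)) for i in range(3)]
x = rng.normal(size=d)
indices, recon = residual_quantize(x, codebooks)
print("indices per layer:", indices)
print("squared error:", float(np.sum((x - recon) ** 2)))
```

Under this assumed layout, encoding compares against m codes at each of n layers (n·m comparisons) while the index path can address m^n distinct combinations. With random codebooks the error is not guaranteed to shrink at every layer; in a trained model the codebooks would be learned so that it does.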

Place, publisher, year, edition, pages
British Machine Vision Association (BMVA), 2022.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-350328
Scopus ID: 2-s2.0-85165971275
OAI: oai:DiVA.org:kth-350328
DiVA, id: diva2:1883799
Conference
33rd British Machine Vision Conference Proceedings, BMVC 2022, London, United Kingdom, Nov 21 2022 - Nov 24 2022
Note

QC 20240711

Available from: 2024-07-11 Created: 2024-07-11 Last updated: 2024-07-11 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Salvi, Giampiero