Learning product codebooks using vector-quantized autoencoders for image retrieval
2019 (English). In: GlobalSIP 2019 - 7th IEEE Global Conference on Signal and Information Processing, Proceedings, Institute of Electrical and Electronics Engineers Inc., 2019. Conference paper, Published paper (Refereed)
Abstract [en]
Vector-Quantized Variational Autoencoders (VQ-VAE) [1] provide an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. In this paper, we study the use of VQ-VAE for learning representations for downstream tasks such as image retrieval. First, we describe the VQ-VAE in the context of an information-theoretic framework. Then, we show that the regularization effect on the learned representation is determined by the size of the embedded codebook, which is chosen before training. As a result, we introduce a hyperparameter to balance the strength of the vector quantizer against the reconstruction error. By tuning this hyperparameter, the embedded bottleneck quantizer is used as a regularizer that forces the outputs of the encoder to share a constrained coding space. As a result, the learned latent features better preserve the similarity relations of the data space. Finally, we incorporate the product quantizer into the bottleneck stage of VQ-VAE and use it as an end-to-end unsupervised learning model for image retrieval tasks. The product quantizer has the advantage of generating large and structured codebooks. Fast retrieval can be achieved by using lookup tables that store the distance between any pair of sub-codewords. State-of-the-art retrieval results are achieved by the proposed codebooks.
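The lookup-table retrieval mentioned in the abstract can be illustrated with a minimal sketch of a generic product quantizer. This is not the authors' implementation: the function names (`encode`, `build_tables`, `table_distance`, `decode`), the squared Euclidean distance, and the dimensions are all assumptions made for illustration, and the sub-codebooks are random stand-ins for codebooks that the paper learns end to end inside the VQ-VAE bottleneck.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, num_subspaces):
    """Split a vector into num_subspaces contiguous, equal-length sub-vectors."""
    d = len(vec) // num_subspaces
    return [vec[m * d:(m + 1) * d] for m in range(num_subspaces)]


def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest sub-codeword."""
    code = []
    for sub, cb in zip(split(vec, len(codebooks)), codebooks):
        code.append(min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k])))
    return code


def decode(code, codebooks):
    """Concatenate the selected sub-codewords into a full codeword."""
    out = []
    for k, cb in zip(code, codebooks):
        out.extend(cb[k])
    return out


def build_tables(codebooks):
    """tables[m][i][j] holds the distance between sub-codewords i and j of subspace m."""
    return [[[sq_dist(ci, cj) for cj in cb] for ci in cb] for cb in codebooks]


def table_distance(code_a, code_b, tables):
    """Distance between two encoded items using table lookups only."""
    return sum(t[i][j] for t, i, j in zip(tables, code_a, code_b))


# Hypothetical sizes: 4 subspaces, 8 sub-codewords each, 2 dimensions per subspace.
rng = random.Random(0)
codebooks = [[[rng.random() for _ in range(2)] for _ in range(8)] for _ in range(4)]
tables = build_tables(codebooks)

query = encode([rng.random() for _ in range(8)], codebooks)
item = encode([rng.random() for _ in range(8)], codebooks)
print(query, item, table_distance(query, item, tables))
```

With these assumed sizes, storing 4 x 8 = 32 sub-codewords yields 8^4 = 4096 distinct full codewords, which is the sense in which a product codebook is large and structured. Because the subspaces are disjoint, the sum of the looked-up sub-distances equals the squared distance between the two decoded codewords, so comparing two encoded items costs one table lookup per subspace.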
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019.
Keywords [en]
Information theory, Learning systems, Table lookup, Vector quantization, Vectors, Constrained coding, Fast retrievals, Learning products, Reconstruction error, Similarity relations, State of the art, Structured codebooks, Vector quantizers, Image retrieval
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-274151
DOI: 10.1109/GlobalSIP45357.2019.8969272
ISI: 000555454800086
Scopus ID: 2-s2.0-85079284181
OAI: oai:DiVA.org:kth-274151
DiVA, id: diva2:1445121
Conference
7th IEEE Global Conference on Signal and Information Processing, GlobalSIP 2019, 11 November 2019 through 14 November 2019
Note
QC 20200622
Part of ISBN 9781728127231
2020-06-22; 2020-06-22; 2025-02-07; Bibliographically approved