Visual instance retrieval with deep convolutional networks
KTH, School of Computer Science and Communication (CSC), Robotics, Perception and Learning, RPL.
KTH.
KTH.
KTH, School of Computer Science and Communication (CSC), Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-4266-6746
2016 (English) In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 4, no. 3, pp. 251-258. Article in journal (Refereed) Published
Abstract [en]

This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.
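The multi-scale extraction scheme the abstract describes can be illustrated with a minimal sketch: local descriptors are max-pooled from a convolutional activation map over grids of several scales, l2-normalized, and aggregated into a single global vector compared by cosine similarity. The activation maps below are random stand-ins for a real ConvNet layer's output, and the grid sizes and sum-aggregation are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def pool_regions(fmap, grid):
    """Max-pool a C x H x W activation map over a grid x grid set of
    spatial regions, giving one C-dim local descriptor per region."""
    C, H, W = fmap.shape
    descs = []
    for i in range(grid):
        for j in range(grid):
            region = fmap[:, i * H // grid:(i + 1) * H // grid,
                          j * W // grid:(j + 1) * W // grid]
            descs.append(l2_normalize(region.max(axis=(1, 2))))
    return descs

def multiscale_descriptor(fmap, scales=(1, 2, 3)):
    """Aggregate l2-normalized local descriptors from several grid
    scales into one global, l2-normalized image vector."""
    descs = []
    for s in scales:
        descs.extend(pool_regions(fmap, s))
    return l2_normalize(np.sum(descs, axis=0))

# Toy activation maps standing in for a ConvNet layer's output on
# three images: a query, a near-duplicate, and an unrelated image.
rng = np.random.default_rng(0)
query_fmap = rng.standard_normal((64, 24, 24))
dup_fmap = query_fmap + 0.05 * rng.standard_normal((64, 24, 24))
other_fmap = rng.standard_normal((64, 24, 24))

q = multiscale_descriptor(query_fmap)
sim_dup = float(q @ multiscale_descriptor(dup_fmap))
sim_other = float(q @ multiscale_descriptor(other_fmap))
print(sim_dup, sim_other)  # the near-duplicate should score higher
```

Because every descriptor is l2-normalized, the dot product of two global vectors is their cosine similarity, so ranking a database reduces to a single matrix-vector product.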

Place, publisher, year, edition, pages
Institute of Image Information and Television Engineers, 2016. Vol. 4, no. 3, pp. 251-258
Keywords [en]
Convolutional network, Learning representation, Multi-resolution search, Visual instance retrieval
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-195472
Scopus ID: 2-s2.0-84979503481
OAI: oai:DiVA.org:kth-195472
DiVA, id: diva2:1049836
Note

QC 20161125

Available from: 2016-11-25 Created: 2016-11-03 Last updated: 2018-01-13 Bibliographically approved
Part of thesis
1. Convolutional Network Representation for Visual Recognition
2017 (English) Doctoral thesis, compilation (Other academic)
Abstract [en]

Image representation is a key component in visual recognition systems. In a visual recognition problem, the solution or the model should be able to learn and infer the quality of certain visual semantics in the image. Therefore, it is important for the model to represent the input image in a way that the semantics of interest can be inferred easily and reliably. This thesis is written in the form of a compilation of publications and looks into the Convolutional Network (ConvNet) representation in visual recognition problems from an empirical perspective. A Convolutional Network is a special class of Neural Networks with a hierarchical structure, where every layer's output (except for the last layer) is the input of another one. It has been shown that ConvNets are powerful tools for learning a generic representation of an image. In this body of work, we first showed that this is indeed the case, and that a ConvNet representation with a simple classifier can outperform highly tuned pipelines based on hand-crafted features. To be precise, we first trained a ConvNet on a large dataset; then, for every image in another task with a small dataset, we fed the image forward through the ConvNet and took the ConvNet's activation at a certain layer as the image representation. Transferring the knowledge from the large dataset (source task) to the small dataset (target task) proved to be effective and outperformed baselines on a variety of tasks in visual recognition. We also evaluated the presence of spatial visual semantics in the ConvNet representation and observed that a ConvNet retains significant spatial information despite the fact that it has never been explicitly trained to preserve low-level semantics. We then investigated the factors that affect the transferability of these representations. We studied various factors on a diverse set of visual recognition tasks and found a consistent correlation between the effect of those factors and the similarity of the target task to the source task. This intuition, alongside the experimental results, provides a guideline for improving the performance of visual recognition tasks using ConvNet features. Finally, we addressed the task of visual instance retrieval specifically, as an example of how these simple intuitions can increase the performance of the target task massively.
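The transfer recipe the abstract outlines — freeze a network trained on a large source task, take one layer's activations as the image representation, and train a simple classifier on the small target dataset — can be sketched as follows. The "pretrained network" here is a frozen random projection with a ReLU, and the data are synthetic; both are stand-ins for illustration, not the thesis's actual networks or datasets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pretrained ConvNet: a frozen non-linear map. In the
# transfer setting this would be the activations of a chosen layer of
# a network trained on a large source dataset.
W_frozen = rng.standard_normal((256, 32 * 32))

def convnet_features(images):
    """Feed each image forward and take one layer's activation,
    l2-normalized, as its representation."""
    a = np.maximum(0.0, images.reshape(len(images), -1) @ W_frozen.T)
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)

# Small target-task dataset: two synthetic classes of 32x32 images,
# the second shifted in intensity so the classes are separable.
n = 100
class0 = rng.standard_normal((n, 32, 32))
class1 = rng.standard_normal((n, 32, 32)) + 0.5
X = convnet_features(np.concatenate([class0, class1]))
y = np.concatenate([np.zeros(n), np.ones(n)])

# Simple classifier on top of the frozen features: logistic
# regression trained with plain gradient descent.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

acc = float(np.mean((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y))
print(f"training accuracy: {acc:.2f}")
```

Only `w` and `b` are learned on the target task; the feature extractor stays fixed, which is what makes the approach practical when the target dataset is small.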

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2017. p. 130
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2017:01
Keywords
Convolutional Network, Visual Recognition, Transfer Learning
National subject category
Robotics and Automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-197919 (URN)
978-91-7729-213-5 (ISBN)
Public defence
2017-01-13, F3, Lindstedtsvagen 26, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20161209

Available from: 2016-12-09 Created: 2016-12-09 Last updated: 2016-12-23 Bibliographically approved

Open Access in DiVA

Full text is not available in DiVA

Scopus

Person records

Razavian, Ali Sharif; Maki, Atsuto

Search further in DiVA

By author/editor
Razavian, Ali Sharif; Sullivan, Josephine; Maki, Atsuto
By organisation
Robotics, Perception and Learning, RPL; KTH
Computer Vision and Robotics (Autonomous Systems)
