Convolutional Network Representation for Visual Recognition
KTH, School of Computer Science and Communication (CSC), Robotics, Perception and Learning, RPL.
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Image representation is a key component in visual recognition systems. In a visual recognition problem, the solution or the model should be able to learn and infer the quality of certain visual semantics in the image. Therefore, it is important for the model to represent the input image in a way that the semantics of interest can be inferred easily and reliably. This thesis is written in the form of a compilation of publications and looks into the Convolutional Network (ConvNet) representation in visual recognition problems from an empirical perspective. A Convolutional Network is a special class of Neural Network with a hierarchical structure where every layer's output (except for the last layer) is the input of another one. It was shown that ConvNets are powerful tools to learn a generic representation of an image. In this body of work, we first showed that this is indeed the case and that a ConvNet representation with a simple classifier can outperform highly tuned pipelines based on hand-crafted features. To be precise, we first trained a ConvNet on a large dataset; then, for every image in another task with a small dataset, we fed the image forward through the ConvNet and took the ConvNet's activations at a certain layer as the image representation. Transferring the knowledge from the large dataset (source task) to the small dataset (target task) proved to be effective and outperformed baselines on a variety of tasks in visual recognition. We also evaluated the presence of spatial visual semantics in the ConvNet representation and observed that a ConvNet retains significant spatial information despite the fact that it has never been explicitly trained to preserve low-level semantics. We then tried to investigate the factors that affect the transferability of these representations.
We studied various factors on a diverse set of visual recognition tasks and found a consistent correlation between the effect of those factors and the similarity of the target task to the source task. This intuition, alongside the experimental results, provides a guideline to improve the performance of visual recognition tasks using ConvNet features. Finally, we addressed the task of visual instance retrieval specifically as an example of how these simple intuitions can increase the performance of the target task massively.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2017, p. 130
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2017:01
Keywords [en]
Convolutional Network, Visual Recognition, Transfer Learning
National subject category
Robotics and Automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-197919
ISBN: 978-91-7729-213-5 (print)
OAI: oai:DiVA.org:kth-197919
DiVA, id: diva2:1054887
Public defence
2017-01-13, F3, Lindstedtsvägen 26, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20161209

Available from: 2016-12-09 Created: 2016-12-09 Last updated: 2016-12-23. Bibliographically approved
List of papers
1. CNN features off-the-shelf: An Astounding Baseline for Recognition
2014 (English). In: Proceedings of CVPR 2014, 2014. Conference paper, Published paper (Refereed)
Abstract [en]

Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network, which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle a diverse range of recognition tasks: object image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval, applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistently superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval, it consistently outperforms low-memory-footprint methods except on the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in the case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
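
The retrieval setting mentioned in this abstract (L2 distance applied to a fixed-length feature vector) can be sketched as follows. This is a minimal illustration and not the paper's code: the random vectors are placeholders for 4096-dimensional ConvNet activations, and the helper names `l2_normalize` and `rank_by_l2`, as well as the unit-length normalisation step, are assumptions introduced here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_by_l2(query, database):
    """Return database row indices sorted from nearest to farthest from query."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists, kind="stable")


# Placeholder data: random vectors stand in for 4096-d ConvNet features.
rng = np.random.default_rng(0)
database = l2_normalize(rng.standard_normal((100, 4096)))

# Hypothetical query: a slightly perturbed copy of database item 42.
noisy = database[42:43] + 0.001 * rng.standard_normal((1, 4096))
query = l2_normalize(noisy)[0]

ranking = rank_by_l2(query, database)
print("nearest database item:", ranking[0])
```

With real features, `database` would hold one descriptor per reference image and `query` the descriptor of the image being searched for; the ranking logic stays the same.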

National subject category
Computer Science
Identifiers
urn:nbn:se:kth:diva-149178 (URN)
10.1109/CVPRW.2014.131 (DOI)
000349552300079 (ISI)
2-s2.0-84908537903 (Scopus ID)
Conference
Computer Vision and Pattern Recognition (CVPR) 2014, DeepVision workshop, June 28, 2014, Columbus, Ohio
Note

Best Paper Runner-up Award.

QC 20140825

Available from: 2014-08-16 Created: 2014-08-16 Last updated: 2018-01-11. Bibliographically approved
2. Persistent Evidence of Local Image Properties in Generic ConvNets
2015 (English). In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, p. 249-262. Conference paper, Published paper (Refereed)
Abstract [en]

Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular by exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the contexts of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression used commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, but should potentially be useful for improving the training of the convolutional nets for classification purposes.
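
The ridge regression framework named in this abstract can be sketched in closed form. The example below is an illustration under stated assumptions, not the authors' implementation: the 64-dimensional random descriptors, the two-column target (standing in for 2D landmark coordinates), the regularisation value and the function name `fit_ridge` are all hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


# Placeholder data: random descriptors stand in for ConvNet image features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))

# Hypothetical ground-truth linear map from descriptor to (x, y) coordinates.
W_true = rng.standard_normal((64, 2))
Y = X @ W_true + 0.01 * rng.standard_normal((200, 2))

W = fit_ridge(X, Y, lam=1e-3)
pred = X @ W
rmse = float(np.sqrt(np.mean((pred - Y) ** 2)))
print("training RMSE:", rmse)
```

The same fitted map can be applied to any of the four target types listed in the abstract by changing what the columns of `Y` represent.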

Place, publisher, year, edition, pages
Springer Publishing Company, 2015
Series
Image Processing, Computer Vision, Pattern Recognition, and Graphics ; 9127
National subject category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-172140 (URN)
10.1007/978-3-319-19665-7_21 (DOI)
2-s2.0-84947982864 (Scopus ID)
Conference
Scandinavian Conference on Image Analysis, Copenhagen, Denmark, 15-17 June, 2015
Note

QC 20150828

Available from: 2015-08-13 Created: 2015-08-13 Last updated: 2016-12-09. Bibliographically approved
3. Factors of Transferability for a Generic ConvNet Representation
2016 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no 9, p. 1790-1802, article id 7328311. Article in journal (Refereed). Published
Abstract [en]

Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward unit activations of the trained network, at a certain layer of the network, are used as a generic representation of an input image for a task with a relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. These include parameters for training of the source ConvNet, such as its architecture and the distribution of the training data, and also the parameters of feature extraction, such as the layer of the trained ConvNet and dimensionality reduction. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task, such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
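
One of the feature-extraction factors listed in this abstract is dimensionality reduction. The abstract does not name a method, so the sketch below uses PCA purely as one common choice. The 128-dimensional random inputs, the target size of 16, the re-normalisation step and the function names `fit_pca` and `reduce_features` are assumptions made for illustration.

```python
import numpy as np


def fit_pca(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def reduce_features(X, mean, components, eps=1e-12):
    """Project onto the principal directions, then rescale rows to unit length."""
    Z = (X - mean) @ components.T
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.maximum(norms, eps)


# Placeholder data: random vectors stand in for ConvNet layer activations.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 128))

mean, components = fit_pca(X, k=16)
Z = reduce_features(X, mean, components)
print("reduced shape:", Z.shape)
```

In a transfer setting the projection would be fitted once on features from a training split and then reused unchanged for test or query images.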

Place, publisher, year, edition, pages
IEEE Computer Society Digital Library, 2016
National subject category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-177033 (URN)
10.1109/TPAMI.2015.2500224 (DOI)
000381432700006 (ISI)
2-s2.0-84981266620 (Scopus ID)
Note

QC 20161208

Available from: 2015-11-13 Created: 2015-11-13 Last updated: 2018-01-10. Bibliographically approved
4. Visual instance retrieval with deep convolutional networks
2016 (English). In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 4, no 3, p. 251-258. Article in journal (Refereed). Published
Abstract [en]

This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.

Place, publisher, year, edition, pages
Institute of Image Information and Television Engineers, 2016
Keywords
Convolutional network, Learning representation, Multi-resolution search, Visual instance retrieval
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-195472 (URN)
10.3169/mta.4.251 (DOI)
2-s2.0-84979503481 (Scopus ID)
Note

QC 20161125

Available from: 2016-11-25 Created: 2016-11-03 Last updated: 2020-03-05. Bibliographically approved

Open Access in DiVA

fulltext (2157 kB), 347 downloads
File information
File name: FULLTEXT02.pdf
File size: 2157 kB
Checksum (SHA-512):
dba5343d1d3369ce520e55ac7e37144e38e88c632a68a0fbf729d3b952af537cedf47a7ccec6baddd871fdcdb424de6cd0c4877adeae8337b3d2eb0f262615e6
Type: fulltext
Mimetype: application/pdf
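
A downloaded copy of the full text can be checked against the SHA-512 checksum listed in this record. The sketch below uses only Python's standard `hashlib` module; the function names `sha512_of_file` and `matches_record` are hypothetical, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# Checksum copied from this record (split across two lines for readability).
EXPECTED_SHA512 = (
    "dba5343d1d3369ce520e55ac7e37144e38e88c632a68a0fbf729d3b952af537c"
    "edf47a7ccec6baddd871fdcdb424de6cd0c4877adeae8337b3d2eb0f262615e6"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True if the file's SHA-512 digest equals the checksum in the record."""
    return sha512_of_file(path) == EXPECTED_SHA512
```

For example, `matches_record("FULLTEXT02.pdf")` would return `True` only if the local file is byte-for-byte identical to the one the checksum was computed from.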

Search further in DiVA

By author/editor
Sharif Razavian, Ali
By organisation
Robotics, Perception and Learning, RPL
Robotics and Automation
