Persistent Evidence of Local Image Properties in Generic ConvNets
KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. (Computer Vision) ORCID iD: 0000-0001-5211-6388
KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. ORCID iD: 0000-0002-4266-6746
KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
2015 (English). In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, pp. 249-262. Conference paper, Published paper (Refereed)
Abstract [en]

Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular, exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the context of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression used commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to good accuracy, and should potentially also be useful for improving the training of convolutional nets for classification purposes.
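The probe underlying all four tasks is the same: treat the first fully connected layer's activations as a fixed global descriptor and fit a ridge regressor from descriptors to spatial targets (landmark coordinates, per-pixel RGB, and so on). Below is a minimal sketch of that setup in Python, assuming the ConvNet features have already been extracted; the feature dimension, regularization value, and synthetic arrays are illustrative stand-ins, not the authors' code.

    import numpy as np

    def ridge_fit(X, Y, lam=1.0):
        # Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

    # Random stand-ins for real first-FC-layer activations and targets.
    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((500, 4096))   # e.g. 4096-d descriptors
    Y_train = rng.uniform(0.0, 1.0, (500, 2))    # normalized (x, y) landmark
    W = ridge_fit(X_train, Y_train, lam=10.0)
    print("training MSE:", np.mean((X_train @ W - Y_train) ** 2))

The same fitting step is repeated per task; only the target matrix Y changes (2D keypoints, RGB values, or per-pixel labels encoded as regression targets).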

Place, publisher, year, edition, pages
Springer Publishing Company, 2015. pp. 249-262
Series
Image Processing, Computer Vision, Pattern Recognition, and Graphics ; 9127
HSV category
Identifiers
URN: urn:nbn:se:kth:diva-172140
DOI: 10.1007/978-3-319-19665-7_21
Scopus ID: 2-s2.0-84947982864
OAI: oai:DiVA.org:kth-172140
DiVA, id: diva2:845957
Conference
Scandinavian Conference on Image Analysis, Copenhagen, Denmark, 15-17 June, 2015
Note

Qc 20150828

Available from: 2015-08-13 Created: 2015-08-13 Last updated: 2016-12-09 Bibliographically approved
Part of thesis
1. Convolutional Network Representation for Visual Recognition
2017 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Image representation is a key component in visual recognition systems. In a visual recognition problem, the solution or the model should be able to learn and infer the quality of certain visual semantics in the image. Therefore, it is important for the model to represent the input image in a way that the semantics of interest can be inferred easily and reliably. This thesis is written in the form of a compilation of publications and looks into the Convolutional Network (ConvNet) representation in visual recognition problems from an empirical perspective. A Convolutional Network is a special class of Neural Networks with a hierarchical structure where every layer's output (except for the last layer) is the input of the next one. It has been shown that ConvNets are powerful tools for learning a generic representation of an image. In this body of work, we first showed that this is indeed the case and that a ConvNet representation with a simple classifier can outperform highly tuned pipelines based on hand-crafted features. To be precise, we first trained a ConvNet on a large dataset; then, for every image in another task with a small dataset, we fed the image forward through the ConvNet and took its activations at a certain layer as the image representation. Transferring the knowledge from the large dataset (source task) to the small dataset (target task) proved to be effective and outperformed baselines on a variety of tasks in visual recognition. We also evaluated the presence of spatial visual semantics in the ConvNet representation and observed that a ConvNet retains significant spatial information despite the fact that it has never been explicitly trained to preserve low-level semantics. We then investigated the factors that affect the transferability of these representations. We studied various factors on a diverse set of visual recognition tasks and found a consistent correlation between the effect of those factors and the similarity of the target task to the source task. This intuition, alongside the experimental results, provides a guideline for improving the performance of visual recognition tasks using ConvNet features. Finally, we addressed the task of visual instance retrieval specifically, as an example of how these simple intuitions can increase the performance of the target task massively.
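As a concrete illustration of the transfer recipe summarized above (train on a large source task, reuse the activations at a layer as the representation, fit a simple classifier on the small target task), here is a minimal sketch; the random feature arrays, the class count, and the choice of logistic regression are illustrative assumptions, not the thesis's exact setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Stand-ins for activations extracted from a ConvNet pretrained
    # on the large source task, one row per target-task image.
    feats_train = rng.standard_normal((200, 4096))
    labels_train = rng.integers(0, 5, 200)       # small 5-class target task
    feats_test = rng.standard_normal((50, 4096))

    # Simple classifier on top of the fixed, transferred representation.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(feats_train, labels_train)
    print(clf.predict(feats_test)[:10])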

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2017. p. 130
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2017:01
Keywords
Convolutional Network, Visual Recognition, Transfer Learning
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-197919 (URN)
978-91-7729-213-5 (ISBN)
Public defence
2017-01-13, F3, Lindstedtsvagen 26, Stockholm, 10:00 (English)
Opponent
Supervisor
Note

QC 20161209

Available from: 2016-12-09 Created: 2016-12-09 Last updated: 2016-12-23 Bibliographically approved

Open Access in DiVA

fulltext (15069 kB), 124 downloads
File information
File FULLTEXT01.pdf, File size 15069 kB, Checksum SHA-512
675f90d2f7cc938800825e9499e599dc24a07866f27e0ff26d147e07ca3251b3a65dacc22e273264df3bac7e4fad26ae5716d9cacf4043f3827e59e399599dfe
Type fulltext, Mimetype application/pdf

Other links

Publisher's full text
Scopus
Conference website
The final publication is available at www.springerlink.com

Search in DiVA

By author/editor
Sharif Razavian, Ali; Azizpour, Hossein; Maki, Atsuto; Sullivan, Josephine; Ek, Carl Henrik; Carlsson, Stefan
By organisation

Total: 124 downloads
The number of downloads is the sum of all downloads of all full texts. It may, for example, include earlier versions that are no longer available.
