Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach
KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. (Computer Vision) ORCID iD: 0000-0001-5211-6388
KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
2013 (English) Article, review/survey (Other academic) Epub ahead of print
Abstract [en]

Computer vision tasks are traditionally defined and evaluated using semantic categories. However, it is known to the field that semantic classes do not necessarily correspond to a unique visual class (e.g. inside and outside of a car). Furthermore, many of the feasible learning techniques at hand cannot model a visual class which appears consistent to the human eye. These problems have motivated the use of 1) unsupervised or supervised clustering as a preprocessing step to identify the visual subclasses to be used in a mixture-of-experts learning regime; 2) the part model of Felzenszwalb et al. and other works that model mixture assignment with latent variables which are optimized during learning; 3) highly non-linear classifiers which are inherently capable of modelling a multi-modal input space but are inefficient at test time. In this work, we promote an incremental view over the recognition of semantic classes with varied appearances. We propose an optimization technique which incrementally finds maximal visual subclasses in a regularized risk minimization framework. Our proposed approach unifies the clustering and classification steps in a single algorithm. The importance of this approach is its compliance with the classification via the fact that it does not need to know a priori about the number of clusters, the representation and similarity measures used in pre-processing clustering methods. Following this approach we show both qualitatively and quantitatively significant results. We show that the visual subclasses demonstrate a long tail distribution. Finally, we show that state of the art object detection methods (e.g. DPM) are unable to use the tails of this distribution comprising 50% of the training samples. In fact we show that DPM performance slightly increases on average by the removal of this half of the data.

Place, publisher, year, edition, pages
2013.
National Category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-192293 OAI: oai:DiVA.org:kth-192293 DiVA, id: diva2:967491
Note

QC 20160912

Available from: 2016-09-08 Created: 2016-09-08 Last updated: 2016-09-12 Bibliographically approved
In thesis
1. Visual Representations and Models: From Latent SVM to Deep Learning
2016 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Two important components of a visual recognition system are representation and model. Both involve selecting and learning the features that are indicative for recognition and discarding those that are uninformative. This thesis, in its general form, proposes different techniques within the frameworks of two learning systems for representation and modeling, namely latent support vector machines (latent SVMs) and deep learning.

First, we propose various approaches to group the positive samples into clusters of visually similar instances. Given a fixed representation, the sampled space of the positive distribution is usually structured. The proposed clustering techniques include a novel similarity measure based on exemplar learning, an approach for using additional annotation, and an augmentation of latent SVM that automatically finds clusters whose members can be reliably distinguished from the background class.

In another effort, a strongly supervised DPM is suggested to study how these models can benefit from privileged information. The extra information comes in the form of semantic part annotations (i.e. their presence and location). These annotations are used to constrain the DPM's latent variables during or prior to the optimization of the latent SVM. Its effectiveness is demonstrated on the task of animal detection.

Finally, we generalize the formulation of discriminative latent variable models, including DPMs, to incorporate a new set of latent variables representing the structure or properties of negative samples. We therefore term them negative latent variables. We show that this generalization affects state-of-the-art techniques and helps visual recognition by explicitly searching for counter-evidence of an object's presence.

Following the resurgence of deep networks, in the last works of this thesis we have focused on deep learning in order to produce a generic representation for visual recognition. A Convolutional Network (ConvNet) is trained on a large annotated image classification dataset called ImageNet with $\sim1.3$ million images. Then, the activations at each layer of the trained ConvNet can be treated as the representation of an input image. We show that such a representation is surprisingly effective for various recognition tasks, making it clearly superior to all the handcrafted features previously used in visual recognition (such as HOG in our first works on DPM). We further investigate the ways in which one can improve this representation for a task in mind. We propose various factors, applied before or after the training of the representation, which can improve the efficacy of the ConvNet representation. These factors are analyzed on 16 datasets from various subfields of visual recognition.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2016. p. 172
Series
TRITA-CSC-A, ISSN 1653-5723 ; 21
Keywords
Computer Vision, Machine Learning, Artificial Intelligence, Deep Learning, Learning Representation, Deformable Part Models, Discriminative Latent Variable Models, Convolutional Networks, Object Recognition, Object Detection
National Category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-192289 (URN) 978-91-7729-110-7 (ISBN)
External cooperation:
Public defence
2016-09-27, Kollegiesalen, Brinellvägen 8, KTH-huset, våningsplan 4, KTH Campus, Stockholm, 15:26 (English)
Opponent
Supervisors
Note

QC 20160908

Available from: 2016-09-08 Created: 2016-09-08 Last updated: 2016-09-09 Bibliographically approved

Open Access in DiVA

fulltext (1736 kB), 30 downloads
File information
File: FULLTEXT01.pdf, File size: 1736 kB, Checksum: SHA-512
9d46a41da06a32922cd305503c43b131399ff44db89f374d43eddcf91965d1365cea2be0841748486c42aeea90a79b9906806747da6eaa7f0c66609ba69d1dc2
Type: fulltext, Mimetype: application/pdf

Authority records BETA

Azizpour, Hossein

Search in DiVA

By author/editor
Azizpour, Hossein; Carlsson, Stefan
By organisation

Total: 30 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 261 hits