Object detection using strongly-supervised deformable part models
KTH, School of Computer Science and Communication (CSC), Computer Vision and Robotics, CVAP. ORCID iD: 0000-0001-5211-6388
2012 (English). In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part I / [ed] Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, Cordelia Schmid, Springer, 2012, no. PART 1, pp. 836-849. Conference paper, Published paper (Refereed)
Abstract [en]

Deformable part-based models [1, 2] achieve state-of-the-art performance for object detection, but rely on heuristic initialization during training due to the optimization of a non-convex cost function. This paper investigates the limitations of such an initialization and extends earlier methods using additional supervision. We explore strong supervision in terms of annotated object parts and use it to (i) improve model initialization, (ii) optimize model structure, and (iii) handle partial occlusions. Our method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for only a fraction of the positive examples. Experimental results are reported for the detection of six animal classes in the PASCAL VOC 2007 and 2010 datasets. We demonstrate significant improvements in detection performance compared to the LSVM [1] and the Poselet [3] object detectors.

Place, publisher, year, edition, pages
Springer, 2012. no. PART 1, pp. 836-849
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743 ; 7572 LNCS
Keywords [en]
Data sets, Detection performance, Model initialization, Nonconvex cost functions, Object Detection, Object detectors, Partial occlusions, Positive examples, Semi-supervised learning, State-of-the-art performance, Computer vision, Optimization, Supervised learning, Object recognition
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-107260
DOI: 10.1007/978-3-642-33718-5_60
ISI: 000343418300060
Scopus ID: 2-s2.0-84867871564
ISBN: 978-364233717-8 (print)
OAI: oai:DiVA.org:kth-107260
DiVA, id: diva2:575473
Conference
12th European Conference on Computer Vision, ECCV 2012, 7 October 2012 through 13 October 2012, Florence
Note

QC 20121210

Available from: 2012-12-10 Created: 2012-12-10 Last updated: 2018-01-12 Bibliographically approved
Part of thesis
1. Visual Representations and Models: From Latent SVM to Deep Learning
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Two important components of a visual recognition system are representation and model. Both involve selecting and learning the features that are informative for recognition while discarding those that are uninformative. In general terms, this thesis proposes different techniques for representation and modeling within the frameworks of two learning systems: latent support vector machines (latent SVMs) and deep learning.

First, motivated by the observation that, given a fixed representation, the sampled space of the positive distribution is usually structured, we propose various approaches for grouping the positive samples into clusters of visually similar instances. The proposed clustering techniques include a novel similarity measure based on exemplar learning, an approach that uses additional annotation, and an augmentation of latent SVM that automatically finds clusters whose members can be reliably distinguished from the background class.

In another line of work, we propose a strongly supervised deformable part model (DPM) to study how such models can benefit from privileged information. The extra information comes in the form of annotations of semantic parts (i.e., their presence and location), which are used to constrain the DPM's latent variables during or prior to the optimization of the latent SVM. The effectiveness of this approach is demonstrated on the task of animal detection.

Finally, we generalize the formulation of discriminative latent variable models, including DPMs, to incorporate a new set of latent variables that represent the structure or properties of negative samples; we term these negative latent variables. We show how this generalization affects state-of-the-art techniques and helps visual recognition by explicitly searching for counter-evidence of an object's presence.

Following the resurgence of deep networks, the final works of this thesis focus on deep learning as a means of producing a generic representation for visual recognition. A Convolutional Network (ConvNet) is trained on ImageNet, a large annotated image classification dataset with $\sim1.3$ million images. The activations at each layer of the trained ConvNet can then be treated as a representation of an input image. We show that such a representation is surprisingly effective for various recognition tasks and is clearly superior to the handcrafted features previously used in visual recognition (such as HOG in our first works on DPMs). We further investigate how this representation can be improved for a given target task, and identify various factors, applied before or after training the representation, that can improve the efficacy of the ConvNet representation. These factors are analyzed on 16 datasets from various subfields of visual recognition.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2016. 172 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 21
Keywords
Computer Vision, Machine Learning, Artificial Intelligence, Deep Learning, Learning Representation, Deformable Part Models, Discriminative Latent Variable Models, Convolutional Networks, Object Recognition, Object Detection
National subject category
Electrical Engineering, Electronic Engineering, Information Engineering; Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-192289
ISBN: 978-91-7729-110-7
External cooperation:
Public defence
2016-09-27, Kollegiesalen, Brinellvägen 8, KTH main building, floor 4, KTH Campus, Stockholm, 15:26 (English)
Opponent
Supervisors
Note

QC 20160908

Available from: 2016-09-08 Created: 2016-09-08 Last updated: 2016-09-09 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA


Person records

Azizpour, Hossein
