Composed Complex-Cue Histograms: An Investigation of the Information Content in Receptive Field Based Image Descriptors for Object Recognition
2012 (English). In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 116, no 4, 538-560 p. Article in journal (Refereed), Published
Recent work has shown that effective methods for recognizing objects and spatio-temporal events can be constructed based on histograms of receptive field like image operations.
This paper presents the results of an extensive study of the performance of different types of receptive field like image descriptors for histogram-based object recognition, based on different combinations of image cues in terms of Gaussian derivatives or differential invariants applied to either intensity information, colour-opponent channels or both. A rich set of composed complex-cue image descriptors is introduced and evaluated with respect to the problems of (i) recognizing previously seen object instances from previously unseen views, and (ii) classifying previously unseen objects into visual categories.
It is shown that there exist novel histogram descriptors with significantly better recognition performance compared to previously used histogram features within the same class. Specifically, the experiments show that it is possible to obtain more discriminative features by combining lower-dimensional scale-space features into composed complex-cue histograms. Furthermore, different types of image descriptors have different relative advantages with respect to the problems of object instance recognition vs. object category classification. These conclusions are obtained from extensive experimental evaluations on two mutually independent data sets.
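The idea of combining several lower-dimensional cues into one joint histogram can be illustrated with a minimal sketch. It assumes the per-pixel cue values (for example Gaussian derivative responses) have already been computed; the function names, the uniform quantization and the chi-square comparison are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def quantize(value, lo, hi, n_bins):
    """Map a scalar to a bin index in 0..n_bins-1, clamping values outside [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def composed_histogram(cue_vectors, ranges, n_bins):
    """Joint (sparse) histogram: one cell per tuple of per-cue bin indices.

    cue_vectors: iterable of tuples, one tuple of cue values per image point.
    ranges: list of (lo, hi) pairs, one per cue (assumed known in advance).
    """
    counts = Counter()
    for vec in cue_vectors:
        cell = tuple(quantize(v, lo, hi, n_bins) for v, (lo, hi) in zip(vec, ranges))
        counts[cell] += 1
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

def chi_square(h1, h2):
    """Chi-square distance between two sparse normalized histograms (assumed measure)."""
    cells = set(h1) | set(h2)
    return sum(
        (h1.get(c, 0.0) - h2.get(c, 0.0)) ** 2 / (h1.get(c, 0.0) + h2.get(c, 0.0))
        for c in cells
    )

# Hypothetical input: two cues per image point.
points = [(0.1, -0.5), (0.1, -0.5), (0.9, 0.5)]
hist = composed_histogram(points, ranges=[(0.0, 1.0), (-1.0, 1.0)], n_bins=2)
```

Because the cells are indexed by tuples over all cues, the histogram records how the cues co-vary, which separate one-dimensional histograms per cue cannot capture. The sparse dictionary keeps memory proportional to the number of occupied cells rather than to the full number of cells.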
For the task of recognizing specific object instances, combined histograms of spatial and spatio-chromatic derivatives are highly discriminative, and several image descriptors in terms of rotationally invariant (intensity and spatio-chromatic) differential invariants up to order two lead to very high recognition rates.
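For orientation, some standard textbook examples of rotationally invariant differential invariants up to order two are listed below, where $L$ denotes a scale-space (Gaussian-smoothed) image channel and subscripts denote partial derivatives. This abstract does not state which invariants the paper actually uses, so the list should be read as an illustration only.

```latex
\begin{align}
  |\nabla L|^2      &= L_x^2 + L_y^2 \\
  \nabla^2 L        &= L_{xx} + L_{yy} \\
  \det \mathcal{H}L &= L_{xx} L_{yy} - L_{xy}^2 \\
  \tilde{\kappa}(L) &= L_x^2 L_{yy} + L_y^2 L_{xx} - 2 L_x L_y L_{xy}
\end{align}
```

Each expression keeps its value when the image coordinate system is rotated, which is the property that makes histograms of such quantities independent of in-plane object rotation.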
For the task of category classification, primary information is contained in both first- and second-order derivatives, where second-order partial derivatives constitute the most discriminative cue.
Dimensionality reduction by principal component analysis and variance normalization prior to training and recognition can in many cases lead to a significant increase in recognition or classification performance. Surprisingly high recognition rates can even be obtained with binary histograms that reveal the polarity of local scale-space features, and which can be expected to be particularly robust to illumination variations.
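A minimal sketch of a polarity histogram follows, under the assumption that "binary" means two bins per feature dimension (negative or zero versus positive), so that each image point falls into one of 2^d sign patterns. The function names and the input values are hypothetical.

```python
from collections import Counter

def polarity_cell(features):
    """Encode the signs of a feature vector as an integer; bit i is set if features[i] > 0."""
    cell = 0
    for i, f in enumerate(features):
        if f > 0:
            cell |= 1 << i
    return cell

def polarity_histogram(feature_vectors, dim):
    """Normalized histogram over the 2**dim possible sign patterns."""
    counts = Counter(polarity_cell(v) for v in feature_vectors)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in range(2 ** dim)]

# Hypothetical responses for two features at three image points.
responses = [(1.0, -2.0), (0.5, -0.1), (-3.0, 4.0)]
hist = polarity_histogram(responses, dim=2)

# Multiplying every response by a positive constant (a simple model of a
# contrast change) leaves all signs, and hence the histogram, unchanged.
scaled = [tuple(10.0 * f for f in vec) for vec in responses]
hist_scaled = polarity_histogram(scaled, dim=2)
```

The comparison between `hist` and `hist_scaled` illustrates why such a descriptor can be expected to tolerate multiplicative illumination changes: only the signs of the features enter the histogram.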
An overall conclusion from this study is that, compared to previously used lower-dimensional histograms, the use of composed complex-cue histograms of higher dimensionality reveals the co-variation of multiple cues and enables much better recognition performance, both for recognizing previously seen objects from novel views and for classifying previously unseen objects into visual categories.
Place, publisher, year, edition, pages
Elsevier, 2012. Vol. 116, no 4, 538-560 p.
Keywords
image descriptor, histogram, object recognition, object categorization, Gaussian derivative, spatio-chromatic derivative, differential invariant, spatio-chromatic differential invariant, image feature, colour feature, scale-space, cue combination, multiple cues, multi-scale representation, computer vision
Engineering and Technology; Computer Science; Computer Vision and Robotics (Autonomous Systems); Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-52279; DOI: 10.1016/j.cviu.2011.12.003; ISI: 000300481700006; ScopusID: 2-s2.0-84856279703; OAI: oai:DiVA.org:kth-52279; DiVA: diva2:465703
Projects
Image Descriptors and Scale-Space Theory for Spatial and Spatio-Temporal Recognition
Funder
Swedish Research Council, 2004-4680; Swedish Research Council, 2010-4766; Knut and Alice Wallenberg Foundation
This is the author's version of a work that was accepted for publication in Computer Vision and Image Understanding. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Vision and Image Understanding, Vol. 116, Issue 4, DOI 10.1016/j.cviu.2011.12.003. QC 20120402
2012-04-10; 2011-12-15; 2012-04-16; Bibliographically approved