Visual Attention in Active Vision Systems: Attending, Classifying and Manipulating Objects
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
2011 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis presents a computational model for combining bottom-up and top-down attentional mechanisms, and demonstrates its use in a variety of machine and robotic vision applications. We have observed that an attentional mechanism is imperative in any active vision system, machine as well as biological: it not only reduces the amount of information that needs further processing (for, say, recognition or action), but, by processing only the attended image regions, also makes such tasks more robust to large amounts of clutter and noise in the visual field.

Using various feature channels such as color, orientation, texture, depth and symmetry as input, the presented model uses a pre-trained artificial neural network to modulate a saliency map for a particular top-down goal, e.g. visual search for a target object. More specifically, it dynamically combines the unmodulated bottom-up saliency with the modulated top-down saliency by means of a biologically and psychophysically motivated temporal differential equation. In this way the system is able, for instance, to detect important bottom-up cues even while in visual-search (top-down) mode for a particular object.
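The combination described above can be sketched as a convex mix of the two saliency maps whose trade-off weight evolves over time. The leaky-integrator equation and all names below are illustrative assumptions, not the thesis' actual formulation:

```python
import numpy as np

def evolve_tradeoff(k, surprise, tau=0.5, dt=0.05):
    # One Euler step of a hypothetical leaky integrator,
    #   dk/dt = (-k + surprise) / tau,
    # pulling the trade-off weight k toward the current bottom-up
    # "surprise" signal; a stand-in for the psychophysically
    # motivated temporal differential equation in the text.
    return k + dt * (-k + surprise) / tau

def combined_map(s_bu, s_td, k):
    # Convex combination of the unmodulated bottom-up map s_bu and
    # the top-down modulated map s_td; high k favours bottom-up.
    k = float(np.clip(k, 0.0, 1.0))
    return k * s_bu + (1.0 - k) * s_td

# Example: a strong unexpected bottom-up cue raises k, letting the
# bottom-up map dominate even during a top-down visual search.
s_bu = np.array([[1.0, 0.0], [0.0, 0.0]])
s_td = np.array([[0.0, 1.0], [0.0, 0.0]])
m = combined_map(s_bu, s_td, 0.25)
```

Keeping the mix time-varying rather than fixed is what lets the system interrupt a search when something salient appears.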

All the computational steps that yield the final attentional map, which ranks image regions according to their importance to the system, are shown to be biologically plausible. It has also been demonstrated that the presented attentional model facilitates tasks other than visual search. For instance, using the covert attentional peaks that the model returns, we can improve scene understanding and segmentation through clustering or scattering of the 2D/3D components of the scene, depending on the configuration of these attentional peaks and their relations to other attributes of the scene. More specifically, this is performed by means of entropy optimization of the scene under varying cluster configurations, i.e. different groupings of the various components of the scene.
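The entropy-optimization idea can be illustrated with a toy version: score each candidate grouping of scene components by the size-weighted Shannon entropy of its clusters and keep the best one. The objective and brute-force search here are simplifying assumptions, not the thesis' actual algorithm:

```python
from itertools import product
import numpy as np

def cluster_entropy(features, labels):
    # Size-weighted sum of Shannon entropies of the (discretised)
    # feature histogram inside each cluster -- a simple stand-in for
    # the scene-entropy objective described above.
    total, n = 0.0, len(labels)
    for c in set(labels):
        vals = [f for f, l in zip(features, labels) if l == c]
        hist = np.bincount(vals)
        p = hist[hist > 0] / len(vals)
        total += (len(vals) / n) * -np.sum(p * np.log2(p))
    return total

def best_grouping(features, n_clusters=2):
    # Exhaustive search over all labelings; feasible only for the
    # handful of components a small scene decomposes into.
    best, best_h = None, float("inf")
    for labels in product(range(n_clusters), repeat=len(features)):
        h = cluster_entropy(features, labels)
        if h < best_h:
            best, best_h = labels, h
    return best, best_h
```

With components carrying discretised features [0, 0, 1, 1], the minimum-entropy grouping separates the two kinds of component, mirroring how homogeneous clusters make a scene "simpler" to describe.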

Qualitative experiments demonstrated the use of this attentional model on a humanoid robotic platform, controlling the robot's overt attention in real time by specifying the saccadic movements of the robot head. These experiments also exposed another highly important aspect of the model: its temporal variability, as opposed to many other attentional (saliency) models that deal exclusively with static images. Here the dynamic aspects of the attentional mechanism allowed for a temporally varying trade-off between top-down and bottom-up influences, depending on changes in the robot's environment.

The thesis has also put forward systematic, quantitative, large-scale experiments on the actual benefits and uses of this kind of attentional model. To this end a simulated 2D environment was implemented, in which the system could not "see" the entire environment and needed to perform overt shifts of attention (simulated saccades) in order to perform a visual search task for a pre-defined target object. This allowed the core attentional model of the system to be simply and rapidly substituted with comparable computational models designed by other researchers. Nine such contending models were tested and compared quantitatively with the presented model. Under certain assumptions, these experiments showed that the attentional model presented in this work outperforms the other models in simple visual-search tasks.
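A minimal sketch of such a benchmark loop, under assumed simplifications (grid world, one fixated cell per saccade, inhibition of return): the agent repeatedly saccades to the most salient unvisited cell until it fixates the target, and the saccade count becomes the score for whichever saliency model produced the map. None of these names come from the thesis' simulator:

```python
import numpy as np

def simulated_search(world, target, saliency, max_saccades=50):
    # Overt visual search in a 2D grid: the agent "sees" only the
    # fixated cell, saccades to the highest-saliency unvisited cell,
    # and stops when the fixated cell contains the target. Returns
    # the number of saccades needed, or None on failure.
    s = saliency.astype(float).copy()
    for n in range(1, max_saccades + 1):
        idx = np.unravel_index(np.argmax(s), s.shape)
        if world[idx] == target:
            return n
        s[idx] = -np.inf  # inhibition of return: never revisit
    return None

# Example run: the target sits at the second-most salient location,
# so two saccades are needed.
world = np.array([[0, 0], [0, 1]])
saliency = np.array([[0.1, 0.9], [0.2, 0.8]])
saccades = simulated_search(world, 1, saliency)
```

Because the search loop only consumes a saliency map, swapping in a competing model's map is a one-line change, which is what makes this style of comparison rapid.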

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. xiii, 218 p.
Trita-CSC-A, ISSN 1653-5723; 2011:22
Keyword [en]
visual attention, saliency map, computer vision, robotics, active vision, machine learning
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Science
URN: urn:nbn:se:kth:diva-53484
ISBN: 978-91-8501-220-9
OAI: diva2:470155
Public defence
2012-01-13, Sal F2, Lindstedtsvägen 28, KTH, Stockholm, 10:15 (English)
QC 20111228
Available from: 2011-12-28 Created: 2011-12-28 Last updated: 2011-12-28
Bibliographically approved

Open Access in DiVA
fulltext: FULLTEXT01.pdf (19596 kB, application/pdf)

By author/editor
Rasolzadeh, Babak