Search results 1 - 50 of 418
  • 1. Abbeloos, W.
    et al.
    Caccamo, Sergio
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Ataer-Cansizoglu, E.
    Taguchi, Y.
    Feng, C.
    Lee, T.-Y.
    Detecting and Grouping Identical Objects for Region Proposal and Classification (2017). In: 2017 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE Computer Society, 2017, Vol. 2017, p. 501-502, article id 8014810. Conference paper (Refereed)
    Abstract [en]

    Often multiple instances of an object occur in the same scene, for example in a warehouse. Unsupervised multi-instance object discovery algorithms are able to detect and identify such objects. We use such an algorithm to provide object proposals to a convolutional neural network (CNN) based classifier. This results in fewer regions to evaluate, compared to traditional region proposal algorithms. Additionally, it enables using the joint probability of multiple instances of an object, resulting in improved classification accuracy. The proposed technique can also split a single class into multiple sub-classes corresponding to the different object types, enabling hierarchical classification.

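The joint use of multiple grouped instances can be sketched as follows. Assuming the per-instance CNN posteriors are conditionally independent (an assumption of this sketch, not necessarily the paper's exact fusion rule), the class scores of proposals that the discovery step grouped as identical combine multiplicatively in log space:

```python
import math

def combine_instance_scores(instance_probs):
    """Combine per-instance class posteriors for proposals grouped as
    identical objects. Under a conditional-independence assumption the
    joint log-probability of a class is the sum of per-instance
    log-probabilities, renormalized over classes."""
    n_classes = len(instance_probs[0])
    joint = [0.0] * n_classes
    for probs in instance_probs:
        for c, p in enumerate(probs):
            joint[c] += math.log(max(p, 1e-12))  # guard against log(0)
    # renormalize with log-sum-exp for numerical stability
    m = max(joint)
    exp = [math.exp(v - m) for v in joint]
    z = sum(exp)
    return [v / z for v in exp]
```

Two weakly confident instances (0.6 and 0.7 for the same class) yield a more confident joint score (0.78), illustrating why grouping improves classification accuracy.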
  • 2. Abeywardena, D.
    et al.
    Wang, Zhan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Dissanayake, G.
    Waslander, S. L.
    Kodagoda, S.
    Model-aided state estimation for quadrotor micro air vehicles amidst wind disturbances (2014). Conference paper (Refereed)
    Abstract [en]

    This paper extends the recently developed Model-Aided Visual-Inertial Fusion (MA-VIF) technique for quadrotor Micro Air Vehicles (MAV) to deal with wind disturbances. The wind effects are explicitly modelled in the quadrotor dynamic equations, excluding the unobservable wind velocity component; this choice is motivated by a nonlinear observability analysis of the dynamic system with wind effects. We show that, using the developed model, the vehicle pose and two components of the wind velocity vector can be simultaneously estimated with a monocular camera and an inertial measurement unit. We also show that MA-VIF is reasonably tolerant to wind disturbances even without explicit modelling of wind effects, and explain the reasons for this behaviour. Experimental results using a Vicon motion capture system are presented to demonstrate the effectiveness of the proposed method and validate our claims.

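The coupling that makes wind components observable can be illustrated with a common linear rotor-drag approximation (the linear-drag form and constants here are illustrative, not taken from the paper): the accelerometer measures specific force proportional to velocity relative to the air mass, so lateral wind enters the measurement model directly.

```python
def predicted_specific_force(v_body, wind_body, k_drag):
    """Predicted lateral body-frame accelerometer reading for a quadrotor
    with linear rotor drag: proportional to the air-relative velocity
    (v - w). Flying at the wind's own velocity gives zero lateral
    specific force, which is why only two wind components are observable
    from this coupling."""
    return [-k_drag * (v - w) for v, w in zip(v_body, wind_body)]
```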
  • 3.
    Aghazadeh, Omid
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Data Driven Visual Recognition (2014). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis is mostly about supervised visual recognition problems. Based on a general definition of categories, the contents are divided into two parts: one which models categories and one which is not category based. We are interested in data driven solutions for both kinds of problems.

    In the category-free part, we study novelty detection in temporal and spatial domains as a category-free recognition problem. Using data driven models, we demonstrate that based on a few reference exemplars, our methods are able to detect novelties in ego-motions of people, and changes in the static environments surrounding them.

    In the category level part, we study object recognition. We consider both object category classification and localization, and propose scalable data driven approaches for both problems. A mixture of parametric classifiers, initialized with a sophisticated clustering of the training data, is demonstrated to adapt to the data better than various baselines such as the same model initialized with less subtly designed procedures. A nonparametric large margin classifier is introduced and demonstrated to have a multitude of advantages in comparison to its competitors: better training and testing time costs, the ability to make use of indefinite/invariant and deformable similarity measures, and adaptive complexity are the main features of the proposed model.

    We also propose a rather realistic model of recognition problems, which quantifies the interplay between representations, classifiers, and recognition performances. Based on data-describing measures which are aggregates of pairwise similarities of the training data, our model characterizes and describes the distributions of training exemplars. The measures are shown to capture many aspects of the difficulty of categorization problems and correlate significantly to the observed recognition performances. Utilizing these measures, the model predicts the performance of particular classifiers on distributions similar to the training data. These predictions, when compared to the test performance of the classifiers on the test sets, are reasonably accurate.

    We discuss various aspects of visual recognition problems: what is the interplay between representations and classification tasks, how can different models better adapt to the training data, etc. We describe and analyze the aforementioned methods that are designed to tackle different visual recognition problems, but share one common characteristic: being data driven.

  • 4.
    Aghazadeh, Omid
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Azizpour, Hossein
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Mixture component identification and learning for visual recognition (2012). In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI, Springer, 2012, p. 115-128. Conference paper (Refereed)
    Abstract [en]

    The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.

  • 5.
    Aghazadeh, Omid
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Large Scale, Large Margin Classification using Indefinite Similarity Measures. Manuscript (preprint) (Other academic)
  • 6.
    Aghazadeh, Omid
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Properties of Datasets Predict the Performance of Classifiers (2013). In: BMVC 2013 - Electronic Proceedings of the British Machine Vision Conference 2013, British Machine Vision Association, BMVA, 2013. Conference paper (Refereed)
    Abstract [en]

    It has been shown that the performance of classifiers depends not only on the number of training samples, but also on the quality of the training set [10, 12]. The purpose of this paper is to 1) provide quantitative measures that determine the quality of the training set and 2) relate the test performance to the proposed measures. The measures are derived from pairwise affinities between training exemplars of the positive class, and they have a generative nature. We show that the performance of state-of-the-art methods on the test set can be reasonably predicted based on the values of the proposed measures on the training set. These measures open up a wide range of applications for the recognition community, enabling us to analyze the behavior of learning algorithms w.r.t. the properties of the training data. This will in turn enable us to devise rules for the automatic selection of training data that maximize the quantified quality of the training set and thereby improve recognition performance.

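A minimal version of such a pairwise-affinity aggregate can be sketched as the mean Gaussian-kernel affinity over all pairs of positive exemplars (the specific aggregate and kernel here are assumptions of the sketch, not the paper's exact measures):

```python
import math

def mean_pairwise_affinity(features, sigma=1.0):
    """Mean Gaussian-kernel affinity over all pairs of positive-class
    exemplar feature vectors: a crude 'compactness' score of the
    training set. Higher values indicate a more homogeneous class,
    the kind of quantity that can be correlated with test performance."""
    n = len(features)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            total += math.exp(-d2 / (2 * sigma ** 2))
            pairs += 1
    return total / pairs
```

A tightly clustered training set scores higher than a spread-out one, matching the intuition that homogeneous classes are easier to learn.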
  • 7.
    Aghazadeh, Omid
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Properties of Datasets Predict the Performance of Classifiers (2013). Manuscript (preprint) (Other academic)
  • 8.
    Aghazadeh, Omid
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Multi view registration for novelty/background separation (2012). In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE Computer Society, 2012, p. 757-764. Conference paper (Refereed)
    Abstract [en]

    We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g., obtained by wearable cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that the images may have different viewpoints and significantly different illumination conditions, and may contain different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency with the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph-cuts algorithm initialized from these estimated probabilities, and combine these segmentations into a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoor data set.

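The per-pixel background probability can be sketched as the best appearance agreement against any of the reference images (the descriptor, squared distance, and exponential mapping are illustrative choices of this sketch):

```python
import math

def background_probability(query_pixel, reference_pixels, beta=0.05):
    """Map the smallest appearance distance between a query pixel's
    descriptor and its correspondences in the reference images to a
    pseudo-probability of belonging to the background: small
    inconsistency with at least one reference -> high probability.
    Such per-pixel probabilities would then seed an iterated
    graph-cuts segmentation."""
    d_min = min(
        sum((q - r) ** 2 for q, r in zip(query_pixel, ref))
        for ref in reference_pixels
    )
    return math.exp(-beta * d_min)
```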
  • 9.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    O'Sullivan, Carol
    Neff, Michael
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Mimebot—Investigating the Expressibility of Non-Verbal Communication Across Agent Embodiments (2017). In: ACM Transactions on Applied Perception, ISSN 1544-3558, E-ISSN 1544-3965, Vol. 14, no. 4, article id 24. Article in journal (Refereed)
    Abstract [en]

    Unlike their human counterparts, artificial agents such as robots and game characters may be deployed with a large variety of face and body configurations. Some have articulated bodies but lack facial features, and others may be talking heads ending at the neck. Generally, they have many fewer degrees of freedom than humans through which they must express themselves, and there will inevitably be a filtering effect when mapping human motion onto the agent. In this article, we investigate filtering effects on three types of embodiments: (a) an agent with a body but no facial features, (b) an agent with a head only, and (c) an agent with a body and a face. We performed a full performance capture of a mime actor enacting short interactions varying the non-verbal expression along five dimensions (e.g., level of frustration and level of certainty) for each of the three embodiments. We performed a crowd-sourced evaluation experiment comparing the video of the actor to the video of an animated robot for the different embodiments and dimensions. Our findings suggest that the face is especially important to pinpoint emotional reactions but is also most volatile to filtering effects. The body motion, on the other hand, had more diverse interpretations but tended to preserve the interpretation after mapping and thus proved to be more resilient to filtering.

  • 10. Almansa, A.
    et al.
    Lindeberg, Tony
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale selection (2000). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 9, no. 12, p. 2027-2042. Article in journal (Refereed)
    Abstract [en]

    This work presents two mechanisms for processing fingerprint images: shape-adapted smoothing based on second moment descriptors, and automatic scale selection based on normalized derivatives. The shape adaptation procedure adapts the smoothing operation to the local ridge structures, which allows interrupted ridges to be joined without destroying essential singularities such as branching points, and enforces continuity of their directional fields. The scale selection procedure estimates local ridge width and adapts the amount of smoothing to the local amount of noise. In addition, a ridgeness measure is defined, which reflects how well the local image structure agrees with a qualitative ridge model, and is used for spreading the results of shape adaptation into noisy areas. The combined approach makes it possible to resolve fine-scale structures in clear areas while reducing the risk of enhancing noise in blurred or fragmented areas. The result is a reliable and adaptively detailed estimate of the ridge orientation field and ridge width, as well as a smoothed grey-level version of the input image. We propose that these general techniques should be of interest to developers of automatic fingerprint identification systems as well as in other applications involving related types of imagery.

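Automatic scale selection with normalized derivatives can be illustrated in closed form: at the centre of a 1-D Gaussian ridge profile of variance t0, the magnitude of the scale-normalized second derivative is proportional to t * (t0 + t)^(-3/2), which peaks at t = 2*t0, so the selected scale tracks the local ridge width. This is the textbook 1-D special case, not the paper's full 2-D machinery:

```python
def normalized_response(t, t0):
    """Magnitude of the scale-normalized second derivative at the centre
    of a 1-D Gaussian blob of variance t0, constants dropped:
    |t * L_xx| is proportional to t * (t0 + t)^(-3/2)."""
    return t * (t0 + t) ** -1.5

def select_scale(t0, scales):
    """Automatic scale selection: pick the scale at which the
    normalized response attains its maximum."""
    return max(scales, key=lambda t: normalized_response(t, t0))
```

Doubling the blob variance doubles the selected scale, which is exactly the width-adaptive behaviour the enhancement scheme relies on.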
  • 11. Almansa, Andrés
    et al.
    Lindeberg, Tony
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Enhancement of Fingerprint Images by Shape-Adapted Scale-Space Operators (1996). In: Gaussian Scale-Space Theory. Part I: Proceedings of PhD School on Scale-Space Theory (Copenhagen, Denmark), May 1996 / [ed] J. Sporring, M. Nielsen, L. Florack, and P. Johansen, Springer Science+Business Media B.V., 1996, p. 21-30. Chapter in book (Refereed)
    Abstract [en]

    This work presents a novel technique for preprocessing fingerprint images. The method is based on the measurements of second moment descriptors and shape adaptation of scale-space operators with automatic scale selection (Lindeberg 1994). This procedure, which has been successfully used in the context of shape-from-texture and shape from disparity gradients, has several advantages when applied to fingerprint image enhancement, as observed by (Weickert 1995). For example, it is capable of joining interrupted ridges, and enforces continuity of their directional fields.

    In this work, the abovementioned general ideas are applied and extended in the following ways: two methods for estimating local ridge width are explored and tuned to the problem of fingerprint enhancement, and a ridgeness measure is defined, which reflects how well the local image structure agrees with a qualitative ridge model. This information is used for guiding a scale-selection mechanism and for spreading the results of shape adaptation into noisy areas.

    The combined approach makes it possible to resolve fine scale structures in clear areas while reducing the risk of enhancing noise in blurred or fragmented areas. To a large extent, the scheme has the desirable property of joining interrupted lines without destroying essential singularities such as branching points. Thus, the result is a reliable and adaptively detailed estimate of the ridge orientation field and ridge width, as well as a smoothed grey-level version of the input image.

    A detailed experimental evaluation is presented, including a comparison with other techniques. We propose that the techniques presented provide mechanisms of interest to developers of automatic fingerprint identification systems.

  • 12.
    Ambrus, Rares
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Unsupervised construction of 4D semantic maps in a long-term autonomy scenario (2017). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Robots are operating for longer times and collecting much more data than just a few years ago. In this setting we are interested in exploring ways of modeling the environment, segmenting out areas of interest and keeping track of the segmentations over time, with the purpose of building 4D models (i.e. space and time) of the relevant parts of the environment.

    Our approach relies on repeatedly observing the environment and creating local maps at specific locations. The first question we address is how to choose where to build these local maps. Traditionally, an operator defines a set of waypoints on a pre-built map of the environment which the robot visits autonomously. Instead, we propose a method to automatically extract semantically meaningful regions from a point cloud representation of the environment. The resulting segmentation is purely geometric, and in the context of mobile robots operating in human environments, the semantic label associated with each segment (i.e. kitchen, office) can be of interest for a variety of applications. We therefore also look at how to obtain per-pixel semantic labels given the geometric segmentation, by fusing probabilistic distributions over scene and object types in a Conditional Random Field.

    For most robotic systems, the elements of interest in the environment are the ones which exhibit some dynamic properties (such as people, chairs, cups, etc.), and the ability to detect and segment such elements provides a very useful initial segmentation of the scene. We propose a method to iteratively build a static map from observations of the same scene acquired at different points in time. Dynamic elements are obtained by computing the difference between the static map and new observations. We address the problem of clustering together dynamic elements which correspond to the same physical object, observed at different points in time and in significantly different circumstances. To address some of the inherent limitations in the sensors used, we autonomously plan, navigate around and obtain additional views of the segmented dynamic elements. We look at methods of fusing the additional data and we show that both a combined point cloud model and a fused mesh representation can be used to more robustly recognize the dynamic object in future observations. In the case of the mesh representation, we also show how a Convolutional Neural Network can be trained for recognition by using mesh renderings.

    Finally, we present a number of methods to analyse the data acquired by the mobile robot autonomously and over extended time periods. First, we look at how the dynamic segmentations can be used to derive a probabilistic prior which can be used in the mapping process to further improve and reinforce the segmentation accuracy. We also investigate how to leverage spatial-temporal constraints in order to cluster dynamic elements observed at different points in time and under different circumstances. We show that by making a few simple assumptions we can increase the clustering accuracy even when the object appearance varies significantly between observations. The result of the clustering is a spatial-temporal footprint of the dynamic object, defining an area where the object is likely to be observed spatially as well as a set of time stamps corresponding to when the object was previously observed. Using this data, predictive models can be created and used to infer future times when the object is more likely to be observed. In an object search scenario, this model can be used to decrease the search time when looking for specific objects.

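The temporal half of such a spatial-temporal footprint can be sketched as a simple periodic histogram over past observation times (the hour-of-day period and the most-frequent-hour readout are assumptions of this sketch, not the thesis's exact predictive model):

```python
from collections import Counter

def likely_observation_hours(timestamps_hours, top_k=1):
    """Given hour-of-day values of past observations of a dynamic
    object, return the top-k most frequent hours: a crude predictive
    model of when the object is likely to be observed again, of the
    kind that can shorten an object search."""
    counts = Counter(int(h) % 24 for h in timestamps_hours)
    return [h for h, _ in counts.most_common(top_k)]
```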
  • 13.
    Ambrus, Rares
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bore, Nils
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Folkesson, John
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Jensfelt, Patric
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Autonomous meshing, texturing and recognition of object models with a mobile robot (2017). Conference paper (Refereed)
    Abstract [en]

    We present a system for creating object models from RGB-D views acquired autonomously by a mobile robot. We create high-quality textured meshes of the objects by approximating the underlying geometry with a Poisson surface. Our system employs two optimization steps, first registering the views spatially based on image features, and second aligning the RGB images to maximize photometric consistency with respect to the reconstructed mesh. We show that the resulting models can be used robustly for recognition by training a Convolutional Neural Network (CNN) on images rendered from the reconstructed meshes. We perform experiments on data collected autonomously by a mobile robot, both in controlled and uncontrolled scenarios. We compare quantitatively and qualitatively to previous work to validate our approach.

  • 14.
    Ambrus, Rares
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Claici, Sebastian
    Wendt, Axel
    Automatic Room Segmentation From Unstructured 3-D Data of Indoor Environments (2017). In: IEEE Robotics and Automation Letters, ISSN 2377-3766, E-ISSN 1949-3045, Vol. 2, no. 2, p. 749-756. Article in journal (Refereed)
    Abstract [en]

    We present an automatic approach for the task of reconstructing a 2-D floor plan from unstructured point clouds of building interiors. Our approach emphasizes accurate and robust detection of building structural elements and, unlike previous approaches, does not require prior knowledge of scanning device poses. The reconstruction task is formulated as a multiclass labeling problem that we approach using energy minimization. We use intuitive priors to define the costs for the energy minimization problem and rely on accurate wall and opening detection algorithms to ensure robustness. We provide detailed experimental evaluation results, both qualitative and quantitative, against state-of-the-art methods and labeled ground-truth data.

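The multiclass-labeling-via-energy-minimization formulation can be illustrated on a toy 1-D version: a per-element data cost plus a Potts smoothness prior, minimized here by brute force. The real system labels 2-D cells with wall- and opening-driven costs and uses an efficient solver; the costs below are made up:

```python
from itertools import product

def minimize_labeling(data_cost, smooth_weight):
    """Exhaustively minimize, over all labelings L of a short 1-D chain,
    E(L) = sum_i data_cost[i][L_i]
         + smooth_weight * #{i : L_i != L_{i+1}}   (Potts prior).
    This illustrates the energy, not the solver: practical systems use
    graph cuts / alpha-expansion instead of enumeration."""
    n, k = len(data_cost), len(data_cost[0])
    best, best_energy = None, float("inf")
    for labels in product(range(k), repeat=n):
        e = sum(data_cost[i][labels[i]] for i in range(n))
        e += smooth_weight * sum(labels[i] != labels[i + 1] for i in range(n - 1))
        if e < best_energy:
            best, best_energy = labels, e
    return list(best), best_energy
```

With a strong prior, a lone element whose data term disagrees with its neighbours is overruled; with a weak prior, the data term wins, which is exactly the robustness/accuracy trade-off such priors encode.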
  • 15.
    Ambrus, Rares
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ekekrantz, Johan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Folkesson, John
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Jensfelt, Patric
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Unsupervised learning of spatial-temporal models of objects in a long-term autonomy scenario (2015). In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2015, p. 5678-5685. Conference paper (Refereed)
    Abstract [en]

    We present a novel method for clustering segmented dynamic parts of indoor RGB-D scenes across repeated observations by performing an analysis of their spatial-temporal distributions. We segment areas of interest in the scene using scene differencing for change detection. We extend the Meta-Room method and evaluate the performance on a complex dataset acquired autonomously by a mobile robot over a period of 30 days. We use an initial clustering method to group the segmented parts based on appearance and shape, and we further combine the clusters we obtain by analyzing their spatial-temporal behaviors. We show that using the spatial-temporal information further increases the matching accuracy.

  • 16.
    Ambrus, Rares
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Folkesson, John
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Jensfelt, Patric
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Unsupervised object segmentation through change detection in a long term autonomy scenario (2016). In: IEEE-RAS International Conference on Humanoid Robots, IEEE, 2016, p. 1181-1187. Conference paper (Refereed)
    Abstract [en]

    In this work we address the problem of dynamic object segmentation in office environments. We make no prior assumptions on what is dynamic and static, and our reasoning is based on change detection between sparse and non-uniform observations of the scene. We model the static part of the environment, and we focus on improving the accuracy and quality of the segmented dynamic objects over long periods of time. We address the issue of adapting the static structure over time and incorporating new elements, for which we train and use a classifier whose output gives an indication of the dynamic nature of the segmented elements. We show that the proposed algorithms improve the accuracy and the rate of detection of dynamic objects by comparing with a labelled dataset.

  • 17.
    Arnekvist, Isac
    KTH, School of Computer Science and Communication (CSC).
    Reinforcement learning for robotic manipulation (2017). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Reinforcement learning was recently used successfully for real-world robotic manipulation tasks, without the need for human demonstration, using a normalized advantage function algorithm (NAF). Limitations on the shape of the advantage function, however, raise doubts about what kinds of policies can be learned using this method. For similar tasks, convolutional neural networks have been used for pose estimation from images taken with fixed-position cameras. For some applications, however, this might not be a valid assumption. It has also been shown that the quality of policies for robotic tasks deteriorates severely with small camera offsets. This thesis investigates the use of NAF for a pushing task with clear multimodal properties. The results are compared with using a deterministic policy with minimal constraints on the Q-function surface. Methods for pose estimation using convolutional neural networks are further investigated, especially with regard to randomly placed cameras with unknown offsets. By defining the coordinate frame of objects with respect to some visible feature, it is hypothesized that relative pose estimation can be accomplished even when the camera is not fixed and the offset is unknown. NAF is successfully implemented to solve a simple reaching task on a real robotic system, where data collection is distributed over several robots and learning is done on a separate server. Using NAF to learn a pushing task fails to converge to a good policy, both on the real robots and in simulation. Deep deterministic policy gradient (DDPG) is instead used in simulation and successfully learns to solve the task. The learned policy is then applied on the real robots and solves the task in the real setting as well. Pose estimation from fixed-position camera images is learned, and the policy is still able to solve the task using these estimates.
    By defining a coordinate frame from an object visible to the camera, in this case the robot arm, a neural network learns to regress the pushable object's pose in this frame without the assumption of a fixed camera. However, the precision of the predictions was too low for the estimates to be used for solving the pushing task. Further modifications to this approach could, however, prove to be a feasible solution for randomly placed cameras with unknown poses.

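The "limitations on the shape of the advantage function" refer to NAF's quadratic parameterization, Q(x, u) = V(x) - 1/2 (u - mu(x))^T P(x) (u - mu(x)): for positive-definite P the advantage is concave in u with a single mode at mu, so a genuinely multimodal pushing policy cannot be represented. A minimal sketch with made-up numbers:

```python
def naf_q(u, mu, P, V):
    """Quadratic NAF action-value: V - 0.5 * (u - mu)^T P (u - mu).
    With P positive definite the advantage term is a concave quadratic,
    so Q is maximized exactly at u = mu: the greedy policy is always
    unimodal, which is the representational limit discussed above."""
    d = [ui - mi for ui, mi in zip(u, mu)]
    n = len(d)
    quad = sum(d[i] * P[i][j] * d[j] for i in range(n) for j in range(n))
    return V - 0.5 * quad
```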
  • 18. Aslund, M.
    et al.
    Fredenberg, Erik
    KTH, School of Engineering Sciences (SCI), Physics.
    Telman, M.
    Danielsson, Mats
    KTH, School of Engineering Sciences (SCI), Physics.
    Detectors for the future of X-ray imaging (2010). In: Radiation Protection Dosimetry, ISSN 0144-8420, E-ISSN 1742-3406, Vol. 139, no. 1-3, p. 327-333. Article in journal (Refereed)
    Abstract [en]

    In recent decades, developments in detectors for X-ray imaging have improved dose efficiency. This has been accomplished with, for example, structured scintillators such as columnar CsI, or with direct detectors in which the X rays are converted to electric charge carriers in a semiconductor. Scattered radiation remains a major noise source, and fairly inefficient anti-scatter grids are still the gold standard. Hence, any future development should include improved scatter rejection. In recent years, photon-counting detectors have attracted significant interest from several companies as well as academic research groups. This method eliminates electronic noise, which is an advantage in low-dose applications. Moreover, energy-sensitive photon-counting detectors allow for further improvements through optimisation of the signal-to-quantum-noise ratio, anatomical background subtraction or quantitative analysis of object constituents. This paper reviews state-of-the-art photon-counting detectors, scatter control and their application in diagnostic X-ray medical imaging. In particular, spectral imaging with photon-counting detectors, pitfalls such as charge sharing and high count rates, and various proposals for their mitigation are discussed.

  • 19.
    Aviles, Marcos
    et al.
    GMV, Spain.
    Siozios, Kostas
    School of ECE, National Technical University of Athens, Greece.
    Diamantopoulos, Dionysios
    School of ECE, National Technical University of Athens, Greece.
    Nalpantidis, Lazaros
    Production and Management Engineering Dept., Democritus University of Thrace, Greece.
    Kostavelis, Ioannis
    Production and Management Engineering Dept., Democritus University of Thrace, Greece.
    Boukas, Evangelos
    Production and Management Engineering Dept., Democritus University of Thrace, Greece.
    Soudris, Dimitrios
    School of ECE, National Technical University of Athens, Greece.
    Gasteratos, Antonios
    Production and Management Engineering Dept., Democritus University of Thrace, Greece.
    A co-design methodology for implementing computer vision algorithms for rover navigation onto reconfigurable hardware2011In: Proceedings of the FPL2011 Workshop on Computer Vision on Low-Power Reconfigurable Architectures, 2011, p. 9-10Conference paper (Other academic)
    Abstract [en]

    Vision-based robotics applications have been widely studied in recent years. However, the solutions proposed so far have mostly addressed the software level. The SPARTAN project focuses on tight, optimal implementations of computer vision algorithms targeting rover navigation. For evaluation purposes, these algorithms will be implemented with a co-design methodology on a Virtex-6 FPGA device.

  • 20.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Laptev, I.
    Object detection using strongly-supervised deformable part models2012In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part I / [ed] Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, Cordelia Schmid, Springer, 2012, no PART 1, p. 836-849Conference paper (Refereed)
    Abstract [en]

    Deformable part-based models [1, 2] achieve state-of-the-art performance for object detection, but rely on heuristic initialization during training due to the optimization of a non-convex cost function. This paper investigates limitations of such an initialization and extends earlier methods using additional supervision. We explore strong supervision in terms of annotated object parts and use it to (i) improve model initialization, (ii) optimize model structure, and (iii) handle partial occlusions. Our method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for only a fraction of the positive examples. Experimental results are reported for the detection of six animal classes in the PASCAL VOC 2007 and 2010 datasets. We demonstrate significant improvements in detection performance compared to the LSVM [1] and Poselet [3] object detectors.

  • 21.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Razavian, Ali Sharif
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    From Generic to Specific Deep Representations for Visual Recognition2015In: Proceedings of CVPR 2015, IEEE conference proceedings, 2015Conference paper (Refereed)
    Abstract [en]

    Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset, and the feed-forward unit activations at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors, including parameters for training the network, such as its architecture, and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.

  • 22.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sharif Razavian, Ali
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Factors of Transferability for a Generic ConvNet Representation2016In: IEEE Transaction on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no 9, p. 1790-1802, article id 7328311Article in journal (Refereed)
    Abstract [en]

    Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source), and the feed-forward unit activations of the trained network, at a certain layer, are used as a generic representation of an input image for a task with a relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. These include parameters of the training of the source ConvNet, such as its architecture and the distribution of the training data, as well as parameters of feature extraction, such as which layer of the trained ConvNet is used and whether dimensionality reduction is applied. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task, such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
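    The transfer recipe described in this abstract — freeze a source network, read off activations at some layer, use them as features for the target task — can be sketched in a few lines. Here the "source network" is just two random-weight layers standing in for a trained ConvNet, purely to show the mechanics of choosing a read-out layer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "source network": two fully connected layers with ReLU.
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 4))

def representation(x, layer):
    """Feed-forward activations at the chosen layer, used as features."""
    h1 = np.maximum(x @ W1, 0.0)      # layer-1 activations
    if layer == 1:
        return h1
    return np.maximum(h1 @ W2, 0.0)   # layer-2 activations

x = rng.standard_normal(8)            # one target-task input
feats = representation(x, layer=1)
print(feats.shape)  # → (16,)
```

    Which layer is read off, and whether the resulting features are dimensionality-reduced before being fed to the target classifier, are exactly the kinds of knobs the paper studies.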

  • 23.
    Baisero, Andrea
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Pokorny, Florian T.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    The Path Kernel2013In: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, 2013, p. 50-57Conference paper (Refereed)
    Abstract [en]

    Kernel methods have been used very successfully to classify data in various application domains. Traditionally, kernels have been constructed mainly for vectorial data defined on a specific vector space. Much less work has addressed the development of kernel functions for non-vectorial data. In this paper, we present a new kernel for encoding sequential data. We present results comparing the proposed kernel to the state of the art, showing a significant improvement in classification accuracy as well as much improved robustness and interpretability.

  • 24. Barekatain, M.
    et al.
    Marti, Miquel
    KTH. Polytechnic University of Catalonia, Spain.
    Shih, H. -F
    Murray, Samuel
    KTH, School of Computer Science and Communication (CSC).
    Nakayama, K.
    Matsuo, Y.
    Prendinger, H.
    Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection2017In: 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017, IEEE Computer Society, 2017, Vol. 2017, p. 2153-2160Conference paper (Refereed)
    Abstract [en]

    Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transition of actions, significant changes in scale and aspect ratio, abrupt camera movement, as well as multi-labeled actors. As a result, our dataset is more challenging than existing ones, and will help push the field forward to enable real-world applications.

  • 25. Baroffio, L.
    et al.
    Cesana, M.
    Redondi, A.
    Tagliasacchi, M.
    Ascenso, J.
    Monteiro, P.
    Eriksson, Emil
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Dan, G.
    Fodor, Viktoria
    KTH, School of Electrical Engineering (EES), Communication Networks.
    GreenEyes: Networked energy-aware visual analysis2015In: 2015 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2015, IEEE conference proceedings, 2015Conference paper (Refereed)
    Abstract [en]

    The GreenEyes project aims at developing a comprehensive set of new methodologies, practical algorithms and protocols to empower wireless sensor networks with vision capabilities. The key tenet of this research is that most visual analysis tasks can be carried out based on a succinct representation of the image, which entails both global and local features while disregarding the underlying pixel-level representation. Specifically, GreenEyes will pursue the following goals: i) energy-constrained extraction of visual features; ii) rate-efficiency modelling and coding of visual features; iii) networking of streams of visual features. This will have a significant impact on several scenarios including, e.g., smart cities and environmental monitoring.

  • 26. Baudoin, Y.
    et al.
    Doroftei, D.
    De Cubber, G.
    Berrabah, S. A.
    Pinzon, C.
    Warlet, F.
    Gancet, J.
    Motard, E.
    Ilzkovitz, M.
    Nalpantidis, Lazaros
    Production and Management Engineering Dept., Democritus University of Thrace, Greece.
    Gasteratos, Antonios
    Production and Management Engineering Dept., Democritus University of Thrace, Greece.
    View-finder: Robotics assistance to fire-fighting services and crisis management2009In: Safety, Security & Rescue Robotics (SSRR), 2009 IEEE International Workshop on, 2009, p. 1-6Conference paper (Refereed)
    Abstract [en]

    In the event of an emergency due to a fire or other crisis, a necessary but time-consuming pre-requisite, which could delay the real rescue operation, is to establish whether the ground or area can be entered safely by human emergency workers. The objective of the VIEW-FINDER project is to develop robots whose primary task is to gather data. The robots are equipped with sensors that detect the presence of chemicals, and in parallel image data is collected and forwarded to an advanced control station (COC). The robots will be equipped with a wide array of chemical sensors, on-board cameras, laser and other sensors to enhance scene understanding and reconstruction. At the base station (BS), the data is processed and combined with geographical information originating from a web of sources, thus providing the personnel leading the operation with in-situ processed data that can improve decision making. This paper focuses on the Crisis Management Information System that has been developed for improving a Disaster Management Action Plan and for linking the control station with an off-site Crisis Management Centre, and on the software tools implemented on the mobile robot gathering data in the outdoor area of the crisis.

  • 27.
    Bekiroglu, Yasemin
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Detry, Renaud
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Grasp Stability from Vision and Touch2012Conference paper (Refereed)
  • 28.
    Bergholm, Fredrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Adler, Jeremy
    Parmryd, Ingela
    Analysis of Bias in the Apparent Correlation Coefficient Between Image Pairs Corrupted by Severe Noise2010In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 37, no 3, p. 204-219Article in journal (Refereed)
    Abstract [en]

    The correlation coefficient r is a measure of similarity used to compare regions of interest in image pairs. In fluorescence microscopy there is a basic tradeoff between the degree of image noise and the frequency with which images can be acquired, and therefore the ability to follow dynamic events. The correlation coefficient r is commonly used in fluorescence microscopy for colocalization measurements, when the relative distributions of two fluorophores are of interest. Unfortunately, r is known to be biased, understating the true correlation when noise is present, so a better measure of correlation is needed. This article analyses the expected value of r and arrives at a procedure for evaluating the bias of r in terms of expected-value formulas. A Taylor series of so-called invariant factors is analyzed in detail. These formulas indicate ways to correct r and thereby obtain a corrected value free from the influence of noise that is on average accurate (unbiased). One possible correction is the attenuation-corrected correlation coefficient R, introduced heuristically by Spearman (in Am. J. Psychol. 15:72-101, 1904). An ideal correction formula in terms of expected values is derived. For large samples, R tends towards the ideal correction formula and the true noise-free correlation. Correlation measurements using simulation based on the types of noise found in fluorescence microscopy images illustrate both the power of the method and the variance of R. We conclude that the correction formula is valid and is particularly useful for making correct analyses from very noisy datasets.
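    The attenuation bias and Spearman's correction R can be illustrated with a small simulation (parameters and the duplicate-measurement scheme are chosen here for illustration only; this is not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma = 200_000, 0.8, 1.0

# Two noise-free, correlated "images" (flattened to vectors).
z = rng.standard_normal(n)
x = z
y = rho * z + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Each channel is measured twice with independent noise, as in duplicate
# acquisitions; the duplicate pairs give the per-channel reliabilities.
x1, x2 = x + sigma * rng.standard_normal(n), x + sigma * rng.standard_normal(n)
y1, y2 = y + sigma * rng.standard_normal(n), y + sigma * rng.standard_normal(n)

r_obs = corr(x1, y1)                  # biased towards zero by the noise
r_xx, r_yy = corr(x1, x2), corr(y1, y2)
R = r_obs / np.sqrt(r_xx * r_yy)      # Spearman's attenuation correction

print(r_obs, R)  # r_obs clearly below 0.8; R close to the true 0.8
```

    With unit-variance signals and unit-variance noise, each channel attenuates r by a factor of about sqrt(1/2), so the observed correlation drops to roughly half the true value while R recovers it, illustrating why the bias matters for noisy colocalization data.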

  • 29.
    Bergström, Niklas
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Interactive Perception: From Scenes to Objects2012Doctoral thesis, monograph (Other academic)
    Abstract [en]

    This thesis builds on the observation that robots, like humans, do not have enough experience to handle all situations from the start. They therefore need tools to cope with new situations, unknown scenes and unknown objects. In particular, this thesis addresses objects. How can a robot realize what objects are if it looks at a scene and has no knowledge about objects? How can it recover from situations where its hypotheses about what it sees are wrong? Even if it has built up experience in the form of learned objects, there will be situations where it will be uncertain or mistaken, and it will therefore still need the ability to correct errors. Much of our daily lives involves interactions with objects, and the same will be true for robots existing among us. Apart from being able to identify individual objects, a robot will therefore need to manipulate them.

    Throughout the thesis, different aspects of how to deal with these questions are addressed. The focus is on the problem of a robot automatically partitioning a scene into its constituent objects. It is assumed that the robot does not know about specific objects and is therefore considered inexperienced. Instead, a method is proposed that generates object hypotheses given visual input and then enables the robot to recover from erroneous hypotheses. This is done by the robot drawing on a human's experience, as well as by enabling it to interact with the scene itself and monitor whether the observed changes are in line with its current beliefs about the scene's structure.

    Furthermore, the task of object manipulation for unknown objects is explored. This also serves as motivation for why the scene partitioning problem is essential to solve. Finally, aspects of monitoring the outcome of a manipulation are investigated by observing the evolution of flexible objects in both static and dynamic scenes. All methods developed for this thesis have been tested and evaluated on real robotic platforms. These evaluations show the importance of having a system capable of recovering from errors, and that the robot can take advantage of human experience using just simple commands.

  • 30.
    Bergström, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bohg, Jeannette
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Roberson-Johnson, Matthew
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kootstra, Gert
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Active Scene Analysis2010Conference paper (Refereed)
  • 31.
    Bergström, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Yamakawa, Yuji
    Senoo, Taku
    Ishikawa, Masatoshi
    On-line learning of temporal state models for flexible objects2012In: 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids), IEEE , 2012, p. 712-718Conference paper (Refereed)
    Abstract [en]

    State estimation and control are intimately related processes in robot handling of flexible and articulated objects. While for rigid objects we can generate a CAD model beforehand and state estimation boils down to estimating the pose or velocity of the object, for flexible and articulated objects, such as cloth, the representation of the object's state depends heavily on the task and its execution. For example, when folding a cloth, the representation will mainly depend on the way the folding is executed.

  • 32.
    Bergström, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Yamakawa, Yuji
    Tokyo University.
    Senoo, Taku
    Tokyo University.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ishikawa, Masatoshi
    Tokyo University.
    State Recognition of Deformable Objects Using Shape Context2011In: The 29th Annual Conference of the Robotics Society of Japan, 2011Conference paper (Other academic)
  • 33. Berrada, Dounia
    et al.
    Romero, Mario
    Georgia Institute of Technology, US.
    Abowd, Gregory
    Blount, Marion
    Davis, John
    Automatic Administration of the Get Up and Go Test2007In: HealthNet'07: Proceedings of the 1st ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, ACM Digital Library, 2007, p. 73-75Conference paper (Refereed)
    Abstract [en]

    In-home monitoring using sensors has the potential to improve the life of elderly and chronically ill persons, assist their family and friends in supervising their status, and provide early warning signs to the person's clinicians. The Get Up and Go test is a clinical test used to assess the balance and gait of a patient. We propose a way to automatically apply an abbreviated version of this test to patients in their residence using video data without body-worn sensors or markers.

  • 34. Björkman, Eva
    et al.
    Zagal, Juan Cristobal
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Roland, Per E.
    Evaluation of design options for the scale-space primal sketch analysis of brain activation images2000In: : HBM'00, published in Neuroimage, volume 11, number 5, 2000, 2000, Vol. 11, p. 656-656Conference paper (Refereed)
    Abstract [en]

    A key issue in brain imaging concerns how to detect the functionally activated regions from PET and fMRI images. In earlier work, it has been shown that the scale-space primal sketch provides a useful tool for such analysis [1]. The method includes presmoothing with different filter widths and automatic estimation of the spatial extent of the activated regions (blobs).

    The purpose here is to present two modifications of the scale-space primal sketch, together with a quantitative evaluation showing that these modifications improve performance, measured as the separation between blob descriptors extracted from PET images and from noise images. This separation is essential for future work on associating a statistical p-value with the scale-space blob descriptors.

  • 35.
    Björkman, Mårten
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bekiroglu, Yasemin
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Learning to Disambiguate Object Hypotheses through Self-Exploration2014In: 14th IEEE-RAS International Conference onHumanoid Robots, IEEE Computer Society, 2014Conference paper (Refereed)
    Abstract [en]

    We present a probabilistic learning framework to form object hypotheses through interaction with the environment. A robot learns how to manipulate objects through pushing actions to identify how many objects are present in the scene. We use a segmentation system that initializes object hypotheses based on RGB-D data and adopt a reinforcement learning approach to learn the relations between pushing actions and their effects on object segmentations. Trained models are used to generate actions that result in a minimum number of pushes on object groups, until either object separation events are observed or it is ensured that there is only one object acted on. We provide baseline experiments showing that a policy based on reinforcement learning for action selection results in fewer pushes than if pushing actions were selected randomly.

  • 36.
    Björkman, Mårten
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bekiroglu, Yasemin
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Högman, Virgile
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Enhancing Visual Perception of Shape through Tactile Glances2013In: Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, IEEE conference proceedings, 2013, p. 3180-3186Conference paper (Refereed)
    Abstract [en]

    Object shape information is an important parameter in robot grasping tasks. However, it may be difficult to obtain accurate models of novel objects due to incomplete and noisy sensory measurements. In addition, object shape may change due to frequent interaction with the object (cereal boxes, etc). In this paper, we present a probabilistic approach for learning object models based on visual and tactile perception through physical interaction with an object. Our robot explores unknown objects by touching them strategically at parts that are uncertain in terms of shape. The robot starts by using only visual features to form an initial hypothesis about the object shape, then gradually adds tactile measurements to refine the object model. Our experiments involve ten objects of varying shapes and sizes in a real setup. The results show that our method is capable of choosing a small number of touches to construct object models similar to real object shapes and to determine similarities among acquired models.
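    The touch-selection strategy in this abstract — probe the object where the current shape estimate is most uncertain — can be caricatured in one dimension. This is a deliberately simplified sketch (the toy rule "uncertainty halves at the probed location" stands in for the paper's full probabilistic shape model):

```python
import numpy as np

def explore(uncertainty, n_touches):
    """Greedy active exploration: always touch the most uncertain spot."""
    u = np.array(uncertainty, dtype=float)
    touched = []
    for _ in range(n_touches):
        i = int(np.argmax(u))   # most uncertain surface location
        touched.append(i)
        u[i] *= 0.5             # a tactile glance reduces local uncertainty
    return touched, u

# Per-location shape uncertainty from an initial, vision-only hypothesis.
u0 = np.array([0.2, 0.9, 0.4, 0.8, 0.1])
order, u = explore(u0, 3)
print(order)  # → [1, 3, 1]
```

    Note how the strategy revisits a location once it again becomes the most uncertain one, rather than sweeping the surface uniformly, which is why a small number of touches can suffice.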

  • 37.
    Björkman, Mårten
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bergström, Niklas
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detecting, segmenting and tracking unknown objects using multi-label MRF inference2014In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 118, p. 111-127Article in journal (Refereed)
    Abstract [en]

    This article presents a unified framework for detecting, segmenting and tracking unknown objects in everyday scenes, allowing for inspection of object hypotheses during interaction over time. A heterogeneous scene representation is proposed, with background regions modeled as combinations of planar surfaces and uniform clutter, and foreground objects as 3D ellipsoids. Recent energy minimization methods based on loopy belief propagation, tree-reweighted message passing and graph cuts are studied for the purpose of multi-object segmentation and benchmarked in terms of segmentation quality, computational speed and how easily the methods can be adapted for parallel processing. One conclusion is that the choice of energy minimization method is less important than the way scenes are modeled. Proximities are more valuable for segmentation than similarity in colors, while the benefit of 3D information is limited. It is also shown through practical experiments that, with implementations on GPUs, multi-object segmentation and tracking using state-of-the-art MRF inference methods is feasible, despite the computational costs typically associated with such methods.
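    As a toy version of the energy minimization this abstract benchmarks, here is a tiny Potts-model energy on a 1-D chain, minimized with iterated conditional modes (ICM is a much simpler baseline than the belief-propagation and graph-cut methods the paper studies, and the unary costs below are made up):

```python
import numpy as np

def energy(labels, unary, lam):
    """Potts MRF energy: sum of unary costs + lam per neighbouring label change."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    smooth = lam * sum(labels[i] != labels[i + 1] for i in range(len(labels) - 1))
    return data + smooth

def icm(unary, lam, n_labels, sweeps=10):
    labels = [int(np.argmin(u)) for u in unary]   # unary-only initialisation
    for _ in range(sweeps):
        for i in range(len(labels)):
            labels[i] = min(
                range(n_labels),
                key=lambda l: energy(labels[:i] + [l] + labels[i + 1:], unary, lam),
            )
    return labels

# Three sites, two labels; the middle unary weakly prefers label 1,
# but the smoothness term pulls it to agree with its neighbours.
unary = [[0.0, 2.0], [0.6, 0.4], [0.0, 2.0]]
labels = icm(unary, lam=0.5, n_labels=2)
print(labels)  # → [0, 0, 0]
```

    The same trade-off between data terms (e.g. color similarity) and smoothness terms (e.g. proximity) is what the article's multi-object segmentation energies encode, just over image pixels instead of a three-site chain.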

  • 38.
    Bohg, Jeannette
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bergström, Niklas
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Acting and Interacting in the Real World2011Conference paper (Refereed)
  • 39.
    Bore, Nils
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ambrus, Rares
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Jensfelt, Patric
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Folkesson, John
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Efficient retrieval of arbitrary objects from long-term robot observations2017In: Robotics and Autonomous Systems, ISSN 0921-8890, E-ISSN 1872-793X, Vol. 91, p. 139-150Article in journal (Refereed)
    Abstract [en]

    We present a novel method for efficient querying and retrieval of arbitrarily shaped objects from large amounts of unstructured 3D point cloud data. Our approach first performs a convex segmentation of the data, after which local features are extracted and stored in a feature dictionary. We show that the representation allows efficient and reliable querying of the data. To handle arbitrarily shaped objects, we propose a scheme which allows incremental matching of segments based on similarity to the query object. Further, we adjust the feature metric based on the quality of the query results to improve results in a second round of querying. We perform extensive qualitative and quantitative experiments on two datasets for both segmentation and retrieval, validating the results using ground truth data. Comparisons with other state-of-the-art methods further support the validity of the proposed method. Finally, we also investigate how the density and distribution of the local features within the point clouds influence the quality of the results.
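    As a rough illustration (not the paper's data structures), querying a dictionary of per-segment local features might score each stored segment by the mean nearest-neighbour distance from the query's features; `FeatureDictionary` and the toy 2-D feature vectors below are invented for this sketch:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class FeatureDictionary:
    """Toy stand-in for a segment feature dictionary: each stored segment
    holds a list of local feature vectors; a query is scored by the mean
    nearest-neighbour distance from its features to the segment's."""

    def __init__(self):
        self.segments = {}

    def add(self, name, features):
        self.segments[name] = features

    def query(self, query_features, k=1):
        scores = []
        for name, feats in self.segments.items():
            total = sum(min(dist(q, f) for f in feats)
                        for q in query_features)
            scores.append((total / len(query_features), name))
        return sorted(scores)[:k]
```

A query whose features lie close to one stored segment's features ranks that segment first; incremental matching as in the paper would grow the candidate set of segments rather than score them in one pass.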

  • 40.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Laptev, Ivan
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Hand-gesture recognition using multi-scale colour features, hierarchical features and particle filtering2002In: Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002. Proceedings, IEEE conference proceedings, 2002, p. 63-74Conference paper (Refereed)
    Abstract [en]

    This paper presents algorithms and a prototype system for hand tracking and hand posture recognition. Hand postures are represented in terms of hierarchies of multi-scale colour image features at different scales, with qualitative inter-relations in terms of scale, position and orientation. In each image, detection of multi-scale colour features is performed. Hand states are then simultaneously detected and tracked using particle filtering, with an extension of layered sampling referred to as hierarchical layered sampling. Experiments are presented showing that the performance of the system is substantially improved by performing feature detection in colour space and including a prior with respect to skin colour. These components have been integrated into a real-time prototype system, applied to a test problem of controlling consumer electronics using hand gestures. In a simplified demo scenario, this system has been successfully tested by participants at two fairs during 2001.
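    The particle-filtering component can be illustrated with a generic bootstrap filter on a 1-D random-walk state. This is a simplified stand-in for the paper's hierarchical layered sampling over full hand states; all parameters and the toy observation model are illustrative:

```python
import math
import random

def particle_filter(observations, n=500, motion_std=0.5, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state: a simplified
    analogue of the particle filtering used for hand-state tracking."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for z in observations:
        # Predict: diffuse each particle with the random-walk motion model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: Gaussian likelihood of the observation under each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Point estimate: weighted posterior mean.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample proportionally to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates

# Track a state observed at 2.0 for ten frames; the estimate converges there.
track = particle_filter([2.0] * 10)
```

Layered sampling, as extended in the paper, would interleave several such weight-and-resample passes per frame using increasingly detailed likelihoods.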

  • 41.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Laptev, Ivan
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Lenman, S.
    Sundblad, Y.
    A Prototype System for Computer Vision Based Human Computer Interaction2001Report (Other academic)
  • 42.
    Bretzner, Lars
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Feature Tracking with Automatic Selection of Spatial Scales1998In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 71, no 3, p. 385-393Article in journal (Refereed)
    Abstract [en]

    When observing a dynamic world, the size of image structures may vary over time. This article emphasizes the need for including explicit mechanisms for automatic scale selection in feature tracking algorithms in order to: (i) adapt the local scale of processing to the local image structure, and (ii) adapt to the size variations that may occur over time. The problems of corner detection and blob detection are treated in detail, and a combined framework for feature tracking is presented. The integrated tracking algorithm overcomes some of the inherent limitations of exposing fixed-scale tracking methods to image sequences in which the size variations are large. It is also shown how the stability over time of scale descriptors can be used as a part of a multi-cue similarity measure for matching. Experiments on real-world sequences are presented showing the performance of the algorithm when applied to (individual) tracking of corners and blobs.
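    The scale-selection principle used here (pick the scale where a scale-normalized derivative response attains a maximum over scales) can be sketched in one dimension; the blob, scale grid and helper names below are illustrative. Note that for this 1-D second-derivative operator the maximum lands near twice the blob's variance, so the selected scale still reflects the structure's spatial extent:

```python
import math

def gaussian(x, t):
    """1-D Gaussian with variance t."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def norm_laplacian_at(signal, center, t):
    """Scale-normalised second derivative t * (g_xx * f)(center), where
    g_xx is the second derivative of a Gaussian of variance t."""
    r = int(4 * math.sqrt(t)) + 1
    acc = 0.0
    for dx in range(-r, r + 1):
        g_xx = gaussian(dx, t) * (dx * dx / t - 1.0) / t
        if 0 <= center + dx < len(signal):
            acc += g_xx * signal[center + dx]
    return t * acc

# A Gaussian blob of variance t0 = 16 (std 4), centred at x = 50.
t0 = 16.0
signal = [math.exp(-(x - 50) ** 2 / (2 * t0)) for x in range(101)]
scales = [2.0 ** (k / 2.0) for k in range(2, 13)]  # t in [2, 64]
selected = max(scales, key=lambda t: abs(norm_laplacian_at(signal, 50, t)))
```

A tracker can then use `selected` both to set the window size for matching and, via its stability over time, as a cue in the similarity measure.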

  • 43.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Feature tracking with automatic selection of spatial scales1998Report (Other academic)
    Abstract [en]

    When observing a dynamic world, the size of image structures may vary over time. This article emphasizes the need for including explicit mechanisms for automatic scale selection in feature tracking algorithms in order to: (i) adapt the local scale of processing to the local image structure, and (ii) adapt to the size variations that may occur over time.

    The problems of corner detection and blob detection are treated in detail, and a combined framework for feature tracking is presented in which the image features at every time moment are detected at locally determined and automatically selected scales. A useful property of the scale selection method is that the scale levels selected in the feature detection step reflect the spatial extent of the image structures. Thereby, the integrated tracking algorithm has the ability to adapt to spatial as well as temporal size variations, and can in this way overcome some of the inherent limitations of exposing fixed-scale tracking methods to image sequences in which the size variations are large.

    In the composed tracking procedure, the scale information is used for two additional major purposes: (i) for defining local regions of interest for searching for matching candidates as well as setting the window size for correlation when evaluating matching candidates, and (ii) stability over time of the scale and significance descriptors produced by the scale selection procedure are used for formulating a multi-cue similarity measure for matching.

    Experiments on real-world sequences are presented showing the performance of the algorithm when applied to (individual) tracking of corners and blobs. Specifically, comparisons with fixed-scale tracking methods are included as well as illustrations of the increase in performance obtained by using multiple cues in the feature matching step.

  • 44.
    Bretzner, Lars
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    On the handling of spatial and temporal scales in feature tracking1997In: Scale-Space Theory in Computer Vision: First International Conference, Scale-Space'97 Utrecht, The Netherlands, July 2–4, 1997 Proceedings, Springer Berlin/Heidelberg, 1997, p. 128-139Conference paper (Refereed)
  • 45.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Qualitative Multi-Scale Feature Hierarchies for Object Tracking2000In: Journal of Visual Communication and Image Representation, ISSN 1047-3203, E-ISSN 1095-9076, Vol. 11, p. 115-129Article in journal (Refereed)
    Abstract [en]

    This paper shows how the performance of feature trackers can be improved by building a view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually, and to use the qualitative feature relations for resolving ambiguous matches and for introducing feature hypotheses whenever image features are mismatched or lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semi-rigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.
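    How qualitative inter-feature relations can resolve ambiguous matches may be sketched as follows; `resolve_match`, the offset representation and the tolerance are invented for illustration and are not the paper's representation:

```python
def resolve_match(parent_pos, expected_offset, candidates, tol=2.0):
    """Pick the match candidate whose position best agrees with a
    qualitative parent-child relation (an approximate offset from a
    coarser-scale parent feature). If no candidate is consistent,
    return the predicted position as a feature hypothesis instead,
    mirroring how lost or mismatched features are re-introduced."""
    predicted = (parent_pos[0] + expected_offset[0],
                 parent_pos[1] + expected_offset[1])
    best, best_d = None, tol
    for c in candidates:
        # Chebyshev distance to the position predicted by the relation.
        d = max(abs(c[0] - predicted[0]), abs(c[1] - predicted[1]))
        if d < best_d:
            best, best_d = c, d
    return best if best is not None else predicted
```

Because the relations are qualitative (approximate offsets rather than a rigid 3-D model), the same mechanism tolerates semi-rigid deformation and partial occlusion, as the abstract notes.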

  • 46.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Qualitative multiscale feature hierarchies for object tracking2000Report (Refereed)
    Abstract [en]

    This paper shows how the performance of feature trackers can be improved by building a hierarchical view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually and to use the qualitative feature relations for avoiding mismatches, for resolving ambiguous matches, and for introducing feature hypotheses whenever image features are lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semirigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.

  • 47.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Qualitative multi-scale feature hierarchies for object tracking1999In: Proc. Scale-Space Theories in Computer Vision, Elsevier, 1999, p. 117-128Conference paper (Refereed)
    Abstract [en]

    This paper shows how the performance of feature trackers can be improved by building a view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually, and to use the qualitative feature relations for resolving ambiguous matches and for introducing feature hypotheses whenever image features are mismatched or lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semi-rigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.

  • 48.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Structure and Motion Estimation using Sparse Point and Line Correspondences in Multiple Affine Views1999Report (Other academic)
    Abstract [en]

    This paper addresses the problem of computing three-dimensional structure and motion from an unknown rigid configuration of points and lines viewed by an affine projection model. An algebraic structure, analogous to the trilinear tensor for three perspective cameras, is defined for configurations of three centered affine cameras. This centered affine trifocal tensor contains 12 non-zero coefficients and involves linear relations between point correspondences and trilinear relations between line correspondences. It is shown how the affine trifocal tensor relates to the perspective trilinear tensor, and how three-dimensional motion can be computed from this tensor in a straightforward manner. A factorization approach is developed to handle point features and line features simultaneously in image sequences, and degenerate feature configurations are analysed. This theory is applied to a specific problem in human-computer interaction of capturing three-dimensional rotations from gestures of a human hand. This application to quantitative gesture analyses illustrates the usefulness of the affine trifocal tensor in a situation where sufficient information is not available to compute the perspective trilinear tensor, while the geometry requires point correspondences as well as line correspondences over at least three views.
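    The factorization approach rests on the fact that the measurement matrix of centred affine views has rank at most three. A small numerical check of that constraint with random generic points and cameras is sketched below; `matrix_rank` is a plain Gaussian-elimination helper and the setup is illustrative, not the paper's algorithm:

```python
import random

def matrix_rank(rows, eps=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    nrows, ncols = len(m), len(m[0])
    rank, col = 0, 0
    while rank < nrows and col < ncols:
        pivot = max(range(rank, nrows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < eps:
            col += 1
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, nrows):
            f = m[i][col] / m[rank][col]
            for j in range(col, ncols):
                m[i][j] -= f * m[rank][j]
        rank += 1
        col += 1
    return rank

def project(A, X):
    """Centred affine projection x = A X (2x3 camera, no translation)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

rng = random.Random(1)
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(6)]
cameras = [[[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
           for _ in range(3)]

# Stack the centred image coordinates of all points over the three views
# into a 6 x 6 measurement matrix; for centred affine cameras its rank
# is at most 3, which is the constraint the factorization exploits.
W = []
for A in cameras:
    proj = [project(A, X) for X in points]
    for r in range(2):
        mean = sum(p[r] for p in proj) / len(proj)
        W.append([p[r] - mean for p in proj])
```

Factorizing such a rank-3 matrix into camera and structure factors (e.g. via SVD) recovers motion and shape up to an affine ambiguity; the paper extends this to handle line features alongside points.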

  • 49.
    Bretzner, Lars
    et al.
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Use your hand as a 3-D mouse or relative orientation from extended sequences of sparse point and line correspondences using the affine trifocal tensor1998In: Computer Vision — ECCV'98: 5th European Conference on Computer Vision Freiburg, Germany, June, 2–6, 1998 Proceedings, Volume I, Springer Berlin/Heidelberg, 1998, Vol. 1406, p. 141-157Conference paper (Refereed)
    Abstract [en]

    This paper addresses the problem of computing three-dimensional structure and motion from an unknown rigid configuration of points and lines viewed by an affine projection model. An algebraic structure, analogous to the trilinear tensor for three perspective cameras, is defined for configurations of three centered affine cameras. This centered affine trifocal tensor contains 12 coefficients and involves linear relations between point correspondences and trilinear relations between line correspondences. It is shown how the affine trifocal tensor relates to the perspective trilinear tensor, and how three-dimensional motion can be computed from this tensor in a straightforward manner. A factorization approach is also developed to handle point features and line features simultaneously in image sequences.

    This theory is applied to a specific problem of human-computer interaction of capturing three-dimensional rotations from gestures of a human hand. A qualitative model is presented, in which three fingers are represented by their position and orientation, and it is shown how three point correspondences (blobs at the finger tips) and three line correspondences (ridge features at the fingers) allow the affine trifocal tensor to be determined, from which the rotation is computed. Besides the obvious application, this test problem illustrates the usefulness of the affine trifocal tensor in a situation where sufficient information is not available to compute the perspective trilinear tensor, while the geometry requires point correspondences as well as line correspondences over at least three views.

  • 50.
    Brunnström, Kjell
    et al.
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Eklundh, Jan-Olof
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Lindeberg, Tony
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    On Scale and Resolution in the Analysis of Local Image Structure1990In: Proc. 1st European Conf. on Computer Vision, 1990, Vol. 427, p. 3-12Conference paper (Refereed)
    Abstract [en]

    Focus-of-attention is extremely important in human visual perception. If computer vision systems are to perform tasks in a complex, dynamic world they will have to be able to control processing in a way that is analogous to visual attention in humans.

    In this paper we will investigate problems in connection with foveation, that is examining selected regions of the world at high resolution. We will especially consider the problem of finding and classifying junctions from this aspect. We will show that foveation as simulated by controlled, active zooming in conjunction with scale-space techniques allows robust detection and classification of junctions.
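    As a toy illustration of the classification target (not the paper's active-zooming and scale-space mechanism), a junction candidate can be labelled by the number of distinct edge directions meeting at it, given hypothetical extracted ray angles:

```python
import math

def classify_junction(ray_angles, min_sep=0.3):
    """Classify a junction candidate from the directions (radians) of the
    edges meeting at it: 2 distinct rays -> 'L', 3 -> 'T/Y', 4 -> 'X'."""
    rays = sorted(a % (2 * math.pi) for a in ray_angles)
    # Merge rays closer than min_sep: they belong to the same edge.
    merged = []
    for a in rays:
        if not merged or a - merged[-1] > min_sep:
            merged.append(a)
    # Also merge across the 0/2*pi wrap-around.
    if len(merged) > 1 and (2 * math.pi - (merged[-1] - merged[0])) <= min_sep:
        merged.pop()
    return {2: 'L', 3: 'T/Y', 4: 'X'}.get(len(merged), 'none')
```

In the paper's setting, foveated high-resolution examination serves precisely to make such direction estimates stable enough for a robust decision.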
