  • 1.
    Afkham, Heydar Maboudi
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    A topological framework for training latent variable models (2014). In: Proceedings - International Conference on Pattern Recognition, 2014, p. 2471-2476. Conference paper (Refereed)
    Abstract [en]

    We discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant. Deformable-Part Models (DPM) are good examples of such models. These models are usually considered expensive to train and very sensitive to initialization. In this paper, we focus on the learning of such models by introducing a topological framework and show how it is possible to both reduce the learning complexity and produce more robust decision boundaries. We also argue that our framework can produce robust decision boundaries without exploiting the dataset bias or relying on accurate annotations. To experimentally evaluate our method and compare with previously published frameworks, we focus on the problem of image classification with object localization. In this problem, the correct location of the objects is unknown during both the training and testing stages, and is treated as a latent variable.

  • 2.
    Afkham, Heydar Maboudi
    et al.
    KTH, School of Biotechnology (BIO), Gene Technology.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Gradual improvement of image descriptor quality (2014). In: ICPRAM 2014 - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods, 2014, p. 233-238. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a framework for gradually improving the quality of an already existing image descriptor. The descriptor used in this paper (Afkham et al., 2013) uses the response of a series of discriminative components to summarize each image. As we show, this descriptor has an ideal form in which all categories become linearly separable. While reaching this form is not feasible, we argue that by replacing a small fraction of these components it is possible to obtain a descriptor which is, on average, closer to this ideal form. To do so, we initially identify which components do not contribute to the quality of the descriptor and replace them with more robust components. Here, a joint feature selection method is used to find improved components. As our experiments show, this change is directly reflected in the capability of the resulting descriptor to discriminate between different categories.

  • 3.
    Afkham, Heydar Maboudi
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Initialization framework for latent variable models (2014). In: ICPRAM 2014 - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods, 2014, p. 227-232. Conference paper (Refereed)
    Abstract [en]

    In this paper, we discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant. Deformable-Part Models (DPM) are a good example of such models. While the Latent SVM (LSVM) framework has proven to be an efficient tool for solving these models, we argue that the solution it finds is very sensitive to the initialization. To decrease this dependency, we propose a novel clustering procedure for these problems that finds cluster centers shared by several sample sets while ignoring the remaining cluster centers. As we show, these cluster centers provide a robust initialization for the LSVM framework.

  • 4.
    Baisero, Andrea
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Pokorny, Florian T.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    The Path Kernel (2013). In: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, 2013, p. 50-57. Conference paper (Refereed)
    Abstract [en]

    Kernel methods have been used very successfully to classify data in various application domains. Traditionally, kernels have been constructed mainly for vectorial data defined on a specific vector space. Much less work has addressed the development of kernel functions for non-vectorial data. In this paper, we present a new kernel for encoding sequential data. We present results comparing the proposed kernel to the state of the art, showing a significant improvement in classification and much improved robustness and interpretability.

  • 5.
    Baisero, Andrea
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Pokorny, Florian T.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    The path kernel: A novel kernel for sequential data (2015). In: Pattern Recognition: Applications and Methods: International Conference, ICPRAM 2013, Barcelona, Spain, February 15–18, 2013, Revised Selected Papers / [ed] Ana Fred, Maria De Marsico, Springer Berlin/Heidelberg, 2015, p. 71-84. Conference paper (Refereed)
    Abstract [en]

    We define a novel kernel function for finite sequences of arbitrary length which we call the path kernel. We evaluate this kernel in a classification scenario using synthetic data sequences and show that our kernel can outperform state of the art sequential similarity measures. Furthermore, we find that, in our experiments, a clustering of data based on the path kernel results in much improved interpretability of such clusters compared to alternative approaches such as dynamic time warping or the global alignment kernel.
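    The path kernel itself is not reproduced in this listing, but the baselines it is compared against are classical. As a rough illustration of the kind of sequence comparison involved, here is a minimal sketch of dynamic time warping (one of the alternatives named above) turned into a similarity score; all names and data are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences.

    DTW is one of the baseline sequence similarities mentioned in the
    abstract above; the path kernel itself is defined in the paper and
    is not reproduced here.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_similarity(a, b, gamma=0.1):
    # A common (though only approximately positive-definite) way to turn
    # such a distance into a kernel-like similarity for experiments.
    return np.exp(-gamma * dtw_distance(a, b))

print(dtw_similarity(np.sin(np.linspace(0, 3, 30)),
                     np.sin(np.linspace(0, 3, 40))))
```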

  • 6.
    Bergström, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Scene Understanding through Autonomous Interactive Perception (2011). In: Computer Vision Systems: Lecture Notes in Computer Science / [ed] Crowley James L., Draper Bruce, Thonnat Monique, Springer Verlag, 2011, p. 153-162. Conference paper (Refereed)
    Abstract [en]

    We propose a framework for detecting, extracting and modeling objects in natural scenes from multi-modal data. Our framework is iterative, exploiting different hypotheses in a complementary manner. We employ the framework in realistic scenarios, based on visual appearance and depth information. Using a robotic manipulator that interacts with the scene, object hypotheses generated using appearance information are confirmed through pushing. The framework is iterative, with each generated hypothesis feeding into the subsequent one, continuously refining the predictions about the scene. We show results that demonstrate the synergic effect of applying multiple hypotheses for real-world scene understanding. The method is efficient and performs in real-time.

  • 7.
    Bergström, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Yamakawa, Yuji
    Senoo, Taku
    Ishikawa, Masatoshi
    On-line learning of temporal state models for flexible objects (2012). In: 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids), IEEE, 2012, p. 712-718. Conference paper (Refereed)
    Abstract [en]

    State estimation and control are intimately related processes in robot handling of flexible and articulated objects. While for rigid objects we can generate a CAD model beforehand and state estimation boils down to estimating the pose or velocity of the object, in the case of flexible and articulated objects, such as cloth, the representation of the object's state depends heavily on the task and execution. For example, when folding a cloth, the representation will mainly depend on the way the folding is executed.

  • 8.
    Bergström, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Yamakawa, Yuji
    Tokyo University.
    Senoo, Taku
    Tokyo University.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ishikawa, Masatoshi
    Tokyo University.
    State Recognition of Deformable Objects Using Shape Context (2011). In: The 29th Annual Conference of the Robotics Society of Japan, 2011. Conference paper (Other academic)
  • 9.
    Caccamo, Sergio
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bekiroglu, Yasemin
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Active Exploration Using Gaussian Random Fields and Gaussian Process Implicit Surfaces (2016). In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 582-589. Conference paper (Refereed)
    Abstract [en]

    In this work we study the problem of exploring surfaces and building compact 3D representations of the environment surrounding a robot through active perception. We propose an online probabilistic framework that merges visual and tactile measurements using Gaussian Random Fields and Gaussian Process Implicit Surfaces. The system investigates incomplete point clouds in order to find a small set of regions of interest, which are then physically explored with a robotic arm equipped with tactile sensors. We show experimental results obtained using a PrimeSense camera, a Kinova Jaco2 robotic arm and Optoforce sensors in different scenarios. We then demonstrate how to use the online framework for object detection and terrain classification.
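    To make the Gaussian Process Implicit Surface ingredient concrete, below is a minimal, hedged sketch using scikit-learn's GP regression: surface points get target 0 and off-surface points get positive targets, the predictive mean then defines the implicit surface, and the predictive variance is what an active-exploration loop would use to pick the next region to touch. The data and length-scales are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical training data: 3-D points with implicit-surface targets
# (0 on the surface, positive away from it). Real inputs would come from
# the point cloud and the tactile probes described in the abstract.
X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
              [0.3, 0.3, 0.3], [-0.3, -0.3, -0.3]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2) + WhiteKernel(1e-4))
gp.fit(X, y)

# Predictive mean ~ 0 marks the surface; large predictive std marks
# regions worth exploring with the tactile sensor next.
query = np.array([[0.05, 0.05, 0.0]])
mean, std = gp.predict(query, return_std=True)
print(mean, std)
```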

  • 10. Damianou, A. C.
    et al.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Titsias, M. K.
    Lawrence, N. D.
    Manifold relevance determination (2012). In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012, 2012, p. 145-152. Conference paper (Refereed)
    Abstract [en]

    In this paper we present a fully Bayesian latent variable model which exploits conditional non-linear (in)-dependence structures to learn an efficient latent representation. The latent space is factorized to represent shared and private information from multiple views of the data. In contrast to previous approaches, we introduce a relaxation to the discrete segmentation and allow for a "softly" shared latent space. Further, Bayesian techniques allow us to automatically estimate the dimensionality of the latent spaces. The model is capable of capturing structure underlying extremely high dimensional spaces. This is illustrated by modelling unprocessed images with tens of thousands of pixels. This also allows us to directly generate novel images from the trained model by sampling from the discovered latent spaces. We also demonstrate the model by prediction of human pose in an ambiguous setting. Our Bayesian framework allows us to perform disambiguation in a principled manner by including latent space priors which incorporate the dynamic nature of the data.
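    The GPy library ships an MRD implementation following this line of work; a minimal usage sketch under that assumption is shown below. The two random views stand in for real data, and attribute names such as bgplvms reflect my reading of GPy's API rather than anything stated in the abstract.

```python
import numpy as np
import GPy  # assumes the GPy library and its MRD class

# Two hypothetical views of the same 30 underlying instances
Y1 = np.random.randn(30, 8)
Y2 = np.random.randn(30, 12)

# One latent space is shared across views; per-view ARD weights decide
# which latent dimensions are shared and which are private, giving the
# "softly" factorized segmentation described above.
m = GPy.models.MRD([Y1, Y2], input_dim=5, num_inducing=15)
m.optimize(messages=False, max_iters=200)

# Relevance weights per view: a near-zero weight in one view marks a
# latent dimension as private to the other view.
print(m.bgplvms[0].kern.input_sensitivity())
print(m.bgplvms[1].kern.input_sensitivity())
```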

  • 11. Damianou, Andreas
    et al.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Boorman, Luke
    Lawrence, Neil D.
    Prescott, Tony J.
    A Top-Down Approach for a Synthetic Autobiographical Memory System (2015). In: Biomimetic and Biohybrid Systems, Living Machines 2015, Springer, 2015, p. 280-292. Conference paper (Refereed)
    Abstract [en]

    Autobiographical memory (AM) refers to the organisation of one's experience into a coherent narrative. The exact neural mechanisms responsible for the manifestation of AM in humans are unknown. On the other hand, the field of psychology has provided us with useful understanding of the functionality of a bio-inspired synthetic AM (SAM) system, at a higher level of description. This paper is concerned with a top-down approach to SAM, where known components and organisation guide the architecture but the unknown details of each module are abstracted. By using Bayesian latent variable models we obtain a transparent SAM system with which we can interact in a structured way. This allows us to reveal the properties of specific sub-modules and map them to functionality observed in biological systems. The top-down approach can cope well with the high performance requirements of a bio-inspired cognitive system. This is demonstrated in experiments using face data.

  • 12. Davies, A.
    et al.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Dalton, C.
    Campbell, N.
    Generating 3D Morphable Model parameters for facial tracking: Factorising identity and expression (2012). In: GRAPP 2012 IVAPP 2012 - Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications, 2012, p. 309-318. Conference paper (Refereed)
    Abstract [en]

    The ability to factorise parameters into identity and expression parameters is highly desirable in facial tracking, as it requires only the identity parameters to be set in the initial frame, leaving the expression parameters to be adjusted in subsequent frames. In this paper we introduce a strategy for creating parameters for a data-driven 3D Morphable Model (3DMM) which are able to separately model the variance due to identity and expression found in the training data. We present three factorisation schemes and evaluate their appropriateness for tracking by comparing the variances between the identity coefficients and expression coefficients when fitted to data of individuals performing different facial expressions.

  • 13. Davies, Alexander
    et al.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Dalton, Colin J.
    Campbell, Neill
    Facial Movement Based Recognition (2011). In: 5th International Conference on Computer Vision/Computer Graphics Collaboration Techniques, MIRAGE 2011, 2011, p. 51-62. Conference paper (Refereed)
    Abstract [en]

    The modelling and understanding of the facial dynamics of individuals is crucial to achieving higher levels of realistic facial animation. We address the recognition of individuals through modelling the facial motions of several subjects. Modelling facial motion comes with numerous challenges, including accurate and robust tracking of facial movement, high-dimensional data processing and non-linear spatio-temporal structural motion. We present a novel framework which addresses these problems through the use of video-specific Active Appearance Models (AAM) and Gaussian Process Latent Variable Models (GP-LVM). Our experiments and results qualitatively and quantitatively demonstrate the framework's ability to successfully differentiate individuals by temporally modelling appearance-invariant facial motion, thus supporting the proposition that a facial activity model may assist in the areas of motion retargeting, motion synthesis and experimental psychology.

  • 14.
    Detry, Renaud
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Madry, Marianna
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Learning a dictionary of prototypical grasp-predicting parts from grasping experience (2013). In: 2013 IEEE International Conference on Robotics and Automation (ICRA), New York: IEEE, 2013, p. 601-608. Conference paper (Refereed)
    Abstract [en]

    We present a real-world robotic agent that is capable of transferring grasping strategies across objects that share similar parts. The agent transfers grasps across objects by identifying, from examples provided by a teacher, parts by which objects are often grasped in a similar fashion. It then uses these parts to identify grasping points onto novel objects. We focus our report on the definition of a similarity measure that reflects whether the shapes of two parts resemble each other, and whether their associated grasps are applied near one another. We present an experiment in which our agent extracts five prototypical parts from thirty-two real-world grasp examples, and we demonstrate the applicability of the prototypical parts for grasping novel objects.

  • 15.
    Detry, Renaud
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Madry, Marianna
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Piater, Justus
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Generalizing grasps across partly similar objects (2012). In: 2012 IEEE International Conference on Robotics and Automation (ICRA), IEEE Computer Society, 2012, p. 3791-3797. Conference paper (Refereed)
    Abstract [en]

    The paper starts by reviewing the challenges associated with grasp planning and previous work on robot grasping. Our review emphasizes the importance of agents that generalize grasping strategies across objects and that are able to transfer these strategies to novel objects. In the rest of the paper, we devise a novel approach to the grasp transfer problem, where generalization is achieved by learning, from a set of grasp examples, a dictionary of object parts by which objects are often grasped. We detail the application of dimensionality reduction and unsupervised clustering algorithms in order to identify the size and shape of parts that often predict the application of a grasp. The learned dictionary allows our agent to grasp novel objects which share a part with previously seen objects, by matching the learned parts to the current view of the new object and selecting the grasp associated with the best-fitting part. We present and discuss a proof-of-concept experiment in which a dictionary is learned from a set of synthetic grasp examples. While prior work in this area focused primarily on shape analysis (parts identified, e.g., through visual clustering or salient structure analysis), the key aspect of this work is the emergence of parts from both object shape and grasp examples. As a result, parts intrinsically encode the intention of executing a grasp.

  • 16.
    Ek, Carl Henrik
    Oxford Brookes University.
    Shared Gaussian Process Latent Variable Models (2009). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    A fundamental task in machine learning is modeling the relationship between different observation spaces. Dimensionality reduction is the task of reducing the number of dimensions in a parameterization of a data-set. In this thesis we are interested in the cross-road between these two tasks: shared dimensionality reduction. Shared dimensionality reduction aims to represent multiple observation spaces within the same model. Previously suggested models have been limited to scenarios where the observations have been generated from the same manifold. In this thesis we present a Gaussian Process Latent Variable Model (GP-LVM) [33] for shared dimensionality reduction without making assumptions about the relationship between the observations. Further, we suggest an extension to Canonical Correlation Analysis (CCA) called Non Consolidating Component Analysis (NCCA). The proposed algorithm extends classical CCA to represent the full variance of the data, as opposed to only the correlated variance. We compare the suggested GP-LVM model to existing models and show results on real-world problems exemplifying the advantages of our approach.

  • 17. Ek, Carl Henrik
    et al.
    Jaeckel, P.
    Campbell, Neill
    Melhuish, Chris
    Shared Gaussian Process Latent Variable Models for Handling Ambiguous Facial Expressions (2009). In: Intelligent Systems and Automation / [ed] Beji, L; Otmane, S; Abichou, A, 2009, p. 147-153. Conference paper (Refereed)
  • 18.
    Ek, Carl Henrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    The importance of structure (2011). Conference paper (Refereed)
  • 19. Ek, Carl Henrik
    et al.
    Rihan, J.
    Torr, P.
    Rogez, G.
    Lawrence, Neil D.
    Ambiguity modeling in latent spaces (2008). In: Machine Learning for Multimodal Interaction, Proceedings / [ed] Popescu-Belis, A; Stiefelhagen, R, Berlin: Springer-Verlag, 2008, p. 62-73. Conference paper (Refereed)
    Abstract [en]

    We are interested in the situation where we have two or more representations of an underlying phenomenon. In particular, we are interested in the scenario where the representations are complementary. This implies that a single individual representation is not sufficient to fully discriminate a specific instance of the underlying phenomenon; it also means that each representation is an ambiguous representation of the other complementary spaces. In this paper we present a latent variable model capable of consolidating multiple complementary representations. Our method extends canonical correlation analysis by introducing additional latent spaces that are specific to the different representations, thereby explaining the full variance of the observations. These additional spaces, explaining representation-specific variance, separately model the variance in one representation that is ambiguous to the others. We develop a spectral algorithm for fast computation of the embeddings and a probabilistic model (based on Gaussian processes) for validation and inference. The proposed model has several potential application areas; we demonstrate its use for multi-modal regression on a benchmark human pose estimation data set.
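    Classical CCA recovers only the shared, correlated directions; the model above additionally learns representation-specific latent spaces for the residual variance. The sketch below is an illustrative, off-the-shelf approximation of that idea with scikit-learn (shared part via CCA, private part via PCA on residuals); the paper's actual spectral algorithm and GP-based model are not reproduced.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA

# Two hypothetical complementary representations of 100 instances
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
Y = rng.standard_normal((100, 6))

# Shared latent space: the correlated directions found by classical CCA
cca = CCA(n_components=2)
X_shared, Y_shared = cca.fit_transform(X, Y)

# Private latent space (crude stand-in): PCA on what the shared part
# fails to reconstruct, so each representation's leftover variance is
# modeled separately, echoing the "additional latent spaces" above.
X_resid = X - cca.inverse_transform(X_shared)
X_private = PCA(n_components=2).fit_transform(X_resid)
```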

  • 20.
    Ek, Carl Henrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Song, Dan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Huebner, Kai
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Exploring affordances in robot grasping through latent structure representation (2010). In: The 11th European Conference on Computer Vision (ECCV 2010), 2010. Conference paper (Refereed)
  • 21.
    Ek, Carl Henrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Song, Dan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Huebner, Kai
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Task Modeling in Imitation Learning using Latent Variable Models (2010). In: 2010 10th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2010, 2010, p. 458-553. Conference paper (Refereed)
    Abstract [en]

    An important challenge in robotic research is learning and reasoning about different manipulation tasks from scene observations. In this paper we present a probabilistic model capable of modeling several different types of input sources within the same model. Our model is capable of inferring the task using only partial observations. Further, our framework allows the robot, given partial knowledge of the scene, to reason about which information streams to acquire in order to best disambiguate the state-space. We present results for task classification and also reason about the discriminative power of different features for different classes of tasks.

  • 22.
    Ek, Carl Henrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Song, Dan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Learning Conditional Structures in Graphical Models from a Large Set of Observation Streams through efficient Discretisation (2011). In: IEEE International Conference on Robotics and Automation, Workshop on Manipulation under Uncertainty, 2011. Conference paper (Refereed)
  • 23. Ek, Carl Henrik
    et al.
    Torr, Phil
    Lawrence, Neil D.
    Gaussian process latent variable models for human pose estimation (2007). In: Machine Learning for Multimodal Interaction / [ed] Popescu-Belis, A; Renals, S; Bourlard, H, 2007, p. 132-143. Conference paper (Refereed)
    Abstract [en]

    We describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM) [1] encapsulating both pose and silhouette features. Our method is generative, which allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes through temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper, the suggested model is easily extended to multiple observation spaces without constraints on type.
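    A minimal sketch of the core ingredient, a GP-LVM over stacked pose and silhouette features, is given below using the GPy library (an assumption; any GP-LVM implementation would do). The dynamical latent prior and the test-time inference scheme from the paper are only described in comments.

```python
import numpy as np
import GPy  # assumes the GPy library's GPLVM class

# Hypothetical stand-ins for the paper's data: silhouette features and
# joint-angle pose vectors, stacked so one latent space explains both.
silhouettes = np.random.randn(50, 20)
poses = np.random.randn(50, 30)
Y = np.hstack([silhouettes, poses])

m = GPy.models.GPLVM(Y, input_dim=2)  # 2-D latent space
m.optimize(max_iters=200)

# At test time one would optimise a latent point against the silhouette
# block only and read the pose off the GP mapping; a dynamical prior over
# the latent path (as in the paper) then disambiguates similar silhouettes.
print(m.X[:5])  # learned latent coordinates
```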

  • 24. Ek, Carl Henrik
    et al.
    Torr, Philip H. S.
    Lawrence, Neil D.
    GP-LVM for Data Consolidation (2008). Conference paper (Refereed)
  • 25. Feix, Thomas
    et al.
    Romero, Javier
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Schmiedmayer, Heinz-Bodo
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    A Metric for Comparing the Anthropomorphic Motion Capability of Artificial Hands (2013). In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 29, no. 1, p. 82-93. Article in journal (Refereed)
    Abstract [en]

    We propose a metric for comparing the anthropomorphic motion capability of robotic and prosthetic hands. The metric is based on the evaluation of how many different postures or configurations a hand can perform by studying the reachable set of fingertip poses. To define a benchmark for comparison, we first generate data with human subjects based on an extensive grasp taxonomy. We then develop a methodology for comparison using generative, nonlinear dimensionality reduction techniques. We assess the performance of different hands with respect to the human hand and with respect to each other. The method can be used to compare other types of kinematic structures.

  • 26.
    Hjelm, Martin
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detry, R.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Representations for cross-task, cross-object grasp transfer (2014). In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2014, p. 5699-5704. Conference paper (Refereed)
    Abstract [en]

    We address the problem of transferring grasp knowledge across objects and tasks. This means dealing with two important issues: 1) the induction of possible transfers, i.e., whether a given object affords a given task, and 2) the planning of a grasp that will allow the robot to fulfill the task. The induction of object affordances is approached by abstracting the sensory input of an object as a set of attributes that the agent can reason about through similarity and proximity. For grasp execution, we combine a part-based grasp planner with a model of task constraints. The task constraint model indicates areas of the object that the robot can grasp to execute the task. Within these areas, the part-based planner finds a hand placement that is compatible with the object shape. The key contribution is the ability to transfer task parameters across objects, while the part-based grasp planner allows for transferring grasp information across tasks. As a result, the robot is able to synthesize plans for previously unobserved task/object combinations. We illustrate our approach with experiments conducted on a real robot.

  • 27.
    Hjelm, Martin
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detry, Renaud
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Sparse Summarization of Robotic Grasping Data (2013). In: 2013 IEEE International Conference on Robotics and Automation (ICRA), New York: IEEE, 2013, p. 1082-1087. Conference paper (Refereed)
    Abstract [en]

    We propose a new approach for learning a summarized representation of high dimensional continuous data. Our technique consists of a Bayesian non-parametric model capable of encoding high-dimensional data from complex distributions using a sparse summarization. Specifically, the method marries techniques from probabilistic dimensionality reduction and clustering. We apply the model to learn efficient representations of grasping data for two robotic scenarios.
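    As a rough, off-the-shelf analogue of marrying dimensionality reduction with Bayesian non-parametric clustering, the sketch below runs PCA followed by a Dirichlet-process Gaussian mixture in scikit-learn, which switches off unused components and thereby yields a sparse set of summarizing prototypes. This illustrates the idea, not the paper's model; all data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical high-dimensional grasping data (e.g. joint configurations)
data = np.random.randn(500, 40)

# Reduce dimensionality, then cluster with a Dirichlet-process mixture:
# most of the 20 candidate components receive near-zero weight, leaving
# a sparse set of prototypes that summarize the data.
low_dim = PCA(n_components=5).fit_transform(data)
dpgmm = BayesianGaussianMixture(
    n_components=20,  # upper bound; unused components switch off
    weight_concentration_prior_type="dirichlet_process",
).fit(low_dim)

active = dpgmm.weights_ > 1e-2
print(f"{active.sum()} active components out of 20")
```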

  • 28.
    Hjelm, Martin
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detry, Renaud
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Learning Human Priors for Task-Constrained Grasping (2015). In: Computer Vision Systems (ICVS 2015), Springer Berlin/Heidelberg, 2015, p. 207-217. Conference paper (Refereed)
    Abstract [en]

    An autonomous agent using manmade objects must understand how the task conditions grasp placement. In this paper we formulate task-based robotic grasping as a feature learning problem. Using a human demonstrator to provide examples of grasps associated with a specific task, we learn a representation such that similarity in task is reflected by similarity in feature. The learned representation discards parts of the sensory input that are redundant for the task, allowing the agent to ground and reason about the relevant features for the task. Synthesized grasps for an observed task on previously unseen objects can then be filtered and ordered by matching to learned instances without the need for an analytically formulated metric. We show on a real robot how our approach is able to utilize the learned representation to synthesize and perform valid task-specific grasps on novel objects.

  • 29.
    Luo, Guoliang
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bergström, Niklas
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Representing actions with Kernels (2011). In: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, p. 2028-2035. Conference paper (Refereed)
    Abstract [en]

    A long-standing research goal is to create robots capable of interacting with humans in dynamic environments. To realise this, a robot needs to understand and interpret the underlying meaning and intentions of a human action through a model of its sensory data. The visual domain provides a rich description of the environment, and data is readily available in most systems through inexpensive cameras. However, such data is very high-dimensional and extremely redundant, making modeling challenging. Recently there has been significant interest in semantic modeling from visual stimuli. Even though results are encouraging, available methods are unable to perform robustly in real-world scenarios. In this work we present a system for action modeling from visual data by proposing a new and principled interpretation for representing semantic information. The representation is integrated with a real-time segmentation. The method is robust and flexible, making it applicable for modeling in a realistic interaction scenario which demands handling noisy observations and requires real-time performance. We provide extensive evaluation and show significant improvements compared to the state of the art.

  • 30.
    Maboudi Afkham, Heydar
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Qualitative vocabulary based descriptor (2013). In: ICPRAM 2013: Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, 2013, p. 188-193. Conference paper (Refereed)
    Abstract [en]

    Creating a single feature descriptor from a collection of feature responses is a frequently occurring task. As such, bag-of-words descriptors have been very successful and have been applied to data from a large range of domains. Central to this approach is the association of features to words. In this paper we present a novel approach to the feature-to-word association problem. The proposed method creates a more robust representation when data is noisy and requires fewer words than traditional methods while retaining similar performance. We experimentally evaluate the method on a challenging image classification data-set and show a significant improvement over the state of the art.
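    For contrast with the paper's contribution, here is the traditional hard-assignment bag-of-words pipeline it improves on, sketched with scikit-learn's k-means; the paper's more robust feature-to-word association is not reproduced, and all data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical local feature responses pooled from all training images
features = np.random.randn(2000, 64)
vocab = KMeans(n_clusters=100, n_init=10).fit(features)

def bow_descriptor(image_features):
    # Classic hard assignment of each local feature to its nearest word,
    # followed by a normalized histogram over the vocabulary.
    words = vocab.predict(image_features)
    hist = np.bincount(words, minlength=100).astype(float)
    return hist / hist.sum()

desc = bow_descriptor(np.random.randn(150, 64))
```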

  • 31.
    Madry, Marianna
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detry, Renaud
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Hang, Kaiyu
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Improving Generalization for 3D Object Categorization with Global Structure Histograms (2012). In: Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, IEEE conference proceedings, 2012, p. 1379-1386. Conference paper (Refereed)
    Abstract [en]

    We propose a new object descriptor for three-dimensional data named the Global Structure Histogram (GSH). The GSH encodes the structure of a local feature response on a coarse global scale, providing a beneficial trade-off between generalization and discrimination. Encoding the structural characteristics of an object allows us to retain low local variations while keeping the benefit of global representativeness. In an extensive experimental evaluation, we applied the framework to category-based object classification in realistic scenarios. We show results obtained by combining the GSH with several different local shape representations, and we demonstrate significant improvements over other state-of-the-art global descriptors.

  • 32.
    Madry, Marianna
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Maboudi Afkham, Heydar
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Extracting essential local object characteristics for 3D object categorization (2013). In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE conference proceedings, 2013, p. 2240-2247. Conference paper (Refereed)
    Abstract [en]

    Most object classes share a considerable amount of local appearance, and often only a small number of features are discriminative. The traditional approach to representing an object is based on summarizing the local characteristics by counting the number of feature occurrences. In this paper we propose the use of a recently developed technique for summarization that, rather than looking into the quantity of features, encodes their quality to learn a description of an object. Our approach is based on extracting and aggregating only the essential characteristics of an object class for a task. We show how the proposed method significantly improves on previous work in 3D object categorization. We discuss the benefits of the method in other scenarios such as robot grasping. We provide extensive quantitative and qualitative experiments comparing our approach to the state of the art to justify the described approach.

  • 33.
    Madry, Marianna
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Song, Dan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    "Robot, bring me something to drink from": object representation for transferring task specific grasps2013In: In IEEE International Conference on Robotics and Automation (ICRA 2012), Workshop on Semantic Perception, Mapping and Exploration (SPME),  St. Paul, MN, USA, May 13, 2012, 2013Conference paper (Refereed)
    Abstract [en]

    In this paper, we present an approach for task-specific object representation which facilitates transfer of grasp knowledge from a known object to a novel one. Our representation encompasses: (a) several visual object properties, (b) object functionality and (c) task constraints in order to provide a suitable goal-directed grasp. We compare various features describing complementary object attributes to evaluate the balance between the discrimination and generalization properties of the representation. The experimental setup is a scene containing multiple objects. Individual object hypotheses are first detected, categorized and then used as the input to a grasp reasoning system that encodes the task information. Our approach allows us not only to find objects in a real-world scene that afford a desired task, but also to generate and successfully transfer task-based grasps within and across object categories.

  • 34. Patel, M.
    et al.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kyriazis, N.
    Argyros, A.
    Miro, J. V.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Language for learning complex human-object interactions (2013). In: 2013 IEEE International Conference on Robotics and Automation (ICRA), IEEE Computer Society, 2013, p. 4997-5002. Conference paper (Refereed)
    Abstract [en]

    In this paper we use a Hierarchical Hidden Markov Model (HHMM) to represent and learn complex activities/tasks performed by humans/robots in everyday life. Action primitives are used as a grammar to represent complex human behaviour and to learn the interactions and behaviour of humans/robots with different objects. The main contribution is the use of a probabilistic model capable of representing behaviours at multiple levels of abstraction to support the proposed hypothesis. The hierarchical nature of the model allows decomposition of the complex task into simple action primitives. The framework is evaluated with data collected for tasks of everyday importance performed by a human user.

  • 35. Patel, Mitesh
    et al.
    Miro, Jaime Valls
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Dissanayake, Gamini
    Learning object, grasping and manipulation activities using hierarchical HMMs (2014). In: Autonomous Robots, ISSN 0929-5593, E-ISSN 1573-7527, Vol. 37, no. 3, p. 317-331. Article in journal (Refereed)
    Abstract [en]

    This article presents a probabilistic algorithm for representing and learning complex manipulation activities performed by humans in everyday life. The work builds on the multi-level Hierarchical Hidden Markov Model (HHMM) framework which allows decomposition of longer-term complex manipulation activities into layers of abstraction whereby the building blocks can be represented by simpler action modules called action primitives. This way, human task knowledge can be synthesised in a compact, effective representation suitable, for instance, to be subsequently transferred to a robot for imitation. The main contribution is the use of a robust framework capable of dealing with the uncertainty or incomplete data inherent to these activities, and the ability to represent behaviours at multiple levels of abstraction for enhanced task generalisation. Activity data from 3D video sequencing of human manipulation of different objects handled in everyday life is used for evaluation. A comparison with a mixed generative-discriminative hybrid model HHMM/SVM (support vector machine) is also presented to add rigour in highlighting the benefit of the proposed approach against comparable state of the art techniques.
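    Off-the-shelf Python support for hierarchical HMMs is scarce, so the sketch below uses the hmmlearn package (an assumption) to fit a single flat layer of such a model: each hidden state plays the role of one action primitive, and in the full HHMM a higher-level chain would switch between such primitive models. Data are hypothetical.

```python
import numpy as np
from hmmlearn import hmm  # assumes the hmmlearn package (flat HMMs only)

# Hypothetical sequences of 3-D features extracted from manipulation videos
seq1 = np.random.randn(40, 3)
seq2 = np.random.randn(55, 3)
X = np.vstack([seq1, seq2])
lengths = [len(seq1), len(seq2)]

# Each hidden state stands in for one action primitive at the lowest
# layer of the hierarchy described in the abstract.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(X, lengths)
primitives = model.predict(seq1)  # per-frame primitive labels
```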

  • 36.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Functional Object Descriptors for Human Activity Modeling (2013). In: 2013 IEEE International Conference on Robotics and Automation (ICRA), IEEE conference proceedings, 2013, p. 1282-1289. Conference paper (Refereed)
    Abstract [en]

    The ability to learn from human demonstration is essential for robots in human environments. The activity models that the robot builds from observation must take both the human motion and the objects involved into account. Object models designed for this purpose should reflect the role of the object in the activity - its function, or affordances. The main contribution of this paper is to represent objects directly in terms of their interaction with human hands, rather than in terms of appearance. This enables the direct representation of object affordances/function, while being robust to intra-class differences in appearance. Object hypotheses are first extracted from a video sequence as tracks of associated image segments. The object hypotheses are encoded as strings, where the vocabulary corresponds to different types of interaction with human hands. The similarity between two such object descriptors can be measured using a string kernel. Experiments show these functional descriptors to capture differences and similarities in object affordances/function that are not represented by appearance.
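    The specific string kernel is not reproduced in the abstract, so below is a minimal k-spectrum string kernel, a common choice for this kind of comparison, over a hypothetical per-frame interaction alphabet; both the alphabet and the kernel choice are illustrative assumptions.

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """A simple k-spectrum string kernel: counts shared length-k substrings.

    Each character would encode one type of hand-object interaction per
    time step (hypothetical alphabet); the paper uses a string kernel over
    such interaction strings, though not necessarily this one.
    """
    ngrams = lambda x: Counter(x[i:i + k] for i in range(len(x) - k + 1))
    a, b = ngrams(s), ngrams(t)
    return sum(a[g] * b[g] for g in a if g in b)

# 'g' = grasped, 'm' = moved by hand, 'i' = idle (illustrative encoding)
print(spectrum_kernel("iigggmmii", "igggmmmii"))
```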

  • 37.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Recognizing Object Affordances in Terms of Spatio-Temporal Object-Object Relationships (2014). In: Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on, IEEE conference proceedings, 2014, p. 52-58. Conference paper (Refereed)
    Abstract [en]

    In this paper we describe a probabilistic framework that models the interaction between multiple objects in a scene. We present a spatio-temporal feature encoding pairwise interactions between the objects in the scene. Through a kernel representation we embed object interactions in a vector space, which allows us to define a metric comparing interactions of different temporal extent. Using this metric we define a probabilistic model which allows us to represent and extract the affordances of individual objects based on the structure of their interaction. In this paper we focus on the presented pairwise relationships, but the model can naturally be extended to incorporate additional cues related to a single object or multiple objects. We compare our approach with traditional kernel approaches and show a significant improvement.

  • 38.
    Pokorny, Florian T.
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Persistent Homology for Learning Densities with Bounded Support (2012). In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012 / [ed] P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou and K.Q. Weinberger, Curran Associates, Inc., 2012, p. 1817-1825. Conference paper (Refereed)
    Abstract [en]

    We present a novel method for learning densities with bounded support which enables us to incorporate 'hard' topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and - by incorporating persistent homology techniques in our approach - we are able to encode algebraic-topological constraints which are not addressed in current state-of-the-art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car.
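
    A heavily simplified one-dimensional sketch of the idea: a bounded-support kernel (Epanechnikov) gives a density that is exactly zero away from the data, and in 1-D the zeroth-order persistence computation reduces to gaps between sorted samples, which lets us pick the smallest bandwidth with a connected support. The data and the connectivity criterion are illustrative assumptions, not the paper's general persistent-homology machinery.

    ```python
    import numpy as np

    # 0-dimensional persistence in 1-D: the union of intervals
    # [x_i - r, x_i + r] becomes connected once 2r exceeds the largest
    # gap between consecutive sorted samples.
    def connectivity_radius(samples):
        xs = np.sort(samples)
        return np.max(np.diff(xs)) / 2.0

    def kde_bounded(samples, grid, h):
        """Epanechnikov KDE: each kernel has support [-h, h], so the
        estimate vanishes exactly outside a neighbourhood of the data."""
        u = (grid[:, None] - samples[None, :]) / h
        k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
        return k.sum(axis=1) / (len(samples) * h)

    rng = np.random.default_rng(0)
    samples = rng.uniform(0, 1, 50)
    h = connectivity_radius(samples)      # enforce a connected support
    grid = np.linspace(-0.5, 1.5, 200)
    density = kde_bounded(samples, grid, h)
    ```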

  • 39.
    Pokorny, Florian T.
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Topological Constraints and Kernel-Based Density Estimation2012Conference paper (Refereed)
    Abstract [en]

    This extended abstract explores the question of how to estimate a probability distribution from a finite number of samples when information about the topology of the support region of an underlying density is known. This workshop contribution continues our recent work [1], which combined persistent homology and kernel-based density estimation for the first time and explored an approach capable of incorporating topological constraints into bandwidth selection. We report on recent experiments with high-dimensional motion capture data, which show that our method is applicable even in high dimensions, and develop our ideas for potential future applications of this framework.

  • 40. Romero, Javier
    et al.
    Feix, Thomas
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Extracting Postural Synergies for Robotic Grasping2013In: IEEE Transactions on robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 29, no 6, p. 1342-1352Article in journal (Refereed)
    Abstract [en]

    We address the problem of representing and encoding human hand motion data using nonlinear dimensionality reduction methods. We build on the notion of postural synergies, which are typically based on a linear embedding of the data. In addition to addressing the encoding of postural synergies using nonlinear methods, we relate our work to control strategies for combined reaching and grasping movements. We show the drawbacks of the (commonly made) causality assumption and propose methods that model the data as being generated from an inferred latent manifold to cope with the problem. Another important contribution is a thorough analysis of the parameters used in the employed dimensionality reduction techniques. Finally, we provide an experimental evaluation that shows how the proposed methods outperform the standard techniques, both in terms of recognition and generation of motion patterns.
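
    For reference, a minimal sketch of the linear baseline the paper builds on and argues beyond: postural synergies as principal components of joint-angle data. The joint count and the data matrix are placeholders; the paper's contribution is the nonlinear (latent-manifold) alternative, not this PCA step.

    ```python
    import numpy as np

    # Linear postural synergies as a baseline: PCA on joint-angle
    # vectors. Each row of the (placeholder) data matrix is one hand
    # posture, each column a joint angle; 20 joints is an assumption.
    rng = np.random.default_rng(1)
    postures = rng.normal(size=(200, 20))

    mean = postures.mean(axis=0)
    centered = postures - mean
    # The right singular vectors are the synergy directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    synergies = vt[:3]                    # keep three synergies

    # Encode postures as synergy activations, then reconstruct.
    coeffs = centered @ synergies.T
    residual = centered - coeffs @ synergies
    err = np.linalg.norm(residual) / np.linalg.norm(centered)
    print(f"relative reconstruction error with 3 synergies: {err:.3f}")
    ```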

  • 41. Romero, Javier
    et al.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Non-parametric hand pose estimation with object context2013In: Image and Vision Computing, ISSN 0262-8856, E-ISSN 1872-8138, Vol. 31, no 8, p. 555-564Article in journal (Refereed)
    Abstract [en]

    In the spirit of recent work on contextual recognition and estimation, we present a method for estimating the pose of human hands, employing information about the shape of the object in the hand. Despite the fact that most applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Occlusion of the hand by grasped objects does in fact often pose a severe challenge to the estimation of hand pose. In the presented method, object occlusion is not only compensated for, it contributes to the pose estimation in a contextual fashion, without an explicit model of object shape. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (.. entries) of hand poses with and without grasped objects. The system, which operates in real time, is robust to self-occlusions, object occlusions, and segmentation errors, and provides full hand pose reconstruction from monocular video. Temporal consistency in hand pose is taken into account without explicitly tracking the hand in the high-dimensional pose space. Experiments show the non-parametric method to outperform other state-of-the-art regression methods, while operating at a significantly lower computational cost than comparable model-based hand tracking methods.
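
    A minimal sketch of the non-parametric lookup with temporal consistency, assuming a database of image features paired with full hand poses. The feature type, database size, pose dimensionality, and the re-ranking scheme are all illustrative assumptions.

    ```python
    import numpy as np

    # Hypothetical database: N image-feature vectors (e.g. descriptors
    # of the hand crop) paired with full hand poses (joint angles).
    rng = np.random.default_rng(2)
    db_features = rng.normal(size=(10000, 64))
    db_poses = rng.normal(size=(10000, 31))

    def estimate_pose(query_feat, prev_pose, k=5, temporal_weight=0.5):
        """k-NN in feature space, re-ranked by closeness to the previous
        frame's pose: temporal consistency without explicit tracking."""
        d_feat = np.linalg.norm(db_features - query_feat, axis=1)
        knn = np.argsort(d_feat)[:k]
        d_pose = np.linalg.norm(db_poses[knn] - prev_pose, axis=1)
        score = d_feat[knn] + temporal_weight * d_pose
        return db_poses[knn[np.argmin(score)]]

    pose = estimate_pose(rng.normal(size=64), prev_pose=np.zeros(31))
    ```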

  • 42. Salzmann, Mathieu
    et al.
    Ek, Carl Henrik
    Urtasun, Raquel
    Darrell, Trevor
    Factorized Orthogonal Latent Spaces2010In: Proceedings of the 13th International Conferenceon Artificial Intelligence and Statistics (AISTATS) 2010, 2010Conference paper (Refereed)
  • 43. Salzmann, Mathieu
    et al.
    Ek, Carl Henrik
    Urtasun, Raquel
    Darrell, Trevor
    FOLS: Factorized Orthogonal Latent Spaces2010Conference paper (Refereed)
  • 44.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Azizpour, Hossein
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Persistent Evidence of Local Image Properties in Generic ConvNets2015In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, p. 249-262Conference paper (Refereed)
    Abstract [en]

    Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case: surprisingly, strong spatial information is implicitly present. This paper addresses this question, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the context of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression used commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing landmark correspondences to good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.
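
    A sketch of the ridge-regression probe described above, mapping a global image descriptor to a spatial target. The closed-form solution is standard; the descriptor dimensionality, landmark count, and data are synthetic placeholders rather than the paper's setup.

    ```python
    import numpy as np

    # Ridge regression from a global image descriptor (e.g. the first
    # fully-connected layer activations) to a spatial target such as
    # 2D landmark coordinates.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 512))    # descriptors for 500 images
    Y = rng.normal(size=(500, 10))     # 5 landmarks -> 10 coordinates

    lam = 10.0                         # ridge penalty
    # Closed form: W = (X^T X + lam * I)^{-1} X^T Y
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    pred = X @ W                       # predicted landmark coordinates
    ```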

  • 45.
    Song, Dan
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Huebner, Kai
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Embodiment-Specific Representation of Robot Grasping using Graphical Models and Latent-Space Discretization2011In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011, p. 980-986Conference paper (Refereed)
    Abstract [en]

    We study embodiment-specific robot grasping tasks, represented in a probabilistic framework. The framework consists of a Bayesian network (BN) integrated with a novel multivariate discretization model. The BN models the probabilistic relationships among tasks, objects, grasping actions and constraints. The discretization model provides a compact data representation that allows efficient learning of the conditional structures in the BN. To evaluate the framework, we use a database generated in a simulated environment, including examples of a human and a robot hand interacting with objects. The results show that the different kinematic structures of the hands affect both the BN structure and the conditional distributions over the modeled variables. Both models achieve accurate task classification and successfully encode the semantic task requirements in the continuous observation spaces. In an imitation experiment, we demonstrate that the representation framework can transfer task knowledge between different embodiments, and is therefore a suitable model for grasp planning and imitation in a goal-directed manner.
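
    To make the BN inference concrete, a toy fragment with discrete variables: a task prior and a conditional probability table, combined by Bayes' rule. The variable names, table sizes, and probabilities are invented for illustration and do not come from the paper.

    ```python
    import numpy as np

    # Toy discrete Bayesian-network fragment: a prior P(task) and a
    # conditional probability table P(grasp | task), used to classify
    # the task from an observed grasp. All numbers are illustrative.
    p_task = np.array([0.5, 0.3, 0.2])    # e.g. pour, hand-over, stir
    p_grasp_given_task = np.array([
        [0.7, 0.2, 0.1],                  # one row per task
        [0.2, 0.6, 0.2],
        [0.1, 0.3, 0.6],
    ])

    def posterior_task(grasp):
        """P(task | grasp) by Bayes' rule over the discrete tables."""
        joint = p_task * p_grasp_given_task[:, grasp]
        return joint / joint.sum()

    print(posterior_task(grasp=0))        # most mass on the first task
    ```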

  • 46.
    Song, Dan
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Hübner, Kai
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Task-Based Robot Grasp Planning Using Probabilistic Inference2015In: IEEE Transactions on robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 31, no 3, p. 546-561Article in journal (Refereed)
    Abstract [en]

    Grasping and manipulating everyday objects in a goal-directed manner is an important ability of a service robot. The robot needs to reason about task requirements and ground these in the sensorimotor information. Grasping and interaction with objects are challenging in real-world scenarios, where sensorimotor uncertainty is prevalent. This paper presents a probabilistic framework for the representation and modeling of robot-grasping tasks. The framework consists of Gaussian mixture models for generic data discretization, and discrete Bayesian networks for encoding the probabilistic relations among various task-relevant variables, including object and action features as well as task constraints. We evaluate the framework using a grasp database generated in a simulated environment including a human and two robot hand models. The generative modeling approach allows the prediction of grasping tasks given uncertain sensory data, as well as object and grasp selection in a task-oriented manner. Furthermore, the graphical model framework provides insights into dependencies between variables and features relevant for object grasping.
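
    A minimal sketch of the GMM-based discretization step named in the abstract above: continuous features become discrete by component assignment while the fitted mixture keeps a continuous view. The feature dimensionality, component count, and data are placeholder assumptions.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # GMM-based discretization: continuous grasp features are mapped
    # to the index of their most likely mixture component, yielding a
    # discrete variable for the Bayesian network while the fitted GMM
    # retains a continuous representation of the data.
    rng = np.random.default_rng(4)
    features = np.vstack([
        rng.normal(loc=-2.0, scale=0.5, size=(100, 3)),
        rng.normal(loc=+2.0, scale=0.5, size=(100, 3)),
    ])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
    discrete_states = gmm.predict(features)    # discrete BN variable
    log_density = gmm.score_samples(features)  # continuous view kept
    ```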

  • 47.
    Song, Dan
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic Jensfelt, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Multivariate Discretization for Bayesian Network Structure Learning in Robot Grasping2011In: IEEE International Conference on Robotics and Automation (ICRA), 2011, IEEE conference proceedings, 2011, p. 1944-1950Conference paper (Refereed)
    Abstract [en]

    A major challenge in modeling with Bayesian networks (BNs) is learning the structure from both discrete and multivariate continuous data. A common approach in such situations is to discretize the continuous data before structure learning. However, efficient methods to discretize high-dimensional variables are largely lacking. This paper presents a novel method specifically aimed at the discretization of high-dimensional, highly correlated data. The method consists of two integrated steps: non-linear dimensionality reduction using sparse Gaussian process latent variable models, and discretization by application of a mixture model. The model is fully probabilistic and capable of facilitating structure learning from discretized data while retaining the continuous representation. We evaluate the effectiveness of the method in the domain of robot grasping. Compared with traditional discretization schemes, our model excels both in task classification and in prediction of hand grasp configurations. Further, being a fully probabilistic model, it handles uncertainty in the data and can easily be integrated into other frameworks in a principled manner.
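
    A sketch of the two-step pipeline described above, with one loud caveat: PCA stands in here for the sparse GP-LVM of the paper, since a faithful GP-LVM implementation is beyond a short example. Data, dimensions, and component counts are placeholders.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture

    # Two-step discretization of high-dimensional, highly correlated
    # data: (1) dimensionality reduction -- PCA is a simple stand-in
    # for the paper's sparse GP-LVM -- and (2) a mixture model on the
    # latent coordinates, whose component index is the discrete state.
    rng = np.random.default_rng(5)
    hand_configs = rng.normal(size=(300, 20))  # placeholder grasp data

    latent = PCA(n_components=2).fit_transform(hand_configs)
    mixture = GaussianMixture(n_components=4, random_state=0).fit(latent)
    states = mixture.predict(latent)     # feeds BN structure learning
    ```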

  • 48.
    Stork, Johannes Andreas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bekiroglu, Yasemin
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Learning Predictive State Representation for in-hand manipulation2015In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2015, no June, p. 3207-3214Conference paper (Refereed)
    Abstract [en]

    We study the use of Predictive State Representations (PSRs) for modeling an in-hand manipulation task through interaction with the environment. We extend the original PSR model to the new domain of in-hand manipulation and address the problem of partial observability by introducing new kernel-based features that integrate both actions and observations. The model is learned directly from haptic data and is used to plan a series of actions that rotate the object in the hand to a specific configuration by pushing it against a table. Further, we analyze the model's belief states using additional visual data and enable planning of action sequences when the observations are ambiguous. We show that the learned representation is geometrically meaningful by embedding labeled action-observation traces. Suitability for planning is demonstrated by a post-grasp manipulation example that changes the object state to multiple specified target configurations.
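
    A deliberately loose, counting-based sketch of the predictive-state idea, not of the paper's kernel-based model: the state is summarized by the predicted success probabilities of a small set of tests (action-observation sequences). The traces, tests, and alphabet are toy stand-ins for the haptic data.

    ```python
    import numpy as np

    # Toy traces: alternating action 'a' and observation 'O' or 'X'.
    traces = ["aOaOaX", "aOaXaO", "aOaOaO", "aXaXaO", "aOaOaX"]

    # Core tests whose predicted outcomes form the predictive state.
    tests = ["aO", "aOaO"]

    def predict(history, test):
        """Empirical P(test succeeds | history) from the trace set."""
        hits = [t for t in traces if t.startswith(history)]
        if not hits:
            return 0.0
        ok = [t for t in hits if t[len(history):].startswith(test)]
        return len(ok) / len(hits)

    state = np.array([predict("aO", t) for t in tests])
    print(state)             # the predictive state after history "aO"
    ```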

  • 49.
    Stork, Johannes A.
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Learning Predictive State Representations for Planning2015In: 2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), IEEE Press, 2015, p. 3427-3434Conference paper (Refereed)
    Abstract [en]

    Predictive State Representations (PSRs) allow modeling of dynamical systems directly in observables, without relying on latent variable representations. A problem that arises from learning PSRs is that it is often hard to attribute semantic meaning to the learned representation, which makes generalization and planning in PSRs challenging. In this paper, we extend PSRs and introduce the notion of PSRs that include prior information (P-PSRs) to learn representations which are suitable for planning and interpretation. By learning a low-dimensional embedding of test features, we map belief points with similar semantics to the same region of a subspace. This facilitates better generalization for planning and semantic interpretation of the learned representation. Specifically, we show how to overcome training-sample bias and introduce feature selection such that the resulting representation emphasizes observables related to the planning task. We show that our P-PSRs result in qualitatively meaningful representations and present quantitative results that indicate improved suitability for planning.
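
    A bare-bones sketch of the embedding step only, under the assumption that belief points are already available as vectors of test predictions; PCA here is a generic stand-in for the paper's learned low-dimensional embedding, and the belief matrix is synthetic.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Predictive-state vectors: one row per belief point, one column
    # per test prediction. Projecting to a low-dimensional subspace
    # puts semantically similar beliefs close together.
    rng = np.random.default_rng(6)
    belief_points = rng.uniform(size=(50, 12))  # 50 beliefs x 12 tests

    embedding = PCA(n_components=2).fit_transform(belief_points)
    # Nearby rows of `embedding` are candidates for sharing a plan.
    ```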

  • 50.
    Thippur, Akshaya
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Inferring hand pose: A comparative study of visual shape features2013In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, IEEE , 2013, p. 6553698-Conference paper (Refereed)
    Abstract [en]

    Hand pose estimation from video is essential for a number of applications, such as automatic sign language recognition and robot learning from demonstration. However, hand pose estimation is made difficult by the high degree of articulation of the hand: a realistic hand model is described by at least 35 dimensions, which means that it can assume a wide variety of poses, and there is a very high degree of self-occlusion for most poses. Furthermore, different parts of the hand display very similar visual appearance; it is difficult to tell fingers apart in video. These properties of hands place hard requirements on the visual features used for hand pose estimation and tracking. In this paper, we evaluate three different state-of-the-art visual shape descriptors which are commonly used for hand and human body pose estimation. We study the nature of the mappings from the hand pose space to the feature spaces spanned by the visual descriptors, in terms of the smoothness, discriminability, and generativity of the pose-feature mappings, as well as their robustness to noise with respect to these properties. Based on this, we give recommendations on the types of applications for which each visual shape descriptor is suitable.
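
    One simple way to quantify the smoothness of a pose-to-feature mapping, in the spirit of the comparison above: correlate pairwise distances in pose space with pairwise distances in feature space. The toy descriptor, the 35-D pose samples, and the correlation proxy are assumptions, not the paper's evaluation protocol.

    ```python
    import numpy as np

    # Synthetic poses (~35-D hand model) and a toy nonlinear descriptor.
    rng = np.random.default_rng(7)
    poses = rng.normal(size=(100, 35))
    features = np.tanh(poses @ rng.normal(size=(35, 64)))

    def pairwise(x):
        """Upper-triangle vector of pairwise Euclidean distances."""
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        return d[np.triu_indices(len(x), k=1)]

    rho = np.corrcoef(pairwise(poses), pairwise(features))[0, 1]
    print(f"distance-correlation proxy for smoothness: {rho:.2f}")
    ```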
