  • 1. Ahlberg, Simon
    et al.
    Hörling, Pontus
    Johansson, Katarina
    Jöred, Karsten
    Kjellström, Hedvig
    Mårtenson, Christian
    Neider, Göran
    Schubert, Johan
    Svenson, Pontus
    Svensson, Per
    Walter, Johan
    An information fusion demonstrator for tactical intelligence processing in network-based defense. 2007. In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 8, no. 1, pp. 84-107. Article in journal (Refereed)
    Abstract [en]

    The Swedish Defence Research Agency (FOI) has developed a concept demonstrator called the Information Fusion Demonstrator 2003 (IFD03) for demonstrating information fusion methodology suitable for a future Network Based Defense (NBD) C4ISR system. The focus of the demonstrator is on real-time tactical intelligence processing at the division level in a ground warfare scenario. The demonstrator integrates novel force aggregation, particle filtering, and sensor allocation methods to create, dynamically update, and maintain components of a tactical situation picture. This is achieved by fusing physically modelled and numerically simulated sensor reports from several different sensor types with realistic a priori information sampled from both a high-resolution terrain model and an enemy organizational and behavioral model. This represents a key step toward the goal of creating in real time a dynamic, high fidelity representation of a moving battalion-sized organization, based on sensor data as well as a priori intelligence and terrain information, employing fusion, tracking, aggregation, and resource allocation methods all built on well-founded theories of uncertainty. The motives behind this project, the fusion methods developed for the system, as well as its scenario model and simulator architecture are described. The main services of the demonstrator are discussed and early experience from using the system is shared.

  • 2. Ahlberg, Simon
    et al.
    Hörling, Pontus
    Jöred, Karsten
    Lindström, Björn
    Mårtenson, Christian
    Neider, Göran
    Schubert, Johan
    Sidenbladh, Hedvig
    Svenson, Pontus
    Svensson, Per
    Unden, Katarina
    Walter, Johan
    The IFD03 information fusion demonstrator. 2004. In: Proceedings of the Seventh International Conference on Information Fusion, FUSION 2004, 2004, pp. 936-943. Conference paper (Refereed)
    Abstract [en]

    The paper discusses a recently developed demonstrator system where new ideas in tactical information fusion may be tested and demonstrated. The main services of the demonstrator are discussed, and essential experience from the use and development of the system is shared.

  • 3. Bray, Matthieu
    et al.
    Sidenbladh, Hedvig
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Eklundh, Jan-Olof
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Recognition of gestures in the context of speech. 2002. In: 16th International Conference on Pattern Recognition, 2002. Proceedings., 2002. Conference paper (Refereed)
    Abstract [en]

    The scope of this paper is the interpretation of a user's intention via a video camera and a speech recognizer. In comparison to previous work, which only takes gesture recognition into account, we demonstrate that including speech increases system comprehension. For the gesture recognition, the user must wear a colored glove; we then extract the velocity of the center of gravity of the hand. A Hidden Markov Model (HMM) is learned for each gesture that we want to recognize. To decide whether a gesture has been performed during a dynamic action, we implement a threshold model below which the gesture is not detected. The offline tests for gesture recognition have a success rate exceeding 85% for each gesture. The combination of speech and gestures is realized using Bayesian theory.
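
As a rough illustration of the recognition scheme sketched in this abstract (one HMM per gesture plus a rejection threshold), the following Python fragment uses the third-party hmmlearn library. It is a hypothetical sketch, not the authors' implementation; in particular, the generic background HMM used for rejection only stands in for the paper's threshold model, and all function names are invented.

```python
# Minimal sketch of per-gesture HMMs with a rejection threshold (hypothetical,
# not the authors' code). Features are 2D hand center-of-gravity velocities,
# one sequence per training example.
import numpy as np
from hmmlearn import hmm

def train_gesture_models(train_data, n_states=5):
    """train_data: dict gesture_name -> list of (T_i, 2) velocity sequences."""
    models = {}
    for name, seqs in train_data.items():
        X = np.vstack(seqs)                      # concatenate all sequences
        lengths = [len(s) for s in seqs]         # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[name] = m
    return models

def classify(models, seq, threshold_model):
    """Return the best gesture label, or None if the rejection model wins."""
    reject = threshold_model.score(seq)          # log-likelihood of "no gesture"
    best_name, best_score = None, reject
    for name, m in models.items():
        score = m.score(seq)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```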

  • 4.
    Bälter, Olle
    et al.
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Öster, Anne-Marie
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Wizard-of-Oz Test of ARTUR - a Computer-Based Speech Training System with Articulation Correction. 2005. In: Proceedings of ASSETS 2005, 2005, pp. 36-43. Conference paper (Refereed)
    Abstract [en]

    This study has been performed in order to test the human-machine interface of a computer-based speech training aid named ARTUR with the main feature that it can give suggestions on how to improve articulation. Two user groups were involved: three children aged 9-14 with extensive experience of speech training, and three children aged 6. All children had general language disorders. The study indicates that the present interface is usable without prior training or instructions, even for the younger children, although it needs some improvement to fit illiterate children. The granularity of the mesh that classifies mispronunciations was satisfactory, but can be developed further.

  • 5.
    Caccamo, Sergio
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Güler, Püren
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Active perception and modeling of deformable surfaces using Gaussian processes and position-based dynamics. 2016. In: IEEE-RAS International Conference on Humanoid Robots, IEEE, 2016, pp. 530-537. Conference paper (Refereed)
    Abstract [en]

    Exploring and modeling heterogeneous elastic surfaces requires multiple interactions with the environment and a complex selection of physical material parameters. The most common approaches model deformable properties from sets of offline observations using computationally expensive force-based simulators. In this work we present an online probabilistic framework for autonomous estimation of a deformability distribution map of heterogeneous elastic surfaces from few physical interactions. The method takes advantage of Gaussian Processes for constructing a model of the environment geometry surrounding a robot. A fast Position-based Dynamics simulator uses focused environmental observations in order to model the elastic behavior of portions of the environment. Gaussian Process Regression maps the local deformability on the whole environment in order to generate a deformability distribution map. We show experimental results using a PrimeSense camera, a Kinova Jaco2 robotic arm and an Optoforce sensor on different deformable surfaces.
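
The Gaussian Process Regression step described above, mapping deformability estimated at a few probed points onto a dense map over the surface, could be sketched as follows with scikit-learn. All inputs and values here are invented for illustration; this is not the paper's pipeline.

```python
# Sketch: map sparse deformability measurements onto a dense surface grid with
# Gaussian Process Regression (illustrative only; not the system in the paper).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Probed surface points (x, y) and the deformability estimated at each probe
# (assumed values for illustration).
probe_xy = np.array([[0.10, 0.20], [0.40, 0.25], [0.70, 0.60], [0.30, 0.80]])
probe_deform = np.array([0.05, 0.30, 0.85, 0.40])

kernel = 1.0 * RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(probe_xy, probe_deform)

# Predict a deformability distribution map (mean and uncertainty) over a grid.
xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([xs.ravel(), ys.ravel()])
mean, std = gp.predict(grid, return_std=True)
deformability_map = mean.reshape(xs.shape)   # dense map over the surface
```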

  • 6. Do, Martin
    et al.
    Romero, Javier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Azad, Pedram
    Asfour, Tamim
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Dillmann, Rüdiger
    Grasp recognition and mapping on humanoid robots. 2009. In: 9th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS09, 2009, pp. 465-471. Conference paper (Refereed)
  • 7.
    Engwall, Olov
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Bälter, Olle
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Öster, Anne-Marie
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Designing the user interface of the computer-based speech training system ARTUR based on early user tests. 2006. In: Behavior and Information Technology, ISSN 0144-929X, E-ISSN 1362-3001, Vol. 25, no. 4, pp. 353-365. Article in journal (Refereed)
    Abstract [en]

    This study has been performed in order to evaluate a prototype for the human-computer interface of a computer-based speech training aid named ARTUR. The main feature of the aid is that it can give suggestions on how to improve articulations. Two user groups were involved: three children aged 9-14 with extensive experience of speech training with therapists and computers, and three children aged 6, with little or no prior experience of computer-based speech training. All children had general language disorders. The study indicates that the present interface is usable without prior training or instructions, even for the younger children, but that more motivational factors should be introduced. The granularity of the mesh that classifies mispronunciations was satisfactory, but the flexibility and level of detail of the feedback should be developed further.

  • 8.
    Engwall, Olov
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Bälter, Olle
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Öster, Anne-Marie
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Feedback management in the pronunciation training system ARTUR. 2006. In: Proceedings of CHI 2006, 2006, pp. 231-234. Conference paper (Refereed)
    Abstract [en]

    This extended abstract discusses the development of a computer-assisted pronunciation training system that gives articulatory feedback, and in particular the management of feedback given to the user.

  • 9.
    Eriksson, André
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    A formal approach to anomaly detection. 2016. In: ICPRAM 2016 - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, SciTePress, 2016, pp. 317-326. Conference paper (Refereed)
    Abstract [en]

    While many advances towards effective anomaly detection techniques targeting specific applications have been made in recent years, little work has been done to develop application-agnostic approaches to the subject. In this article, we present such an approach, in which anomaly detection methods are treated as formal, structured objects. We consider a general class of methods, with an emphasis on methods that utilize structural properties of the data they operate on. For this class of methods, we develop a decomposition into sub-methods: simple, restricted objects, which may be reasoned about independently and combined to form methods. As we show, this formalism enables the construction of software that facilitates formulating, implementing, evaluating, as well as algorithmically finding and calibrating anomaly detection methods.

  • 10.
    Eriksson, Elina
    et al.
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Bälter, Olle
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Öster, Anne-Marie
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Design Recommendations for a Computer-Based Speech Training System Based on End User Interviews. 2005. In: Proceedings of the Tenth International Conference on Speech and Computers, 2005, pp. 483-486. Conference paper (Refereed)
    Abstract [en]

    This study has been performed in order to improve the usability of computer-based speech training (CBST) aids. The aim was to engage the users of speech training systems in the first step of creating a new CBST aid. Speech therapists and children with hearing or speech impairment were interviewed, and the result of the interviews is presented in the form of design recommendations.

  • 11.
    Geronimo, David
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Unsupervised surveillance video retrieval based on human action and appearance. 2014. In: Proceedings - International Conference on Pattern Recognition, Los Alamitos: IEEE Computer Society, 2014, pp. 4630-4635. Conference paper (Refereed)
    Abstract [en]

    Forensic video analysis is the offline analysis of video aimed at understanding what happened in a scene in the past. Two of its key tasks are the recognition of specific actions, e.g., walking or fighting, and the search for specific persons, also referred to as re-identification. Although these tasks have traditionally been performed manually in forensic investigations, the current growing number of cameras and recorded video leads to the need for automated analysis. In this paper we propose an unsupervised retrieval system for surveillance videos based on human action and appearance. Given a query window, the system retrieves people performing the same action as the one in the query, the same person performing any action, or the same person performing the same action. We use an adaptive search algorithm that focuses the analysis on relevant frames based on the inter-frame difference of foreground masks. Then, for each analyzed frame, a pedestrian detector is used to extract windows containing each pedestrian in the scene. For each detection, we use optical flow features to represent its action and color features to represent its appearance. These extracted features are used to compute the probability that the detection matches the query according to the specified criterion. The algorithm is fully unsupervised, i.e., no training or constraints on the appearance, actions or number of actions that will appear in the test video are made. The proposed algorithm is tested on a surveillance video with different people performing different actions, providing satisfactory retrieval performance.
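
The per-detection features in the retrieval pipeline above (optical flow for action, color for appearance) could be computed along the following lines with OpenCV. This is an illustrative stand-in with an assumed detection-window format, not the authors' code.

```python
# Sketch: optical-flow and color features for a detected pedestrian window
# (illustrative stand-in for the descriptors used in the paper).
import cv2
import numpy as np

def window_features(prev_gray, gray, frame_bgr, box):
    """box = (x, y, w, h) of a pedestrian detection in the current frame."""
    x, y, w, h = box
    # Dense optical flow inside the window as a crude action descriptor.
    flow = cv2.calcOpticalFlowFarneback(prev_gray[y:y+h, x:x+w],
                                        gray[y:y+h, x:x+w],
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    action_feat, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
    # HSV color histogram as a simple appearance descriptor.
    hsv = cv2.cvtColor(frame_bgr[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    appear_feat = cv2.calcHist([hsv], [0, 1], None, [16, 8],
                               [0, 180, 0, 256]).flatten()
    # Normalize so that windows of different sizes are comparable.
    return (action_feat / (action_feat.sum() + 1e-9),
            appear_feat / (appear_feat.sum() + 1e-9))
```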

  • 12. Grahn, Josef
    et al.
    Kjellström, Hedvig
    FOI, Stockholm.
    Using SVM for efficient detection of human motion. 2005. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, VS-PETS, 2005, pp. 231-238. Conference paper (Refereed)
  • 13.
    Güler, Rezan
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Biotechnology (BIO), Protein Technology.
    Pauwels, Karl
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Pieropan, Alessandro
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Estimating the Deformability of Elastic Materials using Optical Flow and Position-based Dynamics. 2015. In: Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on, IEEE conference proceedings, 2015, pp. 965-971. Conference paper (Refereed)
    Abstract [en]

    Knowledge of the physical properties of objects is essential in a wide range of robotic manipulation scenarios. A robot may not always be aware of such properties prior to interaction. If an object is incorrectly assumed to be rigid, it may exhibit unpredictable behavior when grasped. In this paper, we use vision-based observation of the behavior of an object that a robot is interacting with as the basis for estimating its elastic deformability. This is estimated in a local region around the interaction point using a physics simulator. We use optical flow to estimate the parameters of a position-based dynamics simulation using meshless shape matching (MSM). MSM has been widely used in computer graphics due to its computational efficiency, which is also important for closed-loop control in robotics. In a controlled experiment we demonstrate that our method can qualitatively estimate the physical properties of objects with different degrees of deformability.

  • 14.
    Hjelm, Martin
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detry, Renaud
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Sparse Summarization of Robotic Grasping Data. 2013. In: 2013 IEEE International Conference on Robotics and Automation (ICRA), New York: IEEE, 2013, pp. 1082-1087. Conference paper (Refereed)
    Abstract [en]

    We propose a new approach for learning a summarized representation of high dimensional continuous data. Our technique consists of a Bayesian non-parametric model capable of encoding high-dimensional data from complex distributions using a sparse summarization. Specifically, the method marries techniques from probabilistic dimensionality reduction and clustering. We apply the model to learn efficient representations of grasping data for two robotic scenarios.

  • 15.
    Karipidou, Kelly
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Ahnlund, Josefin
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Friberg, Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Computer Analysis of Sentiment Interpretation in Musical Conducting. 2017. In: Proceedings - 12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017, IEEE, 2017, pp. 400-405, article id 7961769. Conference paper (Refereed)
    Abstract [en]

    This paper presents a unique dataset consisting of 20 recordings of the same musical piece, conducted with 4 different musical intentions in mind. The upper body and baton motion of a professional conductor was recorded, as well as the sound of each instrument in a professional string quartet following the conductor. The dataset is made available for benchmarking of motion recognition algorithms. An HMM-based emotion intent classification method is trained with subsets of the data, and classification of other subsets of the data shows firstly that the motion of the baton communicates energetic intention to a high degree, secondly that the conductor's torso, head and other arm convey calm intention to a high degree, and thirdly that positive vs negative sentiments are communicated to a high degree through other channels than the body and baton motion, most probably through facial expression and muscle tension conveyed through articulated hand and finger motion. The long-term goal of this work is to develop a computer model of the entire conductor-orchestra communication process; the studies presented here indicate that computer modeling of the conductor-orchestra communication is feasible.

  • 16.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Contextual Action Recognition. 2011. In: Visual Analysis of Humans: Looking at People / [ed] T. B. Moeslund, A. Hilton, V. Krüger and L. Sigal, Springer, 2011, pp. 355-376. Chapter in book (Other academic)
  • 17.
    Kjellström, Hedvig
    FOI, Stockholm.
    Datorer som ser människor [Computers that see people]. 2007. In: Sinnen, signaler och tolkningar av verkligheten [Senses, signals and interpretations of reality] / [ed] Lindberg, Bo, Göteborg: Kungliga vetenskaps- och vitterhetssamhället, 2007. Chapter in book (Other (popular science, discussion, etc.))
  • 18.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Audiovisual-to-articulatory inversion. 2009. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 51, no. 3, pp. 195-209. Article in journal (Refereed)
    Abstract [en]

    It has been shown that acoustic-to-articulatory inversion, i.e. estimation of the articulatory configuration from the corresponding acoustic signal, can be greatly improved by adding visual features extracted from the speaker's face. In order to make the inversion method usable in a realistic application, it should be possible to obtain these features from a monocular frontal face video, where the speaker is not required to wear any special markers. In this study, we investigate the importance of visual cues for inversion. Experiments with motion capture data of the face show that important articulatory information can be extracted using only a few face measures that mimic the information that could be gained from a video-based method. We also show that the depth cue for these measures is not critical, which means that the relevant information can be extracted from a frontal video. A real video-based face feature extraction method is further presented, leading to similar improvements in inversion quality. Rather than tracking points on the face, it represents the appearance of the mouth area using independent component images. These findings are important for applications that need a simple audiovisual-to-articulatory inversion technique, e.g. articulatory phonetics training for second language learners or hearing-impaired persons.
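
The independent-component representation of the mouth region mentioned above could be illustrated as follows, using scikit-learn's FastICA as an assumed stand-in for the paper's feature extraction; the data file name is hypothetical.

```python
# Sketch: low-dimensional appearance representation of the mouth region using
# independent component analysis (FastICA as an assumed stand-in).
import numpy as np
from sklearn.decomposition import FastICA

# mouth_frames: (N, H, W) stabilized, cropped grayscale mouth images
# (hypothetical data file, for illustration only).
mouth_frames = np.load("mouth_frames.npy")
X = mouth_frames.reshape(len(mouth_frames), -1)  # one flattened image per row

ica = FastICA(n_components=20, max_iter=500)
visual_features = ica.fit_transform(X)           # (N, 20) per-frame features
# These per-frame visual features would then be combined with acoustic features
# and regressed onto articulator positions.
```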

  • 19.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Abdou, Sherif
    Bälter, Olle
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Audio-visual phoneme classification for pronunciation training applications. 2007. In: INTERSPEECH 2007: 8th Annual Conference of the International Speech Communication Association, Baixas: ISCA - International Speech Communication Association, 2007, pp. 57-60. Conference paper (Refereed)
    Abstract [en]

    We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal and/or either a principal or an independent component (PCA or ICA) representation of the mouth region in video images. We investigate which representation (PCA or ICA) may be most suitable, and the number of components required in the basis, in order to automatically detect pronunciation errors in Swedish from audio-visual input. Experiments performed on one speaker show that the visual information helps avoid classification errors that would lead to gravely erroneous feedback to the user; that it is better to perform phoneme classification on audio and video separately and then fuse the results, rather than combining them before classification; and that PCA outperforms ICA for fewer than 50 components.

  • 20.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Bälter, Olle
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI.
    Reconstructing Tongue Movements from Audio and Video. 2006. In: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, Vol. 1-5, Baixas: ISCA - International Speech Communication Association, 2006, pp. 2238-2241. Conference paper (Refereed)
    Abstract [en]

    This paper presents an approach to articulatory inversion using audio and video of the user's face, requiring no special markers. The video is stabilized with respect to the face, and the mouth region cropped out. The mouth image is projected into a learned independent component subspace to obtain a low-dimensional representation of the mouth appearance. The inversion problem is treated as one of regression; a non-linear regressor using relevance vector machines is trained with a dataset of simultaneous images of a subject's face, acoustic features and positions of magnetic coils glued to the subject's tongue. The results show the benefit of using both cues for inversion. We envisage the inversion method to be part of a pronunciation training system with articulatory feedback.

  • 21.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Black, Michael J.
    Tracking People Interacting with Objects. 2010. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 747-754. Conference paper (Refereed)
    Abstract [en]

    While the problem of tracking 3D human motion has been widely studied, most approaches have assumed that the person is isolated and not interacting with the environment. Environmental constraints, however, can greatly constrain and simplify the tracking problem. The most studied constraints involve gravity and contact with the ground plane. We go further to consider interaction with objects in the environment. In many cases, tracking rigid environmental objects is simpler than tracking high-dimensional human motion. When a human is in contact with objects in the world, their poses constrain the pose of the body, essentially removing degrees of freedom. Thus what would appear to be a harder problem, combining object and human tracking, is actually simpler. We use a standard formulation of the body tracking problem but add an explicit model of contact with objects. We find that constraints from the world make it possible to track complex articulated human motion in 3D from a monocular camera.

  • 22.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Romero, Javier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Visual object-action recognition: Inferring object affordances from human demonstration. 2011. In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 115, no. 1, pp. 81-90. Article in journal (Refereed)
    Abstract [en]

    This paper investigates object categorization according to function, i.e., learning the affordances of objects from human demonstration. Object affordances (functionality) are inferred from observations of humans using the objects in different types of actions. The intended application is learning from demonstration, in which a robot learns to employ objects in household tasks, from observing a human performing the same tasks with the objects. We present a method for categorizing manipulated objects and human manipulation actions in context of each other. The method is able to simultaneously segment and classify human hand actions, and detect and classify the objects involved in the action. This can serve as an initial step in a learning from demonstration method. Experiments show that the contextual information improves the classification of both objects and actions.

  • 23.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Romero, Javier
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Visual Recognition of Grasps for Human-to-Robot Mapping. 2008. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vols 1-3, Conference Proceedings / [ed] Chatila, R; Kelly, A; Merlet, JP, 2008, pp. 3192-3199. Conference paper (Refereed)
    Abstract [en]

    This paper presents a vision-based method for grasp classification. It is developed as part of a Programming by Demonstration (PbD) system for which recognition of objects and pick-and-place actions represent basic building blocks for task learning. In contrast to earlier approaches, no articulated 3D reconstruction of the hand over time takes place. The input consists of a single image of the human hand. A 2D representation of the hand shape, based on gradient orientation histograms, is extracted from the image. The hand shape is then classified as one of six grasps by finding similar hand shapes in a large database of grasp images. The database search is performed using Locality Sensitive Hashing (LSH), an approximate k-nearest neighbor approach. The nearest neighbors also give an estimated hand orientation with respect to the camera. The six human grasps are mapped to three Barrett hand grasps. Depending on the type of robot grasp, a precomputed grasp strategy is selected. The strategy is further parameterized by the orientation of the hand relative to the object. To evaluate the potential for the method to be part of a robust vision system, experiments were performed, comparing classification results to a baseline of human classification performance. The experiments showed the LSH recognition performance to be comparable to human performance.
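
A minimal sketch of the recognition idea in this abstract: a gradient-orientation-histogram descriptor of the hand image, matched against a labeled database with nearest-neighbor search. scikit-image's HOG descriptor and scikit-learn's exact k-NN are used here as stand-ins for the paper's descriptor and LSH index; all function names are hypothetical.

```python
# Sketch: grasp classification by nearest-neighbor search over gradient
# orientation histograms (exact k-NN stands in for the paper's LSH index).
import numpy as np
from skimage.feature import hog
from sklearn.neighbors import KNeighborsClassifier

def hand_descriptor(gray_image):
    """Gradient orientation histogram of a cropped hand image.
    All crops are assumed resized to the same fixed size."""
    return hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def build_classifier(database_images, grasp_labels, k=5):
    """database_images: list of same-size grayscale hand crops; grasp_labels: list of ints."""
    X = np.array([hand_descriptor(img) for img in database_images])
    return KNeighborsClassifier(n_neighbors=k).fit(X, grasp_labels)

def classify_grasp(clf, query_image):
    return clf.predict([hand_descriptor(query_image)])[0]
```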

  • 24.
    Kjellström, Hedvig
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Romero, Javier
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Martinez, David
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Simultaneous Visual Recognition of Manipulation Actions and Manipulated Objects. 2008. In: Computer Vision - ECCV 2008, Pt II, Proceedings / [ed] Forsyth, D; Torr, P; Zisserman, A, 2008, Vol. 5303, pp. 336-349. Conference paper (Refereed)
    Abstract [en]

    The visual analysis of human manipulation actions is of interest for e.g. human-robot interaction applications where a robot learns how to perform a task by watching a human. In this paper, a method for classifying manipulation actions in the context of the objects manipulated, and classifying objects in the context of the actions used to manipulate them, is presented. Hand and object features are extracted from the video sequence using a segmentation based approach. A shape based representation is used for both the hand and the object. Experiments show this representation to be suitable for representing generic shape classes. The action-object correlation over time is then modeled using conditional random fields. Experimental comparisons show a great improvement in classification rate when the action-object correlation is taken into account, compared to separate classification of manipulation actions and manipulated objects.

  • 25. Ormoneit, D.
    et al.
    Black, M. J.
    Hastie, T.
    Kjellström, Hedvig
    FOI, Stockholm.
    Representing cyclic human motion using functional analysis. 2005. In: Image and Vision Computing, ISSN 0262-8856, E-ISSN 1872-8138, Vol. 23, no. 14, pp. 1264-1276. Article in journal (Refereed)
    Abstract [en]

    We present a robust automatic method for modeling cyclic 3D human motion such as walking using motion-capture data. The pose of the body is represented by a time-series of joint angles which are automatically segmented into a sequence of motion cycles. The mean and the principal components of these cycles are computed using a new algorithm that enforces smooth transitions between the cycles by operating in the Fourier domain. Key to this method is its ability to automatically deal with noise and missing data. A learned walking model is then exploited for Bayesian tracking of 3D human motion.
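
A bare-bones version of the cycle statistics described above, resampling segmented joint-angle cycles to a common length and extracting their mean and principal components, might look like this in NumPy. Cycle segmentation, the paper's Fourier-domain smoothing, and its handling of noise and missing data are omitted; the function name is hypothetical.

```python
# Sketch: mean and principal components of resampled joint-angle cycles
# (omits the paper's Fourier-domain smoothing and missing-data handling).
import numpy as np

def cycle_pca(cycles, n_samples=100, n_components=5):
    """cycles: list of (T_i, D) joint-angle arrays, one per detected motion cycle."""
    resampled = []
    for c in cycles:
        t_old = np.linspace(0.0, 1.0, len(c))
        t_new = np.linspace(0.0, 1.0, n_samples)
        # Resample every joint-angle channel to a common cycle length.
        resampled.append(np.column_stack(
            [np.interp(t_new, t_old, c[:, d]) for d in range(c.shape[1])]))
    X = np.stack([r.ravel() for r in resampled])        # one row per cycle
    mean = X.mean(axis=0)
    # Principal components of the centered cycles via SVD.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean.reshape(n_samples, -1), vt[:n_components]
```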

  • 26. Ormoneit, Dirk
    et al.
    Sidenbladh, Hedvig
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Black, Michael J.
    Stochastic modeling and tracking of human motion. 2000. In: Learning 2000, 2000. Conference paper (Refereed)
  • 27. Ormoneit, Dirk
    et al.
    Sidenbladh, Hedvig
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Black, Michael J.
    Hastie, Trevor
    Learning and tracking cyclic human motion. 2001. In: Advances in Neural Information Processing Systems 13, 2001. Conference paper (Refereed)
  • 28. Ormoneit, Dirk
    et al.
    Sidenbladh, Hedvig
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Black, Michael J.
    Hastie, Trevor
    Fleet, David J.
    Learning and tracking human motion using functional analysis. 2000. In: IEEE Workshop on Human Modeling, Analysis and Synthesis, 2000. Conference paper (Refereed)
  • 29.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bergström, Niklas
    Ishikawa, Masatoshi
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Robust and adaptive keypoint-based object tracking. 2016. In: Advanced Robotics, ISSN 0169-1864, E-ISSN 1568-5535, Vol. 30, no. 4, pp. 258-269. Article in journal (Refereed)
    Abstract [en]

    Object tracking is a fundamental ability for a robot; manipulation as well as activity recognition relies on the robot being able to follow objects in the scene. This paper presents a tracker that adapts to changes in object appearance and is able to re-discover an object that was lost. At its core is a keypoint-based method that exploits the rigidity assumption: pairs of keypoints maintain the same relations over similarity transforms. Using a structured approach to learning, it is able to incorporate new appearances in its model for increased robustness. We show through quantitative and qualitative experiments the benefits of the proposed approach compared to the state of the art, even for objects that do not strictly follow the rigidity assumption.

  • 30.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bergström, N.
    Ishikawa, M.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Robust tracking of unknown objects through adaptive size estimation and appearance learning. 2016. In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2016, pp. 559-566. Conference paper (Refereed)
    Abstract [en]

    This work employs an adaptive learning mechanism to perform tracking of an unknown object through RGBD cameras. We extend our previous framework to robustly track a wider range of arbitrarily shaped objects by adapting the model to the measured object size. The size is estimated as the object undergoes motion, which is done by fitting an inscribed cuboid to the measurements. The region spanned by this cuboid is used during tracking, to determine whether or not new measurements should be added to the object model. In our experiments we test our tracker with a set of objects of arbitrary shape and we show the benefit of the proposed model due to its ability to adapt to the object shape which leads to more robust tracking results.

  • 31.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bergström, Niklas
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ishikawa, Masatoshi
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Robust 3D tracking of unknown objects. 2015. In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2015, no. June, pp. 2410-2417. Conference paper (Refereed)
    Abstract [en]

    Visual tracking of unknown objects is an essential task in robotic perception, of importance to a wide range of applications. In the general scenario, the robot has no full 3D model of the object beforehand, just the partial view of the object visible in the first video frame. A tracker with this information only will inevitably lose track of the object after occlusions or large out-of-plane rotations. The way to overcome this is to incrementally learn the appearances of new views of the object. However, this bootstrapping approach is sensitive to drifting due to occasional inclusion of the background into the model. In this paper we propose a method that exploits 3D point coherence between views to overcome the risk of learning the background, by only learning the appearances at the faces of an inscribed cuboid. This is closely related to the popular idea of 2D object tracking using bounding boxes, with the additional benefit of recovering the full 3D pose of the object as well as learning its full appearance from all viewpoints. We show quantitatively that the use of an inscribed cuboid to guide the learning leads to significantly more robust tracking than with other state-of-the-art methods. We show that our tracker is able to cope with 360 degree out-of-plane rotation, large occlusion and fast motion.

  • 32.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bergström, Niklas
    Ishikawa, Masatoshi
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Robust 3D tracking of unknown objects. Manuscript (preprint) (Other academic)
  • 33.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bergström, Niklas
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ishikawa, Masatoshi
    Robust Tracking through Learning. 2014. In: 32nd Annual Conference of the Robotics Society of Japan, 2014, 2014. Conference paper (Refereed)
  • 34.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Functional Object Descriptors for Human Activity Modeling. 2013. In: 2013 IEEE International Conference on Robotics and Automation (ICRA), IEEE conference proceedings, 2013, pp. 1282-1289. Conference paper (Refereed)
    Abstract [en]

    The ability to learn from human demonstration is essential for robots in human environments. The activity models that the robot builds from observation must take both the human motion and the objects involved into account. Object models designed for this purpose should reflect the role of the object in the activity - its function, or affordances. The main contribution of this paper is to represent objects directly in terms of their interaction with human hands, rather than in terms of appearance. This enables the direct representation of object affordances/function, while being robust to intra-class differences in appearance. Object hypotheses are first extracted from a video sequence as tracks of associated image segments. The object hypotheses are encoded as strings, where the vocabulary corresponds to different types of interaction with human hands. The similarity between two such object descriptors can be measured using a string kernel. Experiments show these functional descriptors to capture differences and similarities in object affordances/function that are not represented by appearance.
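
The string-kernel comparison of functional object descriptors mentioned above can be illustrated with a simple p-spectrum kernel: each object is a string over an interaction vocabulary, and similarity counts shared substrings of length p. This is a generic textbook kernel, not necessarily the exact kernel used in the paper, and the example strings are invented.

```python
# Sketch: p-spectrum string kernel between functional object descriptors,
# i.e. strings over an assumed interaction vocabulary such as "g" (grasp), "t" (tool use).
from collections import Counter

def spectrum_kernel(s1, s2, p=3):
    """Count shared substrings of length p between two descriptor strings."""
    c1 = Counter(s1[i:i+p] for i in range(len(s1) - p + 1))
    c2 = Counter(s2[i:i+p] for i in range(len(s2) - p + 1))
    return sum(c1[sub] * c2[sub] for sub in c1 if sub in c2)

def normalized_similarity(s1, s2, p=3):
    """Cosine-normalized kernel value in [0, 1]."""
    k12 = spectrum_kernel(s1, s2, p)
    k11 = spectrum_kernel(s1, s1, p)
    k22 = spectrum_kernel(s2, s2, p)
    return k12 / ((k11 * k22) ** 0.5) if k11 and k22 else 0.0

# Example: two objects with similar interaction histories vs. a dissimilar one.
print(normalized_similarity("gggtttggg", "ggtttgggg"))   # high similarity
print(normalized_similarity("gggtttggg", "hhhgghhh"))    # low similarity
```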

  • 35.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Recognizing Object Affordances in Terms of Spatio-Temporal Object-Object Relationships. 2014. In: Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on, IEEE conference proceedings, 2014, pp. 52-58. Conference paper (Refereed)
    Abstract [en]

    In this paper we describe a probabilistic framework that models the interaction between multiple objects in a scene. We present a spatio-temporal feature encoding pairwise interactions between each object in the scene. By the use of a kernel representation we embed object interactions in a vector space which allows us to define a metric comparing interactions of different temporal extent. Using this metric we define a probabilistic model which allows us to represent and extract the affordances of individual objects based on the structure of their interaction. In this paper we focus on the presented pairwise relationships but the model can naturally be extended to incorporate additional cues related to a single object or multiple objects. We compare our approach with traditional kernel approaches and show a significant improvement.

  • 36.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Unsupervised object exploration using context. 2014. In: The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014 RO-MAN, IEEE conference proceedings, 2014, -506 p. Conference paper (Refereed)
    Abstract [en]

    In order for robots to function in unstructured environments in interaction with humans, they must be able to reason about the world in a semantically meaningful way. An essential capability is to segment the world into semantically plausible object hypotheses. In this paper we propose a general framework which can be used for reasoning about objects and their functionality in manipulation activities. Our system employs a hierarchical segmentation framework that extracts object hypotheses from RGB-D video. Motivated by cognitive studies on humans, our work leverages contextual information, e.g., that objects obey the laws of physics, to formulate object hypotheses from regions in a mathematically principled manner.

  • 37.
    Pieropan, Alessandro
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Pauwels, Karl
    Universidad de Granada, Spain.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Audio-Visual Classification and Detection of Human Manipulation Actions. 2014. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), IEEE conference proceedings, 2014, pp. 3045-3052. Conference paper (Refereed)
    Abstract [en]

    Humans are able to merge information from multiple perceptual modalities and formulate a coherent representation of the world. Our thesis is that robots need to do the same in order to operate robustly and autonomously in an unstructured environment. It has also been shown in several fields that multiple sources of information can complement each other, overcoming the limitations of a single perceptual modality. Hence, in this paper we introduce a data set of actions that includes both visual data (RGB-D video and 6DOF object pose estimation) and acoustic data. We also propose a method for recognizing and segmenting actions from continuous audio-visual data. The proposed method is employed for extensive evaluation of the descriptive power of the two modalities, and we discuss how they can be used jointly to infer a coherent interpretation of the recorded action.

  • 38.
    Pokorny, Florian T.
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Persistent Homology for Learning Densities with Bounded Support. 2012. In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012 / [ed] P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou and K.Q. Weinberger, Curran Associates, Inc., 2012, pp. 1817-1825. Conference paper (Refereed)
    Abstract [en]

    We present a novel method for learning densities with bounded support which enables us to incorporate 'hard' topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and - by incorporating persistent homology techniques in our approach - we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car.
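
To give a feel for the combination described above, the following sketch computes the persistence diagram of a point sample with the gudhi library and uses the most persistent 1-dimensional feature to choose a kernel bandwidth for a scikit-learn density estimate. The bandwidth heuristic and the data file are invented for illustration and only stand in for the principled selection in the paper.

```python
# Sketch: persistence-informed bandwidth choice for kernel density estimation
# (crude heuristic standing in for the paper's method; gudhi and scikit-learn assumed).
import numpy as np
import gudhi
from sklearn.neighbors import KernelDensity

pts = np.load("racetrack_samples.npy")   # hypothetical 2D samples with a hole

# Persistence diagram of the sample cloud: long-lived 1D features indicate holes.
rips = gudhi.RipsComplex(points=pts, max_edge_length=1.0)
st = rips.create_simplex_tree(max_dimension=2)
diagram = st.persistence()
h1 = [(b, d) for dim, (b, d) in diagram
      if dim == 1 and d != float("inf")]

# Pick a KDE bandwidth well below the scale of the most persistent 1-cycle,
# so the estimated density does not smooth the hole away.
scale = max(d - b for b, d in h1) if h1 else 0.1
kde = KernelDensity(bandwidth=0.25 * scale).fit(pts)
log_density = kde.score_samples(pts)     # log-density at the sample points
```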

  • 39.
    Pokorny, Florian T.
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Topological Constraints and Kernel-Based Density Estimation. 2012. Conference paper (Refereed)
    Abstract [en]

    This extended abstract explores the question of how to estimate a probability distribution from a finite number of samples when information about the topology of the support region of an underlying density is known. This workshop contribution is a continuation of our recent work [1], which combined persistent homology and kernel-based density estimation for the first time and explored an approach capable of incorporating topological constraints in bandwidth selection. We report on some recent experiments with high-dimensional motion capture data which show that our method is applicable even in high dimensions, and we develop our ideas for potential future applications of this framework.

  • 40. Qu, An
    et al.
    Zhang, Cheng
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ackermann, Paul
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bridging Medical Data Inference to Achilles Tendon Rupture Rehabilitation. 2016. Conference paper (Refereed)
  • 41. Romero, Javier
    et al.
    Feix, Thomas
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Extracting Postural Synergies for Robotic Grasping. 2013. In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 29, no. 6, pp. 1342-1352. Article in journal (Refereed)
    Abstract [en]

    We address the problem of representing and encoding human hand motion data using nonlinear dimensionality reduction methods. We build our work on the notion of postural synergies being typically based on a linear embedding of the data. In addition to addressing the encoding of postural synergies using nonlinear methods, we relate our work to control strategies of combined reaching and grasping movements. We show the drawbacks of the (commonly made) causality assumption and propose methods that model the data as being generated from an inferred latent manifold to cope with the problem. Another important contribution is a thorough analysis of the parameters used in the employed dimensionality reduction techniques. Finally, we provide an experimental evaluation that shows how the proposed methods outperform the standard techniques, both in terms of recognition and generation of motion patterns.

  • 42.
    Romero, Javier
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Feix, Thomas
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Spatio-Temporal Modeling of Grasping Actions2010In: IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010, 2103-2108 p.Conference paper (Refereed)
    Abstract [en]

    Understanding the spatial dimensionality and temporal context of human hand actions can provide representations for programming grasping actions in robots and inspire design of new robotic and prosthetic hands. The natural representation of human hand motion has high dimensionality. For specific activities such as handling and grasping of objects, the commonly observed hand motions lie on a lower-dimensional non-linear manifold in hand posture space. Although full body human motion is well studied within Computer Vision and Biomechanics, there is very little work on the analysis of hand motion with nonlinear dimensionality reduction techniques. In this paper we use Gaussian Process Latent Variable Models (GPLVMs) to model the lower dimensional manifold of human hand motions during object grasping. We show how the technique can be used to embed high-dimensional grasping actions in a lower-dimensional space suitable for modeling, recognition and mapping.
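
    To give a concrete impression of what a GPLVM computes, the following is a rough, self-contained sketch (not the authors' implementation; kernel hyperparameters are fixed, the data are synthetic, and gradients are approximated numerically) that learns a 2-D latent embedding of posture vectors by maximising the Gaussian-process marginal likelihood over the latent positions.

    # Minimal GPLVM sketch (hypothetical): learn low-dimensional latent positions X
    # for high-dimensional hand postures Y by minimising the negative GP marginal
    # log-likelihood with a fixed RBF kernel. Illustrative only.
    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(X, lengthscale=1.0, variance=1.0, noise=1e-2):
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return variance * np.exp(-0.5 * d2 / lengthscale**2) + noise * np.eye(len(X))

    def neg_log_likelihood(x_flat, Y, latent_dim):
        N, D = Y.shape
        X = x_flat.reshape(N, latent_dim)
        K = rbf_kernel(X)
        _, logdet = np.linalg.slogdet(K)
        Kinv_Y = np.linalg.solve(K, Y)
        # Negative GP marginal log-likelihood, summed over the D output dimensions
        # (constant terms dropped).
        return 0.5 * D * logdet + 0.5 * np.sum(Y * Kinv_Y)

    def fit_gplvm(Y, latent_dim=2, iters=50):
        # Initialise latent positions with PCA, the usual linear baseline.
        Yc = Y - Y.mean(axis=0)
        _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
        X0 = Yc @ Vt[:latent_dim].T
        res = minimize(neg_log_likelihood, X0.ravel(), args=(Y, latent_dim),
                       method="L-BFGS-B", options={"maxiter": iters})
        return res.x.reshape(len(Y), latent_dim)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        Y = rng.normal(size=(30, 15))        # stand-in for 15-DoF hand joint angles
        X = fit_gplvm(Y)
        print(X.shape)                       # (30, 2) latent grasp-space coordinates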

  • 43. Romero, Javier
    et al.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Non-parametric hand pose estimation with object context2013In: Image and Vision Computing, ISSN 0262-8856, E-ISSN 1872-8138, Vol. 31, no 8, 555-564 p.Article in journal (Refereed)
    Abstract [en]

    In the spirit of recent work on contextual recognition and estimation, we present a method for estimating the pose of human hands, employing information about the shape of the object in the hand. Despite the fact that most applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Occlusion of the hand by grasped objects does in fact often pose a severe challenge to hand pose estimation. In the presented method, object occlusion is not only compensated for, it contributes to the pose estimation in a contextual fashion, without an explicit model of object shape. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (.. entries) of hand poses with and without grasped objects. The system, which operates in real time, is robust to self-occlusions, object occlusions and segmentation errors, and provides full hand pose reconstruction from monocular video. Temporal consistency in hand pose is taken into account, without explicitly tracking the hand in the high-dimensional pose space. Experiments show the non-parametric method to outperform other state-of-the-art regression methods, while operating at a significantly lower computational cost than comparable model-based hand tracking methods.
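
    A toy sketch of the non-parametric idea follows, with hypothetical features, database contents and weighting: candidate poses are retrieved by nearest-neighbour search and re-ranked toward the previous frame's estimate, rather than tracking in the full pose space.

    # Hypothetical sketch of non-parametric pose lookup with a simple temporal
    # re-ranking step; the real system uses much richer image features and a far
    # larger pose database.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(2)
    db_features = rng.normal(size=(5000, 64))   # appearance descriptors in the database
    db_poses = rng.normal(size=(5000, 30))      # corresponding full hand poses

    index = NearestNeighbors(n_neighbors=10).fit(db_features)

    def estimate_pose(frame_feature, previous_pose=None, temporal_weight=0.3):
        """Return the database pose best matching the frame, biased toward the
        previous estimate instead of tracking in the full pose space."""
        dist, idx = index.kneighbors(frame_feature[None, :])
        candidates = db_poses[idx[0]]
        score = dist[0]
        if previous_pose is not None:
            # Penalise candidates that jump far from the last frame's pose.
            score = score + temporal_weight * np.linalg.norm(candidates - previous_pose, axis=1)
        return candidates[np.argmin(score)]

    pose = estimate_pose(rng.normal(size=64))
    pose = estimate_pose(rng.normal(size=64), previous_pose=pose)
    print(pose.shape)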

  • 44.
    Romero, Javier
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Hands in Action: Real-Time 3D Reconstruction of Hands in Interaction with Objects2010In: 2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA)  / [ed] Rakotondrabe M; Ivan IA, 2010, 458-463 p.Conference paper (Refereed)
    Abstract [en]

    This paper presents a method for vision-based estimation of the pose of human hands in interaction with objects. Despite the fact that most robotics applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (100000 entries) of hand poses with and without grasped objects. The system operates in real time, is robust to self-occlusions, object occlusions and segmentation errors, and provides full hand pose reconstruction from markerless video. Temporal consistency in hand pose is taken into account, without explicitly tracking the hand in the high-dimensional pose space.

  • 45.
    Romero, Javier
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Human-to-Robot Mapping of Grasps2008Conference paper (Refereed)
    Abstract [en]

    We are developing a Programming by Demonstration (PbD) system for which recognition of objects and pick-and-place actions represent basic building blocks for task learning. An important capability in this system is automatic visual recognition of human grasps, along with methods for mapping the human grasps to the functionally corresponding robot grasps. This paper describes the grasp recognition system, focusing on the human-to-robot mapping. The visual grasp classification and grasp orientation regression are described in our IROS 2008 paper [1]. In contrast to earlier approaches, no articulated 3D reconstruction of the hand over time takes place. The input data consists of a single image of the human hand. The hand shape is classified as one of six grasps by finding similar hand shapes in a large database of grasp images. From the database, the hand orientation is also estimated. The recognized grasp is then mapped to one of three predefined Barrett hand grasps. Depending on the type of robot grasp, a precomputed grasp strategy is selected. The strategy is further parameterized by the orientation of the hand relative to the environment.
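
    As an illustration of the mapping step, here is a toy sketch in which six hypothetical human grasp classes (the actual taxonomy and strategies in the paper may differ) collapse onto three robot grasp types, each selecting a precomputed strategy parameterised by the estimated hand orientation.

    # Hypothetical illustration of many-to-one grasp mapping: six recognised human
    # grasp classes map onto three robot (e.g. Barrett-hand) grasp types, each with
    # a precomputed strategy parameterised by hand orientation. All names are made up.
    from dataclasses import dataclass

    HUMAN_TO_ROBOT = {
        "power_cylindrical": "wrap",
        "power_spherical":   "wrap",
        "precision_pinch":   "pinch",
        "precision_tripod":  "pinch",
        "lateral":           "hook",
        "hook":              "hook",
    }

    @dataclass
    class GraspStrategy:
        robot_grasp: str
        approach_axis: tuple      # approach direction derived from hand orientation
        preshape_angles: tuple    # finger spread / preshape for the robot hand

    def select_strategy(human_grasp: str, hand_orientation: tuple) -> GraspStrategy:
        robot_grasp = HUMAN_TO_ROBOT[human_grasp]
        preshape = {"wrap": (0.0, 0.0, 0.0),
                    "pinch": (0.4, 0.4, 0.0),
                    "hook": (0.0, 0.6, 0.6)}[robot_grasp]
        return GraspStrategy(robot_grasp, hand_orientation, preshape)

    print(select_strategy("precision_tripod", hand_orientation=(0.0, 0.0, 1.0)))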

  • 46.
    Romero, Javier
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Modeling and Evaluation of Human-to-Robot Mapping of Grasps2009In: ICAR: 2009 International Conference on Advanced Robotics, IEEE , 2009, 228-233 p.Conference paper (Refereed)
    Abstract [en]

    We study the problem of human-to-robot grasp mapping as a basic building block of a learning by imitation system. The human hand posture, including both the grasp type and hand orientation, is first classified based on a single image and mapped to a specific robot hand. A metric for the evaluation based on the notion of virtual fingers is proposed. The first part of the experimental evaluation, performed in simulation, shows how the differences in embodiment between the human and robotic hand affect the grasp strategy. The second part, performed with a robotic system, demonstrates the feasibility of the proposed methodology in realistic applications.

  • 47.
    Romero, Javier
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Monocular Real-Time 3D Articulated Hand Pose Estimation2009In: 9th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS09, 2009, 87-92 p.Conference paper (Refereed)
    Abstract [en]

    Markerless, vision-based estimation of human hand pose over time is a prerequisite for a number of robotics applications, such as Learning by Demonstration (LbD), health monitoring, teleoperation, and human-robot interaction. It is of special interest for humanoid platforms, where the number of degrees of freedom makes conventional programming challenging. Our primary application is LbD in natural environments where the humanoid robot learns how to grasp and manipulate objects by observing a human performing a task. This paper presents a method for continuous vision-based estimation of human hand pose. The method is non-parametric, performing a nearest neighbor search in a large database (100000 entries) of hand pose examples. The main contribution is a real-time system, robust to partial occlusions and segmentation errors, that provides full hand pose recognition from markerless data. An additional contribution is the modeling of temporal consistency in hand pose, without explicitly tracking the hand in the high-dimensional pose space. The pose representation is rich enough to enable a descriptive human-to-robot mapping. Experiments show the pose estimation to be more robust and accurate than a non-parametric method without temporal constraints.

  • 48. Sanmohan,
    et al.
    Kruger, Volker
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kjellström, Hedvig
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Primitive-Based Action Representation and Recognition2011In: Advanced Robotics, ISSN 0169-1864, E-ISSN 1568-5535, Vol. 25, no 6-7, 871-891 p.Article in journal (Refereed)
    Abstract [en]

    In robotics, there has been a growing interest in expressing actions as a combination of meaningful subparts commonly called motion primitives. Primitives are analogous to words in a language: just as words put together according to the rules of the language form a sentence, primitives arranged according to certain rules make up an action. In this paper we investigate modeling and recognition of arm manipulation actions at different levels of complexity using primitives. Primitives are detected automatically in a sequential manner. We assume no prior knowledge of primitives, but look for correlating segments across various sequences. All actions are then modeled within a single hidden Markov model whose structure is learned incrementally as new data is observed. We also generate an action grammar based on these primitives and thus link signals to symbols.
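
    As a rough illustration of the modelling idea (hypothetical data, and using the hmmlearn library with a fixed number of states as a stand-in, whereas the paper learns the model structure incrementally), hidden states play the role of primitives and the decoded state sequence segments a demonstration into primitive "words".

    # Hypothetical sketch: fit a Gaussian HMM to concatenated arm-trajectory
    # features and read off primitives as the decoded hidden-state sequence.
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(3)
    # Two demonstration sequences of arm/hand features (e.g. positions, velocities).
    seq_a = rng.normal(size=(120, 6))
    seq_b = rng.normal(size=(90, 6))
    X = np.vstack([seq_a, seq_b])
    lengths = [len(seq_a), len(seq_b)]

    model = GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)

    # Each decoded state index is treated as a primitive label; runs of the same
    # label form the "words" from which an action grammar could be built.
    primitives = model.predict(seq_a)
    print(primitives[:20])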

  • 49. Schubert, Johan
    et al.
    Mårtenson, Christian
    Sidenbladh, Hedvig
    Svenson, Pontus
    Walter, Johan
    Methods and system design of the IFD03 information fusion demonstrator2004In: Ninth International Command and Control Research and Technology Symposium, 2004, 1-29 p.Conference paper (Refereed)
  • 50. Schubert, Johan
    et al.
    Sidenbladh, Hedvig
    Sequential clustering with particle filters - estimating the number of clusters from data2005In: 2005 7th International Conference on Information Fusion (FUSION), Vols 1 and 2, 2005, 122-129 p.Conference paper (Refereed)