KTH Publications (kth.se)
Publications (8 of 8)
Thippur, A., Stork, J. A. & Jensfelt, P. (2017). Non-Parametric Spatial Context Structure Learning for Autonomous Understanding of Human Environments. In: Howard, A., Suzuki, K. & Zollo, L. (Eds.), 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN): . Paper presented at 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Aug 28-Sep 1, 2017, Lisbon, Portugal (pp. 1317-1324). IEEE
2017 (English). In: 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) / [ed] Howard, A., Suzuki, K. & Zollo, L., IEEE, 2017, p. 1317-1324. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous scene understanding by object classification today depends crucially on the accuracy of appearance-based robotic perception, which is prone to object-detection difficulties arising from unfavourable lighting conditions and vision-unfriendly object properties. In our work, we propose a spatial-context-based system which infers object classes utilising solely structural information captured from the scenes, to aid traditional perception systems. Our system operates on novel spatial features (IFRC) that are robust to noisy object detections; it also caters to on-the-fly modification of learned knowledge, improving performance with practice. IFRC features are aligned with human expression of 3D space, thereby facilitating easy human-robot interaction and hence simpler supervised learning. We tested our spatial-context-based system and conclude that it can capture spatio-structural information for joint object classification, not only acting as a vision aid but sometimes even performing on par with appearance-based robotic vision.
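The non-parametric (lazy) learner described above can be illustrated with a minimal sketch. The IFRC features themselves are not specified in the abstract, so this example assumes generic per-object spatial feature vectors and Euclidean distance; the function name `knn_joint_classify` is an assumption of this illustration, not the paper's method:

```python
import numpy as np

def knn_joint_classify(scene_feats, train_feats, train_labels, k=3):
    """Classify objects in a scene from spatial-structure features alone,
    with a k-nearest-neighbour (lazy) learner.

    scene_feats : (n, d) array of spatial features, one row per object
    train_feats : (m, d) array of features from labelled scenes
    train_labels: list of m class labels
    """
    labels = []
    for f in scene_feats:
        # Lazy learning: no model is fit in advance; knowledge can be
        # modified on the fly by appending to train_feats / train_labels.
        d = np.linalg.norm(train_feats - f, axis=1)
        nearest = np.argsort(d)[:k]
        votes = [train_labels[i] for i in nearest]
        labels.append(max(set(votes), key=votes.count))
    return labels
```

Because the classifier stores examples rather than parameters, "learning with practice" amounts to growing the stored set.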

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords
structure learning, spatial relationships, lazy learners, autonomous scene understanding
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-225236 (URN)
10.1109/ROMAN.2017.8172475 (DOI)
000427262400205 ()
2-s2.0-85045741190 (Scopus ID)
978-1-5386-3518-6 (ISBN)
Conference
26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), AUG 28-SEP 01, 2017, Lisbon, PORTUGAL
Funder
EU, FP7, Seventh Framework Programme, 600623
Swedish Research Council, C0475401
Note

QC 20180403

Available from: 2018-04-03. Created: 2018-04-03. Last updated: 2022-06-26. Bibliographically approved
Thippur, A., Burbridge, C., Kunze, L., Alberti, M., Folkesson, J., Jensfelt, P. & Hawes, N. (2015). A comparison of qualitative and metric spatial relation models for scene understanding. In: Proceedings of the National Conference on Artificial Intelligence: . Paper presented at 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015; Austin; United States (pp. 1632-1640). AI Access Foundation, Vol. 2
2015 (English). In: Proceedings of the National Conference on Artificial Intelligence, AI Access Foundation, 2015, Vol. 2, p. 1632-1640. Conference paper, Published paper (Refereed)
Abstract [en]

Object recognition systems can be unreliable when run in isolation, depending only on image-based features, but their performance can be improved by taking scene context into account. In this paper, we present techniques to model and infer object labels in real scenes based on a variety of spatial relations (geometric features which capture how objects co-occur) and compare their efficacy for augmenting perception-based object classification in real-world table-top scenes. We use a long-term dataset of office table-tops to qualitatively compare the performance of these techniques. On this dataset, we show that the more intricate techniques have superior performance but do not generalise well from small training data, while techniques using coarser information perform crudely but sufficiently well in standalone scenarios and generalise well from small training data. We conclude by expanding on the insights gained through these comparisons and commenting on a few fundamental topics concerning long-term autonomous robots.

Place, publisher, year, edition, pages
AI Access Foundation, 2015
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-187387 (URN)
000485625501093 ()
2-s2.0-84959902245 (Scopus ID)
9781577357001 (ISBN)
Conference
29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015; Austin; United States
Note

QC 20160527

Available from: 2016-05-27. Created: 2016-05-23. Last updated: 2025-02-07. Bibliographically approved
Ekekrantz, J., Thippur, A., Folkesson, J. & Jensfelt, P. (2015). Probabilistic Primitive Refinement algorithm for colored point cloud data. In: 2015 European Conference on Mobile Robots (ECMR): . Paper presented at 2015 European Conference on Mobile Robots (ECMR). Lincoln: IEEE conference proceedings
2015 (English). In: 2015 European Conference on Mobile Robots (ECMR), Lincoln: IEEE conference proceedings, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

In this work we present the Probabilistic Primitive Refinement (PPR) algorithm, an iterative method for accurately determining the inliers of an estimated primitive parametrization (such as a plane or sphere) in an unorganized, noisy point cloud. The measurement noise of points belonging to the proposed primitive surface is modelled with a Gaussian distribution, while the measurements of points extraneous to the proposed surface are modelled as a histogram. Given these models, the probability that a measurement originated from the proposed surface model can be computed. Our technique for modelling the noisy surface from the measurement data does not require a priori parameters for the sensor noise model; this absence of sensitive parameter selection is a strength of the method. Using the geometric information obtained from such an estimate, the algorithm then builds a colour-based model for the surface, further boosting the accuracy of the segmentation. Applied iteratively, the PPR algorithm can be seen as a variation of the popular mean-shift algorithm with an adaptive stochastic kernel function.
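The core inlier test described above (Gaussian likelihood for surface points, histogram likelihood for extraneous points, combined with Bayes' rule) can be sketched generically. The function name, the fixed inlier prior, and the use of point-to-surface distances as the measurement are assumptions of this illustration, not the paper's exact formulation:

```python
import numpy as np

def inlier_posterior(dists, sigma, outlier_hist, bin_edges, prior_in=0.5):
    """Posterior probability that each point belongs to the primitive surface.

    dists       : point-to-surface distances for the candidate primitive
    sigma       : std. dev. of the Gaussian surface-noise model
    outlier_hist: normalized histogram (density) over distances of
                  extraneous points
    bin_edges   : bin edges of that histogram
    """
    # Likelihood under the Gaussian surface-noise model
    p_in = np.exp(-0.5 * (dists / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # Likelihood under the histogram model of extraneous points
    idx = np.clip(np.digitize(dists, bin_edges) - 1, 0, len(outlier_hist) - 1)
    p_out = outlier_hist[idx]
    # Bayes' rule: posterior probability of being an inlier
    return prior_in * p_in / (prior_in * p_in + (1 - prior_in) * p_out + 1e-12)
```

Points very close to the surface get a posterior near one, distant points near zero; iterating the estimate-then-reweight loop refines the segmentation.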

Place, publisher, year, edition, pages
Lincoln: IEEE conference proceedings, 2015
Keywords
Registration, point cloud, robotics
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-178942 (URN)
10.1109/ECMR.2015.7324199 (DOI)
000380213600033 ()
2-s2.0-84962333311 (Scopus ID)
Conference
2015 European Conference on Mobile Robots (ECMR)
Projects
STRANDS
Funder
EU, FP7, Seventh Framework Programme, 600623
Note

QC 20160114

Available from: 2015-12-09. Created: 2015-12-09. Last updated: 2025-02-09. Bibliographically approved
Kunze, L., Burbridge, C., Alberti, M., Thippur, A., Folkesson, J., Jensfelt, P. & Hawes, N. (2014). Combining Top-down Spatial Reasoning and Bottom-up Object Class Recognition for Scene Understanding. In: Proc. of 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems 2014: . Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems, 14-18 Sept. 2014, Chicago, IL, USA (pp. 2910-2915). IEEE conference proceedings
2014 (English). In: Proc. of 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems 2014, IEEE conference proceedings, 2014, p. 2910-2915. Conference paper, Published paper (Refereed)
Abstract [en]

Many robot perception systems consider only intrinsic object features when recognising the class of an object. By integrating top-down spatial relational reasoning with bottom-up object class recognition, the overall performance of a perception system can be improved. In this paper we present a unified framework that combines a 3D object class recognition system with learned, spatial models of object relations. In robot experiments we show that our combined approach improves the classification results on real-world office desks compared to pure bottom-up perception. Hence, by using spatial knowledge during object class recognition, perception becomes more efficient and robust, and robots can understand scenes more effectively.
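One simple way to realise the combination of bottom-up class scores with top-down spatial knowledge is a naive-Bayes-style product of per-class appearance likelihoods and a spatial-context prior. This is an illustrative sketch under that assumption, not necessarily the paper's exact fusion rule:

```python
import numpy as np

def fuse_scores(bottom_up, spatial_prior):
    """Fuse per-class appearance scores with a spatial-context prior
    by elementwise product, then renormalize to a distribution."""
    fused = np.asarray(bottom_up, dtype=float) * np.asarray(spatial_prior, dtype=float)
    return fused / fused.sum()

# Appearance alone cannot decide between two classes (e.g. 'keyboard'
# vs 'book'), but a spatial prior learned from object relations
# (e.g. "centred in front of a monitor") breaks the tie.
posterior = fuse_scores([0.5, 0.5], [0.9, 0.1])
```

The renormalized product keeps whichever cue is more confident dominant while letting the weaker cue veto implausible classes.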

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Keywords
Spatial Relations, Robotics, Learning
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-156599 (URN)
10.1109/IROS.2014.6942963 (DOI)
2-s2.0-84911478657 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems, 14-18 Sept. 2014, Chicago, IL, USA
Projects
STRANDS, European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No 600623
Funder
EU, FP7, Seventh Framework Programme
Note

QC 20141205

Available from: 2014-12-01. Created: 2014-12-01. Last updated: 2025-02-09. Bibliographically approved
Thippur, A., Ambrus, R., Agrawal, G., Del Burgo, A. G., Ramesh, J. H., Jha, M. K., . . . Jensfelt, P. (2014). KTH-3D-TOTAL: A 3D dataset for discovering spatial structures for long-term autonomous learning. In: 2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014: . Paper presented at 2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014, Singapore, Singapore, 10 December 2014 through 12 December 2014 (pp. 1528-1535). IEEE
2014 (English). In: 2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014, IEEE, 2014, p. 1528-1535. Conference paper, Published paper (Refereed)
Abstract [en]

Long-term autonomous learning of human environments entails modelling, and generalizing over, distinct variations in object instances across scenes and in scenes themselves across space and time. It is crucial for the robot to recognize the structure and context in spatial arrangements and exploit these to learn models which capture the essence of these variations. Table-tops possess a typical structure seen repeatedly in human environments: they are personal spaces of diverse functionality that change dynamically due to human interactions. In this paper, we present a 3D dataset of 20 office table-tops, manually observed and scanned 3 times a day, as regularly as possible, over 19 days (461 scenes), and subsequently manually annotated with 18 different object classes, including multiple instances. We analyse the dataset to discover spatial structures and patterns in their variations. The dataset can, for example, be used to study spatial relations between objects and long-term environment models for applications such as activity recognition, context and functionality estimation, and anomaly detection.

Place, publisher, year, edition, pages
IEEE, 2014
Keywords
Robotics, Activity recognition, Autonomous learning, Environment models, Human interactions, Multiple instances, Spatial arrangements, Spatial structure, Typical structures, Computer vision
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-166173 (URN)
10.1109/ICARCV.2014.7064543 (DOI)
000393395800265 ()
2-s2.0-84988288796 (Scopus ID)
9781479951994 (ISBN)
Conference
2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014, Singapore, Singapore, 10 December 2014 through 12 December 2014
Note

QC 20150504

Available from: 2015-05-04. Created: 2015-05-04. Last updated: 2025-02-09. Bibliographically approved
Maas, R., Thippur, A., Sehr, A. & Kellermann, W. (2013). An uncertainty decoding approach to noise- and reverberation-robust speech recognition. In: ICASSP IEEE Int Conf Acoust Speech Signal Process Proc: . Paper presented at 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, 26 May 2013 through 31 May 2013, Vancouver, BC (pp. 7388-7392).
2013 (English). In: ICASSP IEEE Int Conf Acoust Speech Signal Process Proc, 2013, p. 7388-7392. Conference paper, Published paper (Refereed)
Abstract [en]

The generic REMOS (REverberation MOdeling for robust Speech recognition) concept is extended in this contribution to cope with additional noise components. REMOS originally embeds an explicit reverberation model into a hidden Markov model (HMM), leading to a relaxed conditional independence assumption for the observed feature vectors. During recognition, a nonlinear optimization problem must be solved in order to adapt the HMMs' output probability density functions to the current reverberation conditions. The extension to additional noise components necessitates a modified numerical solver for this nonlinear optimization problem. We propose an approximation scheme based on continuous piecewise linear regression. Connected-digit recognition experiments demonstrate the potential of REMOS in reverberant and noisy environments. They furthermore reveal that the benefit of an explicit reverberation model, overcoming the conditional independence assumption, increases with increasing signal-to-noise ratios.
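Continuous piecewise linear regression, as mentioned above, approximates a nonlinear function by line segments that join continuously at chosen knots. A generic least-squares sketch using hinge basis functions max(0, x - knot) illustrates the idea; the knot placement and function names are assumptions of this example, and REMOS's actual solver for the constrained optimization is more involved:

```python
import numpy as np

def fit_cpl(x, y, knots):
    """Least-squares fit of a continuous piecewise linear function.
    Basis: intercept, slope, and one hinge max(0, x - k) per knot,
    so the fit is continuous with a slope change at each knot."""
    X = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(0.0, x - k) for k in knots])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def eval_cpl(coef, x, knots):
    """Evaluate the fitted piecewise linear function at x."""
    X = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(0.0, x - k) for k in knots])
    return X @ coef
```

Replacing the hard-to-optimize nonlinearity with such a fit turns each segment of the problem into a linear one.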

Series
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, ISSN 1520-6149
Keywords
automatic speech recognition, noise robustness, piecewise linear regression, reverberation robustness, uncertainty decoding
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-140040 (URN)
10.1109/ICASSP.2013.6639098 (DOI)
000329611507111 ()
2-s2.0-84890473474 (Scopus ID)
9781479903566 (ISBN)
Conference
2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, 26 May 2013 through 31 May 2013, Vancouver, BC
Note

QC 20140121

Available from: 2014-01-21. Created: 2014-01-16. Last updated: 2024-03-18. Bibliographically approved
Thippur, A., Ek, C. H. & Kjellström, H. (2013). Inferring hand pose: A comparative study of visual shape features. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013: . Paper presented at 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013; Shanghai; China; 22 April 2013 through 26 April 2013 (Art. no. 6553698). IEEE
2013 (English). In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, IEEE, 2013, Art. no. 6553698. Conference paper, Published paper (Refereed)
Abstract [en]

Hand pose estimation from video is essential for a number of applications such as automatic sign language recognition and robot learning from demonstration. However, hand pose estimation is made difficult by the high degree of articulation of the hand: a realistic hand model requires at least 35 dimensions, which means it can assume a wide variety of poses, and there is a very high degree of self-occlusion for most of them. Furthermore, different parts of the hand have very similar visual appearance; it is difficult to tell fingers apart in video. These properties of hands put hard requirements on the visual features used for hand pose estimation and tracking. In this paper, we evaluate three different state-of-the-art visual shape descriptors commonly used for hand and human body pose estimation. We study the nature of the mappings from the hand pose space to the feature spaces spanned by the visual descriptors, in terms of the smoothness, discriminability, and generativity of the pose-feature mappings, as well as their robustness to noise with respect to these properties. Based on this, we give recommendations on the types of applications for which each visual shape descriptor is suitable.

Place, publisher, year, edition, pages
IEEE, 2013
Keywords
Comparative studies, Discriminability, Hand pose estimations, Robot learning from demonstrations, Robustness to noise, Shape descriptors, Sign Language recognition, Visual appearance, Gesture recognition
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-133862 (URN)
10.1109/FG.2013.6553698 (DOI)
000395532800003 ()
2-s2.0-84881537213 (Scopus ID)
9781467355452 (ISBN)
Conference
2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013; Shanghai; China; 22 April 2013 through 26 April 2013
Note

QC 20131112

Available from: 2013-11-12. Created: 2013-11-11. Last updated: 2025-02-07. Bibliographically approved
Thippur, A., Askenfelt, A. & Kjellström, H. (2013). Probabilistic modeling of bowing gestures for gesture-based violin sound synthesis. In: Roberto Bresin and Anders Askenfelt (Eds.), Proceedings of the Stockholm Music Acoustics Conference 2013, SMAC 13: . Paper presented at Stockholm Music Acoustics Conference 2013 (SMAC 13), 30/7-3/8 2013 (pp. 133-139). KTH Royal Institute of Technology
2013 (English). In: Proceedings of the Stockholm Music Acoustics Conference 2013, SMAC 13 / [ed] Roberto Bresin and Anders Askenfelt, KTH Royal Institute of Technology, 2013, p. 133-139. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2013
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-137416 (URN)
978-3-8325-3473-8 (ISBN)
Conference
Stockholm Music Acoustics Conference 2013 (SMAC 13), 30/7-3/8 2013
Note

QC 20150216

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2025-02-01. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0448-3786
