Publications (10 of 78)
Klasson, M., Zhang, C. & Kjellström, H. (2019). A hierarchical grocery store image dataset with visual and semantic labels. In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019. Paper presented at 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019, 7 January 2019 through 11 January 2019 (pp. 491-500). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8658240.
A hierarchical grocery store image dataset with visual and semantic labels
2019 (English). In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 491-500, article id 8658240. Conference paper, Published paper (Refereed)
Abstract [en]

Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, for daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application – classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding information of the product from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results evaluated on pretrained convolutional neural networks often used for image understanding purposes, and also a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
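
A minimal illustration of the kind of pretrained-CNN baseline mentioned above (not the authors' code): fine-tuning an ImageNet-pretrained network on grocery images arranged in class folders. The dataset path, architecture choice, and hyperparameters are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical directory layout: grocery_dataset/train/<class name>/<image>.jpg
train_set = datasets.ImageFolder("grocery_dataset/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# ImageNet-pretrained backbone with a new classification head for the grocery classes
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:      # one epoch of fine-tuning (sketch)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```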

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Benchmarking, Computer vision, Electronic commerce, Image classification, Large dataset, Learning systems, Neural networks, Semantics, Accurate prediction, Assistive technology, Classification models, Convolutional neural network, Hierarchical structures, Natural environments, Structured information, Visually impaired people, Classification (of information)
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-252223 (URN), 10.1109/WACV.2019.00058 (DOI), 000469423400051 (), 2-s2.0-85063566822 (Scopus ID), 9781728119755 (ISBN)
Conference
19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019, 7 January 2019 through 11 January 2019
Note

QC 20190611

Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2019-06-26. Bibliographically approved
Zhang, C., Butepage, J., Kjellström, H. & Mandt, S. (2019). Advances in Variational Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 2008-2026
Advances in Variational Inference
2019 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 41, no. 8, p. 2008-2026. Article in journal (Refereed), Published
Abstract [en]

Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully applied to various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
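
For reference, the standard mean-field setup that the review starts from can be written as maximizing the evidence lower bound (ELBO) over a factorized variational family (generic notation, not tied to the paper):

```latex
\log p(x) \;=\; \mathcal{L}(q) + \mathrm{KL}\!\big(q(z)\,\|\,p(z \mid x)\big)
\;\ge\; \mathcal{L}(q) \;=\; \mathbb{E}_{q(z)}\!\big[\log p(x, z) - \log q(z)\big],
\qquad q(z) \;=\; \prod_{i} q_i(z_i).
```

Maximizing the ELBO is therefore equivalent to minimizing the KL divergence from q(z) to the intractable posterior p(z | x).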

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2019
Keywords
Variational inference, approximate Bayesian inference, reparameterization gradients, structured variational approximations, scalable inference, inference networks
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-255405 (URN), 10.1109/TPAMI.2018.2889774 (DOI), 000473598800016 (), 30596568 (PubMedID), 2-s2.0-85059288228 (Scopus ID)
Note

QC 20190814

Available from: 2019-08-14 Created: 2019-08-14 Last updated: 2019-08-14. Bibliographically approved
Kucherenko, T., Hasegawa, D., Henter, G. E., Kaneko, N. & Kjellström, H. (2019). Analyzing Input and Output Representations for Speech-Driven Gesture Generation. In: 19th ACM International Conference on Intelligent Virtual Agents. Paper presented at 19th ACM International Conference on Intelligent Virtual Agents (IVA '19), July 2-5, 2019, Paris, France. New York, NY, USA: ACM Publications
Analyzing Input and Output Representations for Speech-Driven Gesture Generation
2019 (English). In: 19th ACM International Conference on Intelligent Virtual Agents, New York, NY, USA: ACM Publications, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates.

Our approach consists of two steps. First, we learn a lower-dimensional representation of human motion using a denoising autoencoder neural network, consisting of a motion encoder MotionE and a motion decoder MotionD. The learned representation preserves the most important aspects of the human pose variation while removing less relevant variation. Second, we train a novel encoder network SpeechE to map from speech to a corresponding motion representation with reduced dimensionality. At test time, the speech encoder and the motion decoder networks are combined: SpeechE predicts motion representations based on a given speech signal and MotionD then decodes these representations to produce motion sequences.

We evaluate different representation sizes in order to find the most effective dimensionality for the representation. We also evaluate the effects of using different speech features as input to the model. We find that mel-frequency cepstral coefficients (MFCCs), alone or combined with prosodic features, perform the best. The results of a subsequent user study confirm the benefits of the representation learning.
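
A minimal PyTorch-style sketch (not the authors' released implementation) of the test-time pipeline described above: SpeechE maps speech features to the learned motion representation, and the pretrained MotionD decodes that representation into 3D joint coordinates. Layer sizes and feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

SPEECH_DIM, REPR_DIM, POSE_DIM = 26, 45, 192  # e.g. MFCC features, latent size, 64 joints x 3 (assumed)

class SpeechE(nn.Module):          # speech features -> motion representation
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(SPEECH_DIM, 256), nn.ReLU(), nn.Linear(256, REPR_DIM))
    def forward(self, s):
        return self.net(s)

class MotionD(nn.Module):          # motion representation -> 3D pose
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(REPR_DIM, 256), nn.ReLU(), nn.Linear(256, POSE_DIM))
    def forward(self, z):
        return self.net(z)

speech_encoder, motion_decoder = SpeechE(), MotionD()   # MotionD would be pretrained in step 1
speech = torch.randn(100, SPEECH_DIM)                   # 100 frames of speech features (dummy input)
with torch.no_grad():
    gestures = motion_decoder(speech_encoder(speech))   # (100, POSE_DIM) sequence of 3D coordinates
```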

Place, publisher, year, edition, pages
New York, NY, USA: ACM Publications, 2019
Keywords
Gesture generation, social robotics, representation learning, neural network, deep learning, gesture synthesis, virtual agents
National Category
Human Computer Interaction
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-255035 (URN), 10.1145/3308532.3329472 (DOI), 978-1-4503-6672-4 (ISBN)
Conference
19th ACM International Conference on Intelligent Virtual Agents (IVA '19), July 2-5, 2019, Paris, France
Projects
EACare
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Available from: 2019-07-16 Created: 2019-07-16 Last updated: 2019-07-22
Kucherenko, T., Hasegawa, D., Kaneko, N., Henter, G. E. & Kjellström, H. (2019). On the Importance of Representations for Speech-Driven Gesture Generation: Extended Abstract. Paper presented at International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada (pp. 2072-2074). The International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
On the Importance of Representations for Speech-Driven Gesture Generation: Extended Abstract
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a novel framework for automatic speech-driven gesture generation applicable to human-agent interaction, including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech features as input and produces gestures in the form of sequences of 3D joint coordinates representing motion as output. The results of objective and subjective evaluations confirm the benefits of the representation learning.

Place, publisher, year, edition, pages
The International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2019
Keywords
Gesture generation; social robotics; representation learning; neural network; deep learning; virtual agents
National Category
Human Computer Interaction
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-251648 (URN)
Conference
International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada
Projects
EACare
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Note

QC 20190515

Available from: 2019-05-16 Created: 2019-05-16 Last updated: 2019-05-22. Bibliographically approved
Wolfert, P., Kucherenko, T., Kjellström, H. & Belpaeme, T. (2019). Should Beat Gestures Be Learned Or Designed?: A Benchmarking User Study. In: ICDL-EPIROB 2019: Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions. Paper presented at ICDL-EPIROB 2019 Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions. IEEE conference proceedings
Should Beat Gestures Be Learned Or Designed?: A Benchmarking User Study
2019 (English). In: ICDL-EPIROB 2019: Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, IEEE conference proceedings, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present a user study on generated beat gestures for humanoid agents. It has been shown that human-robot interaction can be improved by including communicative non-verbal behavior, such as arm gestures. Beat gestures are one of the four types of arm gestures and are known to be used for emphasizing parts of speech. In our user study, we compare beat gestures learned from training data with hand-crafted beat gestures. The first kind of gestures is generated by a machine learning model trained on speech audio and human upper body poses. We compare this approach with three hand-coded beat gesture methods: designed beat gestures, timed beat gestures, and noisy gestures. Forty-one subjects participated in our user study, and a ranking was derived from paired comparisons using the Bradley-Terry-Luce model. We found that the beat gestures from the machine learning model are preferred, followed by the algorithmically generated gestures. This emphasizes the promise of machine learning for generating communicative actions.
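
A minimal sketch (not the study's analysis code) of how a ranking can be derived from paired comparisons with the Bradley-Terry-Luce model via maximum likelihood; the win-count matrix below is invented for illustration.

```python
import numpy as np

# wins[i, j] = number of times condition i was preferred over condition j (made-up counts)
wins = np.array([[ 0, 30, 35, 38],
                 [11,  0, 28, 33],
                 [ 6, 13,  0, 29],
                 [ 3,  8, 12,  0]], dtype=float)
n = wins.shape[0]
theta = np.zeros(n)                      # log-strength parameter per condition

for _ in range(2000):                    # simple gradient ascent on the BTL log-likelihood
    p = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None]))   # p[i, j] = P(i beats j)
    grad = (wins - (wins + wins.T) * p).sum(axis=1)             # d log-likelihood / d theta
    theta += 0.01 * grad
    theta -= theta.mean()                # BTL strengths are only identifiable up to a shift

ranking = np.argsort(-theta)
print("ranking (best first):", ranking, "strengths:", np.round(theta, 2))
```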

Place, publisher, year, edition, pages
IEEE conference proceedings, 2019
Keywords
gesture generation, machine learning, beat gestures, user study, virtual agents
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-255998 (URN)
Conference
ICDL-EPIROB 2019 Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions
Note

QC 20190815

Available from: 2019-08-14 Created: 2019-08-14 Last updated: 2019-08-15. Bibliographically approved
Butepage, J., Kjellström, H. & Kragic, D. (2018). Anticipating many futures: Online human motion prediction and generation for human-robot interaction. In: 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA). Paper presented at IEEE International Conference on Robotics and Automation (ICRA), MAY 21-25, 2018, Brisbane, AUSTRALIA (pp. 4563-4570). IEEE COMPUTER SOC
Anticipating many futures: Online human motion prediction and generation for human-robot interaction
2018 (English). In: 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE COMPUTER SOC, 2018, p. 4563-4570. Conference paper, Published paper (Refereed)
Abstract [en]

Fluent and safe interactions of humans and robots require both partners to anticipate the others' actions. The bottleneck of most methods is the lack of an accurate model of natural human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motion patterns. Finally, we investigate how movements and kinematic cues are represented on the learned low dimensional manifold.
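
A minimal sketch (not the authors' model) of the sampling idea described above: a decoder conditioned on a window of past skeletal frames, from which several plausible future windows are drawn by sampling the latent variable. All sizes and module definitions are illustrative assumptions.

```python
import torch
import torch.nn as nn

PAST, FUTURE, JOINTS, LATENT = 30, 50, 25 * 3, 32   # past/future frames, flattened joints, latent dim

class FutureDecoder(nn.Module):
    """p(future | z, past): maps a latent sample and the past window to a future window."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT + PAST * JOINTS, 512), nn.ReLU(),
            nn.Linear(512, FUTURE * JOINTS),
        )
    def forward(self, z, past):
        out = self.net(torch.cat([z, past.flatten(1)], dim=1))
        return out.view(-1, FUTURE, JOINTS)

decoder = FutureDecoder()                      # in the paper this is trained as part of a conditional VAE
past_window = torch.randn(1, PAST, JOINTS)     # observed skeletal frames (dummy input)
with torch.no_grad():
    # drawing many latent samples yields many possible future motion patterns
    futures = [decoder(torch.randn(1, LATENT), past_window) for _ in range(10)]
```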

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2018
Series
IEEE International Conference on Robotics and Automation ICRA, ISSN 1050-4729
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-237164 (URN), 000446394503071 (), 978-1-5386-3081-5 (ISBN)
Conference
IEEE International Conference on Robotics and Automation (ICRA), MAY 21-25, 2018, Brisbane, AUSTRALIA
Funder
Swedish Foundation for Strategic Research
Note

QC 20181024

Available from: 2018-10-24 Created: 2018-10-24 Last updated: 2019-08-20. Bibliographically approved
Mikheeva, O., Ek, C. H. & Kjellström, H. (2018). Perceptual facial expression representation. In: Proceedings - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018. Paper presented at 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, Grand Dynasty Culture Hotel, Xi'an, China, 15 May 2018 through 19 May 2018 (pp. 179-186). Institute of Electrical and Electronics Engineers (IEEE)
Perceptual facial expression representation
2018 (English). In: Proceedings - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 179-186. Conference paper, Published paper (Refereed)
Abstract [en]

Dissimilarity measures are often used as a proxy or a handle to reason about data. This can be problematic, as the data representation is often a consequence of the capturing process or how the data is visualized, rather than a reflection of the semantics that we want to extract. Facial expressions are a subtle and essential part of human communication but they are challenging to extract from current representations. In this paper we present a method that is capable of learning semantic representations of faces in a data driven manner. Our approach uses sparse human supervision which our method grounds in the data. We provide experimental justification of our approach showing that our representation improves the performance for emotion classification.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
Facial expressions, Representation learning, Variational auto encoder
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-238209 (URN), 10.1109/FG.2018.00035 (DOI), 2-s2.0-85049386490 (Scopus ID), 9781538623350 (ISBN)
Conference
13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, Grand Dynasty Culture Hotel, Xi'an, China, 15 May 2018 through 19 May 2018
Note

QC 20181122

Available from: 2018-11-22 Created: 2018-11-22 Last updated: 2018-11-22. Bibliographically approved
Hamesse, C., Ackermann, P., Kjellström, H. & Zhang, C. (2018). Simultaneous measurement imputation and outcome prediction for Achilles tendon rupture rehabilitation. In: CEUR Workshop Proceedings. Paper presented at 1st Joint Workshop on AI in Health, AIH 2018, Stockholm, Sweden, 13 July 2018 through 14 July 2018 (pp. 82-86). CEUR-WS, 2142
Simultaneous measurement imputation and outcome prediction for Achilles tendon rupture rehabilitation
2018 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2018, Vol. 2142, p. 82-86. Conference paper, Published paper (Refereed)
Abstract [en]

Achilles Tendon Rupture (ATR) is one of the typical soft tissue injuries. Accurately predicting the rehabilitation outcome of ATR using noisy measurements with missing entries is crucial for treatment decision support. In this work, we design a probabilistic model that simultaneously predicts the missing measurements and the rehabilitation outcome in an end-to-end manner. We evaluate our model and compare it with multiple baselines including multi-stage methods using an ATR clinical cohort. Experimental results demonstrate the superiority of our model for ATR rehabilitation outcome prediction.

Place, publisher, year, edition, pages
CEUR-WS, 2018
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 2142
National Category
Other Medical Engineering
Identifiers
urn:nbn:se:kth:diva-238396 (URN), 2-s2.0-85050917241 (Scopus ID)
Conference
1st Joint Workshop on AI in Health, AIH 2018, Stockholm, Sweden, 13 July 2018 through 14 July 2018
Note

QC 20181108

Available from: 2018-11-08 Created: 2018-11-08 Last updated: 2018-11-08. Bibliographically approved
Karipidou, K., Ahnlund, J., Friberg, A., Alexanderson, S. & Kjellström, H. (2017). Computer Analysis of Sentiment Interpretation in Musical Conducting. In: Proceedings - 12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017. Paper presented at 12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017, Washington, United States, 30 May 2017 through 3 June 2017 (pp. 400-405). IEEE, Article ID 7961769.
Computer Analysis of Sentiment Interpretation in Musical Conducting
2017 (English). In: Proceedings - 12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017, IEEE, 2017, p. 400-405, article id 7961769. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a unique dataset consisting of 20 recordings of the same musical piece, conducted with 4 different musical intentions in mind. The upper body and baton motion of a professional conductor were recorded, as well as the sound of each instrument in a professional string quartet following the conductor. The dataset is made available for benchmarking of motion recognition algorithms. An HMM-based emotion intent classification method is trained with subsets of the data, and classification of other subsets of the data shows, firstly, that the motion of the baton communicates energetic intention to a high degree; secondly, that the conductor's torso, head and other arm convey calm intention to a high degree; and thirdly, that positive vs. negative sentiments are communicated to a high degree through other channels than the body and baton motion – most probably through facial expression and muscle tension conveyed through articulated hand and finger motion. The long-term goal of this work is to develop a computer model of the entire conductor-orchestra communication process; the studies presented here indicate that computer modeling of the conductor-orchestra communication is feasible.
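
A minimal sketch (not the paper's implementation) of HMM-based intent classification over motion features, using hmmlearn: fit one Gaussian HMM per conducted intention and classify a held-out sequence by the highest log-likelihood. The intention labels, feature dimensionality, and random data below are placeholders.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_hmm(sequences, n_states=5):
    """sequences: list of (T_i, D) arrays of motion features for one intention class."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

# Placeholder training data: {intention label: list of motion feature sequences}
rng = np.random.default_rng(0)
train = {label: [rng.normal(size=(100, 6)) for _ in range(4)]
         for label in ["intention_A", "intention_B", "intention_C", "intention_D"]}
models = {label: fit_class_hmm(seqs) for label, seqs in train.items()}

test_seq = rng.normal(size=(120, 6))                    # one held-out motion sequence
scores = {label: m.score(test_seq) for label, m in models.items()}
print("predicted intention:", max(scores, key=scores.get))
```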

Place, publisher, year, edition, pages
IEEE, 2017
National Category
Computer and Information Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-208886 (URN), 10.1109/FG.2017.57 (DOI), 000414287400054 (), 2-s2.0-85026288976 (Scopus ID), 9781509040230 (ISBN)
Conference
12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017, Washington, United States, 30 May 2017 through 3 June 2017
Note

QC 20170616

Available from: 2017-06-12 Created: 2017-06-12 Last updated: 2018-09-13. Bibliographically approved
Zhang, C., Kjellström, H. & Mandt, S. (2017). Determinantal point processes for mini-batch diversification. In: Uncertainty in Artificial Intelligence - Proceedings of the 33rd Conference, UAI 2017. Paper presented at 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, 11 August 2017 through 15 August 2017. AUAI Press Corvallis
Determinantal point processes for mini-batch diversification
2017 (English). In: Uncertainty in Artificial Intelligence - Proceedings of the 33rd Conference, UAI 2017, AUAI Press Corvallis, 2017. Conference paper (Refereed)
Abstract [en]

We study a mini-batch diversification scheme for stochastic gradient descent (SGD). While classical SGD relies on uniformly sampling data points to form a mini-batch, we propose a non-uniform sampling scheme based on the Determinantal Point Process (DPP). The DPP relies on a similarity measure between data points and gives low probabilities to mini-batches which contain redundant data, and higher probabilities to mini-batches with more diverse data. This simultaneously balances the data and leads to stochastic gradients with lower variance. We term this approach Diversified Mini-Batch SGD (DM-SGD). We show that regular SGD and a biased version of stratified sampling emerge as special cases. Furthermore, DM-SGD generalizes stratified sampling to cases where no discrete features exist to bin the data into groups. We show experimentally that our method results in more interpretable and diverse features in unsupervised setups, and in better classification accuracies in supervised setups.
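
A minimal sketch (not the paper's sampler) of the core idea: a mini-batch S is drawn with probability proportional to det(L_S) for a similarity kernel L, so batches of mutually similar points are unlikely. For illustration the k-DPP is sampled by brute-force enumeration, which is only feasible for a toy dataset.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))                         # 10 toy data points
# RBF similarity kernel: similar points give near-singular submatrices (small determinants)
dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
L = np.exp(-0.5 * dists ** 2)

k = 3                                                # mini-batch size
subsets = list(itertools.combinations(range(len(X)), k))
weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
probs = weights / weights.sum()                      # P(S) proportional to det(L_S)

batch = subsets[rng.choice(len(subsets), p=probs)]   # indices of a diversified mini-batch
print("sampled mini-batch indices:", batch)
```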

Place, publisher, year, edition, pages
AUAI Press Corvallis, 2017
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:kth:diva-218565 (URN), 2-s2.0-85031095282 (Scopus ID)
Conference
33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, 11 August 2017 through 15 August 2017
Note

QC 20171129

Available from: 2017-11-29 Created: 2017-11-29 Last updated: 2017-11-29. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-5750-9655
