Publications (10 of 87)
Klasson, M., Zhang, C. & Kjellström, H. (2019). A hierarchical grocery store image dataset with visual and semantic labels. In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019. Paper presented at 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019, 7 January 2019 through 11 January 2019 (pp. 491-500). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8658240.
A hierarchical grocery store image dataset with visual and semantic labels
2019 (English) In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 491-500, article id 8658240. Conference paper, Published paper (Refereed)
Abstract [en]

Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, for daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application – classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding information of the product from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results evaluated on pretrained convolutional neural networks often used for image understanding purposes, and also a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
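
As a rough illustration of how a dataset like this might be consumed, the sketch below fine-tunes an ImageNet-pretrained CNN for fine-grained grocery classification with PyTorch/torchvision (assuming torchvision >= 0.13 for the weights API). The directory path, folder-per-class layout, and hyperparameters are illustrative assumptions, not the dataset's actual structure or the paper's benchmark setup.

```python
# Hedged sketch: fine-tuning a pretrained CNN on a folder-per-class image dataset.
# "GroceryStoreDataset/train" and all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

# Assumes one directory per fine-grained class, e.g. train/Royal-Gala-Apple/*.jpg
train_set = datasets.ImageFolder("GroceryStoreDataset/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Replace the ImageNet classifier head with one sized to the grocery classes.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                # one epoch of fine-tuning
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```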

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Benchmarking, Computer vision, Electronic commerce, Image classification, Large dataset, Learning systems, Neural networks, Semantics, Accurate prediction, Assistive technology, Classification models, Convolutional neural network, Hierarchical structures, Natural environments, Structured information, Visually impaired people, Classification (of information)
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-252223 (URN)
10.1109/WACV.2019.00058 (DOI)
000469423400051 (ISI)
2-s2.0-85063566822 (Scopus ID)
9781728119755 (ISBN)
Conference
19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019, 7 January 2019 through 11 January 2019
Note

QC 20190611

Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2019-06-26. Bibliographically approved
Zhang, C., Butepage, J., Kjellström, H. & Mandt, S. (2019). Advances in Variational Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 2008-2026
Advances in Variational Inference
2019 (English) In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 41, no 8, p. 2008-2026. Article in journal (Refereed) Published
Abstract [en]

Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully applied to various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
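
For reference, the standard mean field objective the review starts from can be written out as follows; this is the textbook formulation, not text quoted from the paper. Since the KL term is non-negative, maximizing the ELBO over the variational parameters tightens the bound on the evidence and minimizes the KL divergence to the true posterior.

```latex
% Evidence lower bound (ELBO) for a latent-variable model p(x, z),
% with a factorized ("mean field") variational family q_phi.
\log p(x)
  = \underbrace{\mathbb{E}_{q_\phi(z)}\!\big[\log p(x, z) - \log q_\phi(z)\big]}_{\mathrm{ELBO}(\phi)}
  + \mathrm{KL}\!\big(q_\phi(z) \,\|\, p(z \mid x)\big),
\qquad
q_\phi(z) = \prod_i q_{\phi_i}(z_i).
```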

Place, publisher, year, edition, pages
IEEE Computer Society, 2019
Keywords
Variational inference, approximate Bayesian inference, reparameterization gradients, structured variational approximations, scalable inference, inference networks
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-255405 (URN)
10.1109/TPAMI.2018.2889774 (DOI)
000473598800016 (ISI)
30596568 (PubMedID)
2-s2.0-85059288228 (Scopus ID)
Note

QC 20190814

Available from: 2019-08-14 Created: 2019-08-14 Last updated: 2019-08-14. Bibliographically approved
Kucherenko, T., Hasegawa, D., Henter, G. E., Kaneko, N. & Kjellström, H. (2019). Analyzing Input and Output Representations for Speech-Driven Gesture Generation. In: 19th ACM International Conference on Intelligent Virtual Agents. Paper presented at 19th ACM International Conference on Intelligent Virtual Agents (IVA '19), July 2-5, 2019, Paris, France. New York, NY, USA: ACM Publications
Analyzing Input and Output Representations for Speech-Driven Gesture Generation
2019 (English) In: 19th ACM International Conference on Intelligent Virtual Agents, New York, NY, USA: ACM Publications, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates.

Our approach consists of two steps. First, we learn a lower-dimensional representation of human motion using a denoising autoencoder neural network, consisting of a motion encoder MotionE and a motion decoder MotionD. The learned representation preserves the most important aspects of the human pose variation while removing less relevant variation. Second, we train a novel encoder network SpeechE to map from speech to a corresponding motion representation with reduced dimensionality. At test time, the speech encoder and the motion decoder networks are combined: SpeechE predicts motion representations based on a given speech signal and MotionD then decodes these representations to produce motion sequences.

We evaluate different representation sizes in order to find the most effective dimensionality for the representation. We also evaluate the effects of using different speech features as input to the model. We find that mel-frequency cepstral coefficients (MFCCs), alone or combined with prosodic features, perform the best. The results of a subsequent user study confirm the benefits of the representation learning.
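
A minimal sketch of the two-step pipeline described above, reusing the MotionE/MotionD/SpeechE names from the paper; the layer sizes, feature dimensions, noise level, and losses are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of a speech-to-gesture pipeline with a learned motion representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

POSE_DIM, SPEECH_DIM, REPR_DIM = 45, 26, 32   # illustrative sizes, not the paper's

# Step 1: denoising autoencoder over motion (MotionE / MotionD).
motion_enc = nn.Sequential(nn.Linear(POSE_DIM, 128), nn.ReLU(), nn.Linear(128, REPR_DIM))
motion_dec = nn.Sequential(nn.Linear(REPR_DIM, 128), nn.ReLU(), nn.Linear(128, POSE_DIM))

def autoencoder_loss(pose):
    noisy = pose + 0.05 * torch.randn_like(pose)   # denoising: corrupt the input
    return F.mse_loss(motion_dec(motion_enc(noisy)), pose)

# Step 2: SpeechE maps speech features (e.g. MFCCs) into the learned motion space.
speech_enc = nn.Sequential(nn.Linear(SPEECH_DIM, 128), nn.ReLU(), nn.Linear(128, REPR_DIM))

def speech_loss(speech, pose):
    with torch.no_grad():
        target = motion_enc(pose)                  # representation from the fixed MotionE
    return F.mse_loss(speech_enc(speech), target)

# Test time: chain SpeechE and MotionD to map speech directly to pose sequences.
def generate(speech):
    return motion_dec(speech_enc(speech))
```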

Place, publisher, year, edition, pages
New York, NY, USA: ACM Publications, 2019
Keywords
Gesture generation, social robotics, representation learning, neural network, deep learning, gesture synthesis, virtual agents
National Category
Human Computer Interaction
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-255035 (URN)
10.1145/3308532.3329472 (DOI)
2-s2.0-85069654899 (Scopus ID)
978-1-4503-6672-4 (ISBN)
Conference
19th ACM International Conference on Intelligent Virtual Agents (IVA '19), July 2-5, 2019, Paris, France
Projects
EACare
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Note

QC 20190902

Available from: 2019-07-16 Created: 2019-07-16 Last updated: 2019-09-02. Bibliographically approved
Tu, R., Zhang, C., Ackermann, P., Mohan, K., Kjellström, H. & Zhang, K. (2019). Causal discovery in the presence of missing data. Paper presented at International Conference on Artificial Intelligence and Statistics.
Causal discovery in the presence of missing data
2019 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-269111 (URN)
Conference
International Conference on Artificial Intelligence and Statistics
Available from: 2020-03-04 Created: 2020-03-04 Last updated: 2020-03-04
Tu, R., Zhang, C., Ackermann, P., Mohan, K., Kjellström, H. & Zhang, K. (2019). Causal Discovery in the Presence of Missing Data. In: Chaudhuri, K. & Sugiyama, M. (Eds.), 22nd International Conference on Artificial Intelligence and Statistics, Vol. 89. Paper presented at 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), April 16-18, 2019, Naha, Japan. Microtome Publishing
Causal Discovery in the Presence of Missing Data
2019 (English) In: 22nd International Conference on Artificial Intelligence and Statistics, Vol. 89 / [ed] Chaudhuri, K. & Sugiyama, M., Microtome Publishing, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Missing data are ubiquitous in many domains such as healthcare. When these data entries are not missing completely at random, the (conditional) independence relations in the observed data may be different from those in the complete data generated by the underlying causal process. Consequently, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that are missing under different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by missingness graphs (m-graphs), we analyze conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Based on our analysis, we propose Missing Value PC (MVPC), which extends the PC algorithm to incorporate additional corrections. Our proposed MVPC is shown in theory to give asymptotically correct results even on data that are MAR or MNAR. Experimental results on both synthetic data and real healthcare applications illustrate that the proposed algorithm is able to find correct causal relations even in the general case of MNAR.
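
As a simplified illustration of the test-wise deletion idea that this line of work builds on, the sketch below implements a conditional independence test that uses only the records where every variable involved is observed. It is not the authors' MVPC implementation, which additionally applies the correction steps described in the abstract; the function name and signature are assumptions.

```python
# Hedged sketch: Fisher-z conditional independence test with test-wise deletion.
# For "X independent of Y given Z", keep only rows where x, y, and all of z are observed.
import numpy as np
import pandas as pd
from scipy import stats

def fisher_z_test(data: pd.DataFrame, x: str, y: str, z: list) -> float:
    """p-value for X _||_ Y | Z via partial correlation on fully observed rows."""
    sub = data[[x, y] + z].dropna()            # test-wise deletion of missing entries
    prec = np.linalg.inv(sub.corr().to_numpy())
    # Partial correlation of (x, y) given z from the precision matrix.
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z_stat = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(sub) - len(z) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z_stat)))
```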

Place, publisher, year, edition, pages
Microtome Publishing, 2019
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 89
National Category
Mathematics
Identifiers
urn:nbn:se:kth:diva-269511 (URN)
000509687901084 (ISI)
Conference
22nd International Conference on Artificial Intelligence and Statistics (AISTATS), April 16-18, 2019, Naha, Japan
Note

QC 20200309

Available from: 2020-03-09 Created: 2020-03-09 Last updated: 2020-03-09. Bibliographically approved
Eriksson, S., Unander-Scharin, Å., Trichon, V., Unander-Scharin, C., Kjellström, H. & Höök, K. (2019). Dancing with Drones: Crafting Novel Artistic Expressions through Intercorporeality. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Paper presented at ACM SIGCHI (pp. 617:1-617:12). New York, NY, USA
Dancing with Drones: Crafting Novel Artistic Expressions through Intercorporeality
2019 (English) In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, New York, NY, USA: ACM, 2019, p. 617:1-617:12. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York, NY, USA: ACM, 2019
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-257746 (URN)
10.1145/3290605.3300847 (DOI)
000474467907074 (ISI)
2-s2.0-85067597620 (Scopus ID)
Conference
ACM SIGCHI
Projects
KAW 2015.0080, Engineering the Interconnected Society: Information, Control, Interaction
Note

QC 20190916

Available from: 2019-09-03 Created: 2019-09-03 Last updated: 2020-01-22. Bibliographically approved
Broomé, S., Bech Gleerup, K., Haubro Andersen, P. & Kjellström, H. (2019). Dynamics are important for the recognition of equine pain in video. Paper presented at IEEE Conference on Computer Vision and Pattern Recognition.
Dynamics are important for the recognition of equine pain in video
2019 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-269109 (URN)
Conference
IEEE Conference on Computer Vision and Pattern Recognition
Available from: 2020-03-04 Created: 2020-03-04 Last updated: 2020-03-04
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2), Article ID UNSP 8.
Modeling of Human Visual Attention in Multiparty Open-World Dialogues
2019 (English) In: ACM Transactions on Human-Robot Interaction, ISSN 2573-9522, Vol. 8, no 2, article id UNSP 8. Article in journal (Refereed) Published
Abstract [en]

This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by his/her interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
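
A minimal sketch of the kind of sequence model the abstract describes: a Long Short-Term Memory network mapping per-frame interlocutor signals (speech activity, eye-gaze direction, head orientation) to per-frame candidate gaze targets. The feature and target dimensions are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: LSTM over per-frame interaction signals, predicting gaze targets.
import torch
import torch.nn as nn

N_FEATURES, N_TARGETS, HIDDEN = 12, 4, 64   # assumed sizes, e.g. 4 signals x 3 interlocutors

class GazeLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_FEATURES, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_TARGETS)

    def forward(self, signals):            # signals: (batch, time, N_FEATURES)
        out, _ = self.lstm(signals)
        return self.head(out)              # per-frame logits over candidate gaze targets

model = GazeLSTM()
frames = torch.randn(8, 100, N_FEATURES)   # 8 sequences of 100 frames
logits = model(frames)                     # shape (8, 100, N_TARGETS)
```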

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Human-human interaction, open-world dialogue, eye-gaze direction, head orientation, multiparty
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-255203 (URN)
10.1145/3323231 (DOI)
000472066800003 (ISI)
Note

QC 20190904

Available from: 2019-09-04 Created: 2019-09-04 Last updated: 2019-10-15. Bibliographically approved
Tu, R., Zhang, K., Bertilson, B. C., Kjellström, H. & Zhang, C. (2019). Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation. Paper presented at Neural Information Processing Systems.
Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation
2019 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-269108 (URN)
Conference
Neural Information Processing Systems
Available from: 2020-03-04 Created: 2020-03-04 Last updated: 2020-03-04
Kucherenko, T., Hasegawa, D., Kaneko, N., Henter, G. E. & Kjellström, H. (2019). On the Importance of Representations for Speech-Driven Gesture Generation: Extended Abstract. Paper presented at International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada (pp. 2072-2074). The International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
On the Importance of Representations for Speech-Driven Gesture Generation: Extended Abstract
2019 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a novel framework for automatic speech-driven gesture generation applicable to human-agent interaction, including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech features as input and produces gestures in the form of sequences of 3D joint coordinates representing motion as output. The results of objective and subjective evaluations confirm the benefits of the representation learning.

Place, publisher, year, edition, pages
The International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2019
Keywords
Gesture generation; social robotics; representation learning; neural network; deep learning; virtual agents
National Category
Human Computer Interaction
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-251648 (URN)
000474345000309 (ISI)
Conference
International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada
Projects
EACare
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Note

QC 20190515

Available from: 2019-05-16 Created: 2019-05-16 Last updated: 2019-10-25. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-5750-9655
