Publications (10 of 131)
Li, C., Yang, Y., Weng, Z., Hernlund, E., Zuffi, S. & Kjellström, H. (2025). Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images. In: Computer Vision – ACCV 2024 - 17th Asian Conference on Computer Vision, Proceedings. Paper presented at 17th Asian Conference on Computer Vision, ACCV 2024, Hanoi, Viet Nam, Dec 8 2024 - Dec 12 2024 (pp. 268-288). Springer Science and Business Media Deutschland GmbH
Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images
2025 (English) In: Computer Vision – ACCV 2024 - 17th Asian Conference on Computer Vision, Proceedings, Springer Science and Business Media Deutschland GmbH, 2025, p. 268-288. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, 3D parametric animal models have been developed to aid in estimating 3D shape and pose from images and video. While progress has been made for humans, the task is more challenging for animals due to limited annotated data. To address this, we introduce the first method that uses synthetic data generation and disentanglement to learn to regress 3D shape and pose. Focusing on horses, we use text-based texture generation and a synthetic data pipeline to create varied shapes, poses, and appearances, learning disentangled spaces. Our method, Dessie, surpasses existing 3D horse reconstruction methods and generalizes to other large animals such as zebras, cows, and deer (a schematic sketch of such a disentangled regressor follows this record). See the project website at: https://celiali.github.io/Dessie/.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Keywords
Animal 3D reconstruction, disentanglement
National Category
Computer graphics and computer vision; Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-358262 (URN); 10.1007/978-981-96-0972-7_16 (DOI); 2-s2.0-85213389101 (Scopus ID)
Conference
17th Asian Conference on Computer Vision, ACCV 2024, Hanoi, Viet Nam, Dec 8 2024 - Dec 12 2024
Note

Part of ISBN 9789819609710

QC 20250113

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-02-01. Bibliographically approved
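
To make the disentanglement idea concrete: the abstract describes regressing separate shape, pose, and appearance factors from an image. A minimal sketch of such a regressor is below, with one shared backbone and one head per factor so that each factor lives in its own latent space. The class name, layer sizes, and head dimensions are illustrative assumptions, not Dessie's actual architecture; see the paper and project page for that.

```python
import torch.nn as nn

class DisentangledRegressor(nn.Module):
    """Hypothetical sketch: shared image features feed three separate heads,
    so shape, pose, and appearance are predicted in disentangled spaces."""
    def __init__(self, feat_dim=512, n_shape=10, n_pose=108, n_app=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU())
        self.shape_head = nn.Linear(feat_dim, n_shape)  # model shape coefficients
        self.pose_head = nn.Linear(feat_dim, n_pose)    # articulated pose parameters
        self.app_head = nn.Linear(feat_dim, n_app)      # appearance code

    def forward(self, img):                             # img: (B, 3, H, W)
        f = self.backbone(img)
        return self.shape_head(f), self.pose_head(f), self.app_head(f)
```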
Zuffi, S., Mellbin, Y., Li, C., Hoeschle, M., Kjellström, H., Polikovsky, S., . . . Black, M. J. (2024). VAREN: Very Accurate and Realistic Equine Network. In: 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024. Paper presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), JUN 16-22, 2024, Seattle, WA (pp. 5374-5383). Institute of Electrical and Electronics Engineers (IEEE)
VAREN: Very Accurate and Realistic Equine Network
2024 (English) In: 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 5374-5383. Conference paper, Published paper (Refereed)
Abstract [en]

Data-driven three-dimensional parametric shape models of the human body have gained enormous popularity both for the analysis of visual data and for the generation of synthetic humans. Following a similar approach for animals does not scale to the multitude of existing animal species, not to mention the difficulty of accessing subjects to scan in 3D. However, we argue that for domestic species of great importance, like the horse, it is a highly valuable investment to gather a large dataset of real 3D scans and learn a realistic 3D articulated shape model. We introduce VAREN, a novel 3D articulated parametric shape model learned from 3D scans of many real horses. VAREN bridges synthesis and analysis tasks, as the generated model instances have unprecedented realism while representing horses of different sizes and shapes. Unlike previous body models, VAREN has two resolutions, an anatomical skeleton, and interpretable, learned pose-dependent deformations, which are related to the body muscles (a minimal skinning sketch follows this record). Experiments show that this formulation outperforms previous strategies for modeling pose-dependent deformations in the human body case, while also being more compact and allowing an analysis of the relationship between articulation and muscle deformation during articulated motion. The VAREN model and data are available at https://varen.is.tue.mpg.de.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-358703 (URN); 10.1109/CVPR52733.2024.00514 (DOI); 001322555905073; 2-s2.0-85198274738 (Scopus ID)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), JUN 16-22, 2024, Seattle, WA
Note

Part of ISBN 979-8-3503-5301-3; 979-8-3503-5300-6

QC 20250120

Available from: 2025-01-20 Created: 2025-01-20 Last updated: 2025-01-20. Bibliographically approved
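
As background for the abstract above: a parametric articulated shape model of this family typically offsets a template mesh with shape and pose-dependent blend shapes and then poses it with linear blend skinning. The sketch below shows that generic forward pass under stated assumptions; it is not VAREN's released implementation, and all array names and shapes are hypothetical.

```python
import numpy as np

def posed_vertices(template, shape_dirs, pose_dirs, betas, pose_feat,
                   joint_transforms, skin_weights):
    """template: (V, 3) rest mesh; shape_dirs: (V, 3, B) shape blend shapes;
    pose_dirs: (V, 3, P) pose-dependent (e.g. muscle-related) blend shapes;
    betas: (B,) shape coefficients; pose_feat: (P,) pose-derived features;
    joint_transforms: (J, 4, 4) rest-to-posed transform per joint;
    skin_weights: (V, J) skinning weights, each row summing to 1."""
    # Offset the rest template by shape- and pose-dependent deformations.
    v_rest = template + shape_dirs @ betas + pose_dirs @ pose_feat  # (V, 3)
    # Homogeneous coordinates for skinning.
    v_h = np.concatenate([v_rest, np.ones((len(v_rest), 1))], axis=1)
    # Linear blend skinning: blend joint transforms per vertex, then apply.
    T = np.einsum('vj,jab->vab', skin_weights, joint_transforms)  # (V, 4, 4)
    return np.einsum('vab,vb->va', T, v_h)[:, :3]
```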
Yin, W., Tu, R., Yin, H., Kragic, D., Kjellström, H. & Björkman, M. (2023). Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models. In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN. Paper presented at 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA (pp. 1102-1108). Institute of Electrical and Electronics Engineers (IEEE)
Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models
2023 (English) In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 1102-1108. Conference paper, Published paper (Refereed)
Abstract [en]

Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and for dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which we find effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation (a hedged sketch of this idea follows this record). We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines, and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-341978 (URN); 10.1109/RO-MAN57019.2023.10309317 (DOI); 001108678600131; 2-s2.0-85186990309 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings ISBN 979-8-3503-3670-2

QC 20240110

Available from: 2024-01-10 Created: 2024-01-10 Last updated: 2025-02-07. Bibliographically approved
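
The "data dropout based on the diffusion forward process" mentioned in the abstract can be read as corrupting randomly chosen frames with the forward noising kernel q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) instead of zeroing them out. The sketch below illustrates that reading; the dropout rate, per-frame step sampling, and function name are assumptions, not MoDiff's exact procedure.

```python
import torch

def diffusion_dropout(x0, alpha_bar, drop_prob=0.2):
    """x0: (T, D) motion sequence; alpha_bar: (S,) cumulative noise schedule
    (the running product of 1 - beta_t); drop_prob: fraction of frames to corrupt."""
    T = x0.shape[0]
    corrupt = torch.rand(T) < drop_prob            # which frames to corrupt
    t = torch.randint(len(alpha_bar), (T,))        # a random diffusion step per frame
    a = alpha_bar[t].unsqueeze(-1)                 # (T, 1), broadcast over D
    noised = a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)
    return torch.where(corrupt.unsqueeze(-1), noised, x0)
```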
Broomé, S., Feighelstein, M., Zamansky, A., Carreira Lencioni, G., Haubro Andersen, P., Pessanha, F., . . . Salah, A. A. (2023). Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions. International Journal of Computer Vision, 131(2), 572-590
Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions
2023 (English) In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 131, no 2, p. 572-590. Article in journal (Refereed) Published
Abstract [en]

Advances in animal motion tracking and pose recognition have been a game changer in the study of animal behavior. Recently, an increasing number of works go ‘deeper’ than tracking and address automated recognition of animals’ internal states, such as emotions and pain, with the aim of improving animal welfare, making this a timely moment for a systematization of the field. This paper provides a comprehensive survey of computer vision-based research on recognition of pain and emotional states in animals, addressing both facial and bodily behavior analysis. We summarize the efforts presented so far within this topic, classifying them across different dimensions; highlight challenges and research gaps; and provide best-practice recommendations and future directions for advancing the field.

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Affective computing, Computer vision for animals, Emotion recognition, Non-human behavior analysis, Pain estimation, Pain recognition, Animals, Behavioral research, Health, Surveys, Animal motion, Human behavior analysis, Motion tracking, Vision based, Computer vision
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-328890 (URN); 10.1007/s11263-022-01716-3 (DOI); 000888708500001; 2-s2.0-85142475831 (Scopus ID)
Note

QC 20230613

Available from: 2023-06-13 Created: 2023-06-13 Last updated: 2025-02-07. Bibliographically approved
Lawin, F. J., Bystrom, A., Roepstorff, C., Rhodin, M., Almloef, M., Silva, M., . . . Hernlund, E. (2023). Is Markerless More or Less?: Comparing a Smartphone Computer Vision Method for Equine Lameness Assessment to Multi-Camera Motion Capture. Animals, 13(3), Article ID 390.
Is Markerless More or Less?: Comparing a Smartphone Computer Vision Method for Equine Lameness Assessment to Multi-Camera Motion Capture
2023 (English) In: Animals, E-ISSN 2076-2615, Vol. 13, no 3, article id 390. Article in journal (Refereed) Published
Abstract [en]

Lameness, an alteration of the gait due to pain or dysfunction of the locomotor system, is the most common disease symptom in horses. Yet, it is difficult for veterinarians to assess correctly by visual inspection. Objective tools are needed that can aid clinical decision making and provide early disease detection through sensitive lameness measurements. In this study, we describe how an AI-powered measurement tool on a smartphone can detect lameness in horses without the need to mount equipment on the horse. We compare it to a state-of-the-art multi-camera motion capture system through simultaneous, synchronised recordings from both systems. The mean difference between the systems' output of lameness metrics was below 2.2 mm. Therefore, we conclude that the smartphone measurement tool can detect lameness at relevant levels with ease of use for the veterinarian.

Computer vision is a subcategory of artificial intelligence focused on extraction of information from images and video. It provides a compelling new means for objective orthopaedic gait assessment in horses using accessible hardware, such as a smartphone, for markerless motion analysis. This study aimed to explore the lameness assessment capacity of a smartphone single-camera (SC) markerless computer vision application by comparing measurements of the vertical motion of the head and pelvis to an optical motion capture multi-camera (MC) system using skin-attached reflective markers. Twenty-five horses were recorded with a smartphone (60 Hz) and a 13-camera MC system (200 Hz) while trotting twice back and forth on a 30 m runway. The smartphone video was processed using artificial neural networks detecting the horse's direction, action, and motion of body segments. After filtering, the vertical displacement curves from the head and pelvis were synchronised between systems using cross-correlation. This rendered 655 and 404 matching stride-segmented curves for the head and pelvis, respectively. From the stride-segmented vertical displacement signals, the differences between the two minima (MinDiff) and the two maxima (MaxDiff) per stride were compared between the systems (a sketch of these metrics follows this record). Trial mean difference between systems was 2.2 mm (range 0.0-8.7 mm) for the head and 2.2 mm (range 0.0-6.5 mm) for the pelvis. Within-trial standard deviations ranged from 3.1 to 28.1 mm for MC and from 3.6 to 26.2 mm for SC. The ease of use and good agreement with MC indicate that the SC application is a promising tool for detecting clinically relevant levels of asymmetry in horses, enabling frequent and convenient gait monitoring over time.

Place, publisher, year, edition, pages
MDPI AG, 2023
Keywords
monocular motion analysis, objective lameness assessment, equine orthopaedics, animal pose estimation, optical motion capture
National Category
Clinical Science
Identifiers
urn:nbn:se:kth:diva-324900 (URN); 10.3390/ani13030390 (DOI); 000931327100001; 36766279 (PubMedID); 2-s2.0-85147826149 (Scopus ID)
Note

QC 20230321

Available from: 2023-03-21 Created: 2023-03-21 Last updated: 2024-01-17. Bibliographically approved
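
For readers unfamiliar with the asymmetry metrics compared above: the head and pelvis move vertically twice per stride, and MinDiff/MaxDiff are the differences between the two minima and the two maxima within each stride-segmented curve. A minimal sketch follows; the peak-detection settings are assumptions, not the study's exact filtering pipeline.

```python
import numpy as np
from scipy.signal import find_peaks

def stride_asymmetry(y):
    """y: 1-D vertical displacement signal for one stride (head or pelvis).
    Returns (MinDiff, MaxDiff), or None if two clear oscillations are absent."""
    maxima, _ = find_peaks(y)    # indices of local maxima
    minima, _ = find_peaks(-y)   # indices of local minima
    if len(maxima) < 2 or len(minima) < 2:
        return None              # stride rejected
    min_diff = abs(y[minima[0]] - y[minima[1]])  # asymmetry between the two lows
    max_diff = abs(y[maxima[0]] - y[maxima[1]])  # asymmetry between the two highs
    return min_diff, max_diff
```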
Klasson, M., Kjellström, H. & Zhang, C. (2023). Learn the Time to Learn: Replay Scheduling in Continual Learning. Transactions on Machine Learning Research, 2023-November
Learn the Time to Learn: Replay Scheduling in Continual Learning
2023 (English) In: Transactions on Machine Learning Research, E-ISSN 2835-8856, Vol. 2023-November. Article in journal (Refereed) Published
Abstract [en]

Replay methods are known to be successful at mitigating catastrophic forgetting in continual learning scenarios despite having limited access to historical data. However, storing historical data is cheap in many real-world settings, yet replaying all historical data is often prohibited due to processing time constraints. In such settings, we propose that continual learning systems should learn the time to learn and schedule which tasks to replay at different time steps. We first demonstrate the benefits of our proposal by using Monte Carlo tree search to find a proper replay schedule, and show that the found replay schedules can outperform fixed scheduling policies when combined with various replay methods in different continual learning settings (a sketch of the schedule object follows this record). Additionally, we propose a framework for learning replay scheduling policies with reinforcement learning. We show that the learned policies can generalize better in new continual learning scenarios compared to equally replaying all seen tasks, without added computational cost. Our study reveals the importance of learning the time to learn in continual learning, which brings current research closer to real-world needs.

Place, publisher, year, edition, pages
Transactions on Machine Learning Research, 2023
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-361775 (URN); 2-s2.0-86000627640 (Scopus ID)
Note

QC 20250328

Available from: 2025-03-27 Created: 2025-03-27 Last updated: 2025-03-28. Bibliographically approved
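
To fix ideas about what is being searched over in the abstract above: a replay schedule can be represented as, per time step, a proportion of a fixed replay budget assigned to each previously seen task; Monte Carlo tree search or a learned policy then selects among such schedules. The sketch below shows only that representation and the resulting batch sampling, under assumed names; it is not the paper's search procedure.

```python
import random

def sample_replay_batch(memory, proportions, budget):
    """memory: dict task_id -> list of stored examples;
    proportions: dict task_id -> fraction of the budget (fractions sum to 1);
    budget: total number of replay examples drawn at this time step."""
    batch = []
    for task, frac in proportions.items():
        k = min(round(frac * budget), len(memory[task]))
        batch.extend(random.sample(memory[task], k))  # draw without replacement
    return batch
```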
Christoffersen, B., Mahjani, B., Clements, M., Kjellström, H. & Humphreys, K. (2023). Quasi-Monte Carlo Methods for Binary Event Models with Complex Family Data. Journal of Computational And Graphical Statistics, 32(4), 1393-1401
Quasi-Monte Carlo Methods for Binary Event Models with Complex Family Data
2023 (English) In: Journal of Computational And Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 32, no 4, p. 1393-1401. Article in journal (Refereed) Published
Abstract [en]

The generalized linear mixed model for binary outcomes with the probit link function is used in many fields but has a computationally challenging likelihood when there are many random effects. We extend a previously used importance sampler, making it much faster in the context of estimating heritability and related effects from family data, by adding a gradient and a Hessian approximation and providing a faster implementation (a sketch of the underlying quasi-Monte Carlo integral follows this record). Additionally, a graph-based method is suggested to simplify the likelihood when there are thousands of individuals in each family. Simulation studies show that the resulting method is orders of magnitude faster, has a negligible efficiency loss, and yields confidence intervals with nominal coverage. We also analyze data from a large study of obsessive-compulsive disorder based on Swedish multi-generational data. In this analysis, the proposed method yielded similar results to a previous analysis, but was much faster. Supplementary materials for this article are available online.

Place, publisher, year, edition, pages
Informa UK Limited, 2023
Keywords
Family-based studies, Generalized linear mixed model, Importance sampling
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:kth:diva-350088 (URN); 10.1080/10618600.2022.2151454 (DOI); 000911289700001; 2-s2.0-85146716373 (Scopus ID)
Note

QC 20240807

Available from: 2024-08-07 Created: 2024-08-07 Last updated: 2024-08-07. Bibliographically approved
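
The computational core discussed in the abstract above is a high-dimensional integral: the family likelihood L = E_{u ~ N(0, Sigma)} [ prod_i Phi((2 y_i - 1)(eta_i + z_i' u)) ] over the random effects u. Below is a hedged sketch of a plain randomized quasi-Monte Carlo estimate of this integral; the paper's estimator additionally uses importance sampling with gradient and Hessian approximations, which the sketch omits.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_probit_family_likelihood(y, eta, Z, Sigma, m=11):
    """y: (n,) binary outcomes; eta: (n,) fixed-effect linear predictor;
    Z: (n, q) random-effect design matrix; Sigma: (q, q) covariance;
    uses 2**m scrambled Sobol points."""
    L = np.linalg.cholesky(Sigma)
    u01 = qmc.Sobol(d=Sigma.shape[0], scramble=True).random(2**m)  # (0, 1)^q
    u = norm.ppf(u01) @ L.T              # quasi-random draws from N(0, Sigma)
    s = 2.0 * y - 1.0                    # map {0, 1} outcomes to {-1, +1}
    # Conditional likelihood of the whole family, for each draw of u.
    cond = norm.cdf(s * (eta + u @ Z.T)).prod(axis=1)
    return cond.mean()
```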
Broomé, S., Pokropek, E., Li, B. & Kjellström, H. (2023). Recur, Attend or Convolve?: On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition. In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV). Paper presented at 23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI (pp. 4188-4198). Institute of Electrical and Electronics Engineers (IEEE)
Recur, Attend or Convolve?: On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
2023 (English) In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 4188-4198. Conference paper, Published paper (Refereed)
Abstract [en]

Most action recognition models today are highly parameterized and evaluated on datasets with appearance-wise distinct classes. It has also been shown that 2D Convolutional Neural Networks (CNNs) tend to be biased toward texture rather than shape in still image recognition tasks [19], in contrast to humans. Taken together, this raises the suspicion that large video models partly learn spurious spatial texture correlations rather than learning to track relevant shapes over time and infer generalizable semantics from their movement. A natural way to avoid parameter explosion when learning visual patterns over time is to make use of recurrence. Biological vision consists of abundant recurrent circuitry and is superior to computer vision in terms of domain shift generalization. In this article, we empirically study whether the choice of low-level temporal modeling has consequences for texture bias and cross-domain robustness. To enable a lightweight and systematic assessment of the ability to capture temporal structure not revealed from single frames, we provide the Temporal Shape (TS) dataset, as well as modified domains of Diving48 allowing for the investigation of spatial texture bias in video models. The combined results of our experiments indicate that a sound physical inductive bias, such as recurrence in temporal modeling, may be advantageous when robustness to domain shift is important for the task.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE Winter Conference on Applications of Computer Vision, ISSN 2472-6737
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-333276 (URN); 10.1109/WACV56688.2023.00418 (DOI); 000971500204030; 2-s2.0-85149047006 (Scopus ID)
Conference
23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI
Note

QC 20230731

Available from: 2023-07-31 Created: 2023-07-31 Last updated: 2025-02-07. Bibliographically approved
Colomer, M. B., Dovesi, P. L., Panagiotakopoulos, T., Carvalho, J. F., Harenstam-Nielsen, L., Azizpour, H., . . . Poggi, M. (2023). To Adapt or Not to Adapt?: Real-Time Adaptation for Semantic Segmentation. In: 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023). Paper presented at IEEE/CVF International Conference on Computer Vision (ICCV), OCT 02-06, 2023, Paris, France (pp. 16502-16513). Institute of Electrical and Electronics Engineers (IEEE)
To Adapt or Not to Adapt?: Real-Time Adaptation for Semantic Segmentation
2023 (English) In: 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 16502-16513. Conference paper, Published paper (Refereed)
Abstract [en]

The goal of Online Domain Adaptation for semantic segmentation is to handle unforeseeable domain changes that occur during deployment, like sudden weather events. However, the high computational costs associated with brute-force adaptation make this paradigm unfeasible for real-world applications. In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real-time domain adaptation. Our approach includes a hardware-aware back-propagation orchestration agent (HAMT) and a dedicated domain-shift detector that enables active control over when and how the model is adapted (LT); a hedged sketch of such gated adaptation follows this record. Thanks to these advancements, our approach is capable of performing semantic segmentation while simultaneously adapting at more than 29 FPS on a single consumer-grade GPU. Experimental results on the OnDA and SHIFT benchmarks demonstrate our framework's encouraging trade-off between accuracy and speed.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE International Conference on Computer Vision, ISSN 1550-5499
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-346099 (URN); 10.1109/ICCV51070.2023.01517 (DOI); 001169500501012; 2-s2.0-85188276872 (Scopus ID)
Conference
IEEE/CVF International Conference on Computer Vision (ICCV), OCT 02-06, 2023, Paris, France
Note

QC 20240503

Part of ISBN: 979-8-3503-0718-4

Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2024-05-03. Bibliographically approved
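
One way to picture the "active control over when the model is adapted" described above is a detector that compares a running statistic (here, prediction entropy) against an in-domain reference and triggers a self-supervised update only when the statistic drifts. The sketch below uses entropy minimization in the spirit of generic test-time adaptation; the detector statistic, threshold, and objective are all assumptions, not HAMLET's actual components.

```python
import torch

class ShiftGatedAdapter:
    """Hypothetical sketch: adapt only when a domain shift is detected."""
    def __init__(self, model, optimizer, ref_entropy, threshold=0.15):
        self.model, self.opt = model, optimizer
        self.ref = ref_entropy          # mean prediction entropy in-domain
        self.threshold = threshold

    def step(self, frame):
        logits = self.model(frame)      # (B, C, H, W) segmentation logits
        probs = logits.softmax(dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        if entropy.item() - self.ref > self.threshold:  # shift detected
            self.opt.zero_grad()
            entropy.backward()          # self-supervised adaptation signal
            self.opt.step()
        return logits.detach()
```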
Mikheeva, O., Kazlauskaite, I., Hartshorne, A., Kjellström, H., Ek, C. H. & Campbell, N. D. (2022). Aligned Multi-Task Gaussian Process. In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022. Paper presented at 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022, Virtual, Online, Spain, Mar 28 2022 - Mar 30 2022 (pp. 2970-2988). ML Research Press
Aligned Multi-Task Gaussian Process
2022 (English) In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022, ML Research Press, 2022, p. 2970-2988. Conference paper, Published paper (Refereed)
Abstract [en]

Multi-task learning requires accurate identification of the correlations between tasks. In real-world time-series, tasks are rarely perfectly temporally aligned; traditional multi-task models do not account for this, and subsequent errors in correlation estimation result in poor predictive performance and uncertainty quantification. We introduce a method that automatically accounts for temporal misalignment in a unified generative model that improves predictive performance. Our method uses Gaussian processes (GPs) to model the correlations both within and between the tasks. Building on previous work by Kazlauskaite et al. (2019), we include a separate monotonic warp of the input data to model temporal misalignment (a sketch of such a monotonic warp follows this record). In contrast to previous work, we formulate a lower bound that accounts for uncertainty in both the estimates of the warping process and the underlying functions. Also, our new take on a monotonic stochastic process, with efficient path-wise sampling for the warp functions, allows us to perform full Bayesian inference in the model rather than MAP estimates. Missing-data experiments on synthetic and real time-series demonstrate the advantages of accounting for misalignments (vs. the standard unaligned method) as well as of modelling the uncertainty in the warping process (vs. the baseline MAP alignment approach).

Place, publisher, year, edition, pages
ML Research Press, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Probability Theory and Statistics; Control Engineering
Identifiers
urn:nbn:se:kth:diva-331672 (URN); 000828072703003; 2-s2.0-85163135661 (Scopus ID)
Conference
25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022, Virtual, Online, Spain, Mar 28 2022 - Mar 30 2022
Note

QC 20230713

Available from: 2023-07-13 Created: 2023-07-13 Last updated: 2023-08-24. Bibliographically approved
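
As a small illustration of the monotonic input warps the abstract relies on: any warp built as a normalized cumulative sum of positive increments is monotonic by construction. The sketch below shows that deterministic construction only; the paper itself uses a monotonic stochastic process with path-wise sampling, which this does not reproduce.

```python
import numpy as np

def monotonic_warp(t, raw_increments):
    """t: (N,) inputs in [0, 1]; raw_increments: (K,) unconstrained parameters
    defining the warp on K segments. Returns the warped inputs in [0, 1]."""
    inc = np.log1p(np.exp(raw_increments))           # softplus: increments > 0
    knots = np.concatenate([[0.0], np.cumsum(inc)])  # strictly increasing knots
    knots /= knots[-1]                               # normalize to end at 1
    grid = np.linspace(0.0, 1.0, len(knots))
    return np.interp(t, grid, knots)                 # piecewise-linear, monotonic
```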
Identifiers
ORCID iD: orcid.org/0000-0002-5750-9655