KTH Publications (DiVA)
Publications (10 of 125)
Zhu, X., Mårtensson, P., Hanson, L., Björkman, M. & Maki, A. (2025). Automated assembly quality inspection by deep learning with 2D and 3D synthetic CAD data. Journal of Intelligent Manufacturing, 36(4), 2567-2582, Article ID e222.
2025 (English) In: Journal of Intelligent Manufacturing, ISSN 0956-5515, E-ISSN 1572-8145, Vol. 36, no 4, p. 2567-2582, article id e222. Article in journal (Refereed), Published
Abstract [en]

In the manufacturing industry, automatic quality inspection can improve both product quality and productivity. Deep learning-based computer vision, with its superior performance in many applications, is a possible solution for automating such inspections. However, collecting large amounts of annotated training data for deep learning is expensive and time-consuming, especially for processes that involve various products and human activities, such as assembly. To address this challenge, we propose a method for automated assembly quality inspection using synthetic data generated from computer-aided design (CAD) models. The method involves two steps: automatic data generation and model implementation. In the first step, we generate synthetic data in two formats: two-dimensional (2D) images and three-dimensional (3D) point clouds. In the second step, we apply different state-of-the-art deep learning approaches to the data for quality inspection, including unsupervised domain adaptation, i.e., adapting models across different data distributions, and transfer learning, which transfers knowledge between related tasks. We evaluate the methods in a case study of pedal car front-wheel assembly quality inspection to identify an optimal approach for assembly quality inspection. Our results show that transfer learning on 2D synthetic images achieves superior performance compared with the alternatives. Specifically, it attained 95% accuracy after fine-tuning with only five annotated real images per class. Given these promising results, the method may also suit other, similar quality inspection use cases. By utilizing synthetic CAD data, our method reduces the need for manual data collection and annotation. Furthermore, it performs well on test data with different backgrounds, making it suitable for varied manufacturing environments.
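The transfer-learning setup highlighted in the abstract (a frozen pre-trained backbone plus a head fine-tuned on five annotated real images per class) can be sketched in miniature. Everything below is an illustrative stand-in, not the paper's implementation: the "backbone" is a fixed random projection and the "images" are random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained backbone: a fixed random projection.
# In the paper's setting this would be a CNN trained on synthetic CAD images.
W_backbone = rng.normal(size=(64, 16))

def features(x):
    """Frozen feature extractor; no gradient flows through it."""
    return np.tanh(x @ W_backbone)

# Five annotated "real" samples per class, drawn around per-class means.
n_classes, n_per_class, dim = 3, 5, 64
means = rng.normal(scale=3.0, size=(n_classes, dim))
X = np.vstack([means[c] + rng.normal(size=(n_per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# Fine-tune only a linear head with softmax cross-entropy gradient descent.
W_head = np.zeros((16, n_classes))
F = features(X)
for _ in range(200):
    logits = F @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0              # dL/dlogits for cross-entropy
    W_head -= 0.1 * F.T @ p / len(y)

acc = (np.argmax(F @ W_head, axis=1) == y).mean()
```

With so few labeled samples, only the linear head is updated; the frozen features do the heavy lifting, which is the essence of the transfer-learning result reported.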

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Assembly quality inspection, Computer vision, Point cloud, Synthetic data, Transfer learning, Unsupervised domain adaptation
National Category
Computer Sciences; Production Engineering, Human Work Science and Ergonomics
Identifiers
urn:nbn:se:kth:diva-363099 (URN), 10.1007/s10845-024-02375-6 (DOI), 001205028300001 (), 2-s2.0-105002924620 (Scopus ID)
Note

QC 20250506

Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-11-12. Bibliographically approved.
Zhu, X., Henningsson, J., Li, D., Mårtensson, P., Hanson, L., Björkman, M. & Maki, A. (2025). Domain Randomization for Object Detection in Manufacturing Applications Using Synthetic Data: A Comprehensive Study. In: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025. Paper presented at 2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, United States of America, May 19 2025 - May 23 2025 (pp. 16715-16721). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English) In: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 16715-16721. Conference paper, Published paper (Refereed)
Abstract [en]

This paper addresses key aspects of domain randomization in generating synthetic data for object detection in manufacturing applications. To this end, we present a comprehensive data generation pipeline that reflects different factors: object characteristics, background, illumination, camera settings, and post-processing. We also introduce the Synthetic Industrial Parts Object Detection dataset (SIP15-OD), consisting of 15 objects from three industrial use cases under varying environments, as a test bed for the study, while also employing a publicly available industrial dataset for robotic applications. Our experiments provide extensive results and insights into the feasibility as well as the challenges of sim-to-real object detection. In particular, we identified material properties, rendering methods, post-processing, and distractors as important factors. Leveraging these, our method achieves top performance on the public dataset with YOLOv8 models trained exclusively on synthetic data: mAP@50 scores of 96.4% on the robotics dataset, and 94.1%, 99.5%, and 95.3% across three of the SIP15-OD use cases, respectively. The results showcase the effectiveness of the proposed domain randomization, suggesting that the randomized distribution closely covers that of the real data in these applications.
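The randomization factors listed in the abstract (object characteristics, background, illumination, camera settings, post-processing) can be sketched as a per-scene parameter sampler. The parameter names and ranges below are hypothetical illustrations, not those of the paper's pipeline.

```python
import random

# Hypothetical randomization ranges; the actual pipeline covers object
# characteristics, background, illumination, camera and post-processing.
RANGES = {
    "metallic":        (0.0, 1.0),      # material property
    "roughness":       (0.1, 0.9),
    "light_intensity": (200.0, 2000.0), # illumination
    "camera_fov_deg":  (35.0, 70.0),    # camera settings
    "blur_sigma":      (0.0, 2.0),      # post-processing
}

def sample_scene(rng: random.Random, n_distractors=(0, 8)) -> dict:
    """Draw one randomized scene configuration for the renderer."""
    cfg = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
    cfg["n_distractors"] = rng.randint(*n_distractors)  # clutter objects
    cfg["background"] = rng.choice(["plain", "texture", "hdri"])
    return cfg

scenes = [sample_scene(random.Random(i)) for i in range(1000)]
```

Each sampled configuration would drive one synthetic render, so the training set covers the cross product of all factors rather than any single fixed scene.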

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-371386 (URN), 10.1109/ICRA55743.2025.11128647 (DOI), 2-s2.0-105016571384 (Scopus ID)
Conference
2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, United States of America, May 19 2025 - May 23 2025
Note

Part of ISBN 9798331541392

QC 20251009

Available from: 2025-10-09. Created: 2025-10-09. Last updated: 2025-11-12. Bibliographically approved.
Maus, R. & Maki, A. (2025). Efficient Object-Centric Learning for Videos. In: Image Analysis - 23rd Scandinavian Conference, SCIA 2025, Proceedings. Paper presented at 23rd Scandinavian Conference on Image Analysis, SCIA 2025, Reykjavik, Iceland, June 23-25, 2025 (pp. 104-117). Springer Nature
2025 (English) In: Image Analysis - 23rd Scandinavian Conference, SCIA 2025, Proceedings, Springer Nature, 2025, p. 104-117. Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a method, which we term Interpreter, for efficiently learning video-level object-centric representations by bootstrapping off a pre-trained image backbone. It presents a novel hierarchical slot attention architecture with local learning and an optimal transport objective that yields fully unsupervised video segmentation. We first learn to compress images into image-level object-centric representations. Interpreter then learns to compress and reconstruct the object-centric representations for each frame across a video, allowing us to circumvent the costly process of reconstructing full-frame feature maps. Unlike prior work, this allows us to scale to significantly longer videos without resorting to chunking videos into segments and matching between them. To deal with the unordered nature of object-centric representations, we employ Sinkhorn divergence, a relaxed optimal transport objective, to compute the distance between unordered sets of representations. We evaluate the resulting segmentation maps on video instance segmentation in both realistic and synthetic settings, using YTVIS-19 and MOVi-E, respectively. Interpreter achieves state-of-the-art results on the realistic YTVIS-19 dataset and presents a promising approach to scaling object-centric representation learning to longer videos (code to be made publicly available at a later date).
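The Sinkhorn divergence used to compare unordered sets of slot representations can be sketched as follows: a minimal NumPy version with uniform weights, where the epsilon and iteration count are illustrative choices, not the paper's hyperparameters.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.1, n_iter=200):
    """Entropic OT cost between two unordered sets of vectors (uniform weights)."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    K = np.exp(-C / eps)
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones_like(a)
    for _ in range(n_iter):                             # Sinkhorn fixed point
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return (P * C).sum()

def sinkhorn_divergence(X, Y, eps=0.1):
    """Debiased divergence: zero when the two sets coincide."""
    return sinkhorn_cost(X, Y, eps) - 0.5 * (
        sinkhorn_cost(X, X, eps) + sinkhorn_cost(Y, Y, eps))
```

Because both inputs are treated as uniform empirical measures, the ordering of the representations is irrelevant, which is exactly the property needed for comparing unordered slot sets.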

Place, publisher, year, edition, pages
Springer Nature, 2025
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-368910 (URN), 10.1007/978-3-031-95911-0_8 (DOI), 001553875500008 (), 2-s2.0-105009858062 (Scopus ID)
Conference
23rd Scandinavian Conference on Image Analysis, SCIA 2025, Reykjavik, Iceland, June 23-25, 2025
Note

Part of ISBN 9783031959103

QC 20250822

Available from: 2025-08-22. Created: 2025-08-22. Last updated: 2025-12-08. Bibliographically approved.
Xu, Y., Bretzner, L., Wang, T. & Maki, A. (2025). Skor-xG: Skeleton-Oriented Expected Goal Estimation in Soccer. In: Proceedings - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025. Paper presented at 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025, Nashville, United States of America, June 11-12, 2025 (pp. 5957-5967). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English) In: Proceedings - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 5957-5967. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we present Skor-xG, which to the best of our knowledge is the first model to introduce 3D player skeletons into Expected Goal (xG) estimation. xG estimation is a fundamental task in soccer analytics that quantifies a shot's likelihood of scoring. Unlike existing xG models, which primarily rely on engineered features from event data and 2D positional data, Skor-xG leverages detailed player postures to enhance shot evaluation. To effectively capture the complex interactions between player body parts and the ball, we propose a Graph Neural Network-based framework that models each shot as a spatiotemporal graph. Experimental results demonstrate that incorporating skeleton data improves xG estimation compared to conventional approaches. As 3D player tracking technology becomes increasingly accessible, Skor-xG establishes skeleton data as a valuable new dimension in soccer analytics, enabling deeper tactical insights and more precise performance evaluation.
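The spatiotemporal shot graph can be illustrated schematically: joints within a frame are linked by bone edges, and each joint is linked to itself across consecutive frames. The joint set and connectivity below are hypothetical; the paper's exact graph construction is not specified here.

```python
import numpy as np

# A hypothetical 5-joint skeleton with bone connections.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
N_JOINTS, N_FRAMES = 5, 3

def spatiotemporal_adjacency(n_joints, n_frames, bones):
    """Nodes are (frame, joint) pairs; edges are bones within a frame
    plus temporal links between the same joint in consecutive frames."""
    n = n_joints * n_frames
    A = np.zeros((n, n), dtype=int)
    idx = lambda t, j: t * n_joints + j
    for t in range(n_frames):
        for i, j in bones:                       # spatial (bone) edges
            A[idx(t, i), idx(t, j)] = A[idx(t, j), idx(t, i)] = 1
        if t + 1 < n_frames:
            for j in range(n_joints):            # temporal edges
                A[idx(t, j), idx(t + 1, j)] = A[idx(t + 1, j), idx(t, j)] = 1
    return A

A = spatiotemporal_adjacency(N_JOINTS, N_FRAMES, BONES)
```

A GNN would then propagate joint features over this adjacency, letting posture information flow along bones and through time.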

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer Sciences; Computer Systems; Signal Processing; Computer Engineering
Identifiers
urn:nbn:se:kth:diva-372342 (URN), 10.1109/CVPRW67362.2025.00594 (DOI), 2-s2.0-105017859769 (Scopus ID)
Conference
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025, Nashville, United States of America, June 11-12, 2025
Note

Part of ISBN 9798331599942

QC 20251106

Available from: 2025-11-06. Created: 2025-11-06. Last updated: 2025-11-06. Bibliographically approved.
Zhu, X., Henningsson, J., Li, D., Mårtensson, P., Hanson, L., Björkman, M. & Maki, A. (2025). Towards Automated Assembly Quality Inspection with Synthetic Data and Domain Randomization. In: Proceedings: IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2025. Paper presented at IEEE/CVF International Conference on Computer Vision Workshop 2025, Honolulu, Hawaii, USA, October 19-25, 2025 (pp. 1395-1403).
2025 (English) In: Proceedings: IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2025, 2025, p. 1395-1403. Conference paper, Published paper (Refereed)
Abstract [en]

Assembly quality inspection plays a vital role in manufacturing, where correct part placement and alignment directly affect product reliability. While deep learning–based object detection offers a promising solution for automatic assembly quality inspection, it is hindered by data scarcity. Training on synthetic data with Domain Randomization (DR) helps address this challenge, yet existing DR methods focus on generating individual objects and do not capture the relational structure needed for assembly inspection. In this paper, we identify two key factors for effective synthetic data generation in assembly inspection: preserving spatial relationships between components and providing part-level textures and annotations. We propose an Assembly-Specific Generation Scheme that incorporates these factors into a state-of-the-art DR pipeline. To evaluate its impact, we introduce SIP2A-OD, a new object detection dataset comprising two real-world assembly use cases collected under varied manufacturing conditions. We train a YOLOv12 model on synthetic data generated by our pipeline and test it on real data from the SIP2A-OD dataset. Compared to the baseline pipeline designed for individual object detection, our method improves mAP@50 by more than 15% in both use cases. These results demonstrate the effectiveness of our scheme and its potential for broader applications in industrial assembly inspection without the need for manual data collection or annotation.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-372634 (URN)
Conference
IEEE/CVF International Conference on Computer Vision Workshop 2025, Honolulu, Hawaii, USA, October 19-25, 2025
Note

QC 20251112

Available from: 2025-11-11. Created: 2025-11-11. Last updated: 2025-11-12. Bibliographically approved.
Sabel, D., Westin, T. & Maki, A. (2023). 3D Point Cloud Registration for GNSS-denied Aerial Localization over Forests. In: Image Analysis - 22nd Scandinavian Conference, SCIA 2023, Proceedings. Paper presented at 22nd Scandinavian Conference on Image Analysis, SCIA 2023, Lapland, Finland, Apr 18 2023 - Apr 21 2023 (pp. 396-411). Springer Nature
2023 (English) In: Image Analysis - 22nd Scandinavian Conference, SCIA 2023, Proceedings, Springer Nature, 2023, p. 396-411. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a vision-based localization approach for Unmanned Aerial Vehicles (UAVs) flying at low altitude over forested areas. We address the task as a point cloud registration problem using local 3D features, with the intention to exploit the shape and relative arrangement of the trees. We propose a 3D descriptor called SHOT-N, which is an adaptation of the state-of-the-art SHOT 3D descriptor. SHOT-N leverages constraints in the extrinsic parameters of a gimballed, nadir-looking camera. Extensive experiments were performed with semi-simulated point cloud data based on real aerial images over four forested areas. SHOT-N is shown to outperform two state-of-the-art 3D descriptors in terms of the rate of successful registrations. The results suggest that the approach has high potential for aerial localization over forested areas.
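Once descriptor matching (e.g., with SHOT-N) yields point correspondences, registration reduces to a least-squares rigid transform. A minimal Kabsch sketch of that closed-form step follows; it is not the paper's full pipeline, which must also cope with outlier matches.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) with Q ~= P @ R.T + t,
    the closed-form alignment step once correspondences are known."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp
```

In practice this step would sit inside a robust loop (e.g., RANSAC over candidate correspondences), since raw descriptor matches over forest canopies contain many false pairs.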

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
aerial navigation, GNSS-denied, natural environments, point cloud registration, visual navigation
National Category
Robotics and automation; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-338612 (URN), 10.1007/978-3-031-31435-3_27 (DOI), 2-s2.0-85161464639 (Scopus ID)
Conference
22nd Scandinavian Conference on Image Analysis, SCIA 2023, Lapland, Finland, Apr 18 2023 - Apr 21 2023
Note

Part of ISBN 9783031314346

QC 20231106

Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2025-02-05. Bibliographically approved.
Fukui, K., Sogi, N., Kobayashi, T., Xue, J.-H. & Maki, A. (2023). Discriminant feature extraction by generalized difference subspace. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 1618-1635
2023 (English) In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no 2, p. 1618-1635. Article in journal (Refereed), Published
Abstract [en]

This paper reveals the discriminant ability of the orthogonal projection of data onto a generalized difference subspace (GDS), both theoretically and experimentally. In our previous work, we demonstrated that GDS projection works as a quasi-orthogonalization of class subspaces. Interestingly, GDS projection also works as a discriminant feature extraction through a mechanism similar to Fisher discriminant analysis (FDA). A direct proof of the connection between GDS projection and FDA is difficult due to the significant difference in their formulations. To avoid this difficulty, we first introduce geometrical Fisher discriminant analysis (gFDA), based on a simplified Fisher criterion. gFDA works stably even with few samples, bypassing the small sample size (SSS) problem of FDA. Next, we prove that gFDA is equivalent to GDS projection with a small correction term. This equivalence ensures that GDS projection inherits the discriminant ability of FDA via gFDA. Furthermore, we discuss two useful extensions of these methods: 1) a nonlinear extension via the kernel trick, and 2) combination with convolutional neural network (CNN) features. The equivalence and the effectiveness of the extensions have been verified through extensive experiments on the Extended Yale B+, CMU face database, ALOI, ETH80, MNIST, and CIFAR10, focusing on the SSS problem.
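The GDS construction described above (sum the class subspace projectors, then discard the leading, most-shared directions) can be sketched in a few lines. How many directions to remove is a parameter of the method; the sketch leaves that choice to the caller.

```python
import numpy as np

def gds_basis(class_bases, n_remove):
    """Generalized difference subspace, schematically: eigendecompose the
    sum of class projection matrices and drop the n_remove leading
    eigenvectors, i.e. the directions most shared across classes."""
    d = class_bases[0].shape[0]
    G = sum(B @ B.T for B in class_bases)   # sum of class projectors B B^T
    _, V = np.linalg.eigh(G)                # eigenvalues in ascending order
    return V[:, : d - n_remove]             # keep all but the largest n_remove
```

Projecting data onto the returned basis suppresses the common component of the class subspaces, which is the quasi-orthogonalization effect the paper connects to FDA.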

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Discriminant analysis, Face recognition, Feature extraction, Fisher criterion, Fisher discriminant analysis, Fisher information matrix, Image recognition, Kernel, Lighting, Neural networks, PCA without data centering, Principal component analysis, Subspace projection, Subspace representation, Task analysis
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-323273 (URN), 10.1109/TPAMI.2022.3168557 (DOI), 000912386000019 (), 35439128 (PubMedID), 2-s2.0-85128699946 (Scopus ID)
Note

QC 20230124

Available from: 2023-01-24. Created: 2023-01-24. Last updated: 2025-02-07. Bibliographically approved.
Zhu, X., Björkman, M., Maki, A., Hanson, L. & Mårtensson, P. (2023). Surface Defect Detection with Limited Training Data: A Case Study on Crown Wheel Surface Inspection. In: 56th CIRP International Conference on Manufacturing Systems, CIRP CMS 2023. Paper presented at 56th CIRP International Conference on Manufacturing Systems, CIRP CMS 2023, Cape Town, South Africa, Oct 24 2023 - Oct 26 2023 (pp. 1333-1338). Elsevier BV
2023 (English) In: 56th CIRP International Conference on Manufacturing Systems, CIRP CMS 2023, Elsevier BV, 2023, p. 1333-1338. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an approach to automatic surface defect detection using a deep learning-based object detection method, particularly in challenging scenarios where defects are rare, i.e., where training data are limited. We base our approach on the YOLOv8 object detection model, preceded by three preprocessing steps: 1) filtering out irrelevant information, 2) enhancing the visibility of defects through brightness and contrast adjustment, and 3) increasing the diversity of the training data through data augmentation. We evaluated the method in an industrial case study of crown wheel surface inspection, detecting Unclean Gear and Deburring defects, with promising performance. With the combination of the three preprocessing steps, we improved detection accuracy by 22.2% and 37.5% for the two defect types, respectively. We believe the proposed approach is also adaptable to other industrial surface defect detection applications, as the employed techniques, such as image segmentation, are available off the shelf.
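Steps 2) and 3) above can be sketched as plain array operations. The alpha/beta values and the specific augmentations below are illustrative assumptions, not the parameters used in the case study.

```python
import numpy as np

def adjust_brightness_contrast(img, alpha=1.5, beta=20.0):
    """Linear contrast (alpha) and brightness (beta) adjustment,
    clipped back to the valid 8-bit range."""
    out = alpha * img.astype(np.float32) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def augment(img, rng):
    """Diversity-increasing augmentation: random flip and 90-degree rotation."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                       # horizontal flip
    return np.rot90(img, k=int(rng.integers(0, 4)))
```

For rare defects, such cheap augmentations multiply the effective number of training views of each defective sample without any additional annotation effort.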

Place, publisher, year, edition, pages
Elsevier BV, 2023
Keywords
Automatic Quality Inspection, Computer Vision, Deep Learning, Image Processing, Surface Defect Detection
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-343752 (URN), 10.1016/j.procir.2023.09.172 (DOI), 001483980700224 (), 2-s2.0-85184602644 (Scopus ID)
Conference
56th CIRP International Conference on Manufacturing Systems, CIRP CMS 2023, Cape Town, South Africa, Oct 24 2023 - Oct 26 2023
Note

QC 20240222

Available from: 2024-02-22. Created: 2024-02-22. Last updated: 2025-12-08. Bibliographically approved.
Zhu, X., Bilal, T., Mårtensson, P., Hanson, L., Björkman, M. & Maki, A. (2023). Towards sim-to-real industrial parts classification with synthetic dataset. In: Proceedings: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023. Paper presented at 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023, Vancouver, Canada, Jun 18 2023 - Jun 22 2023 (pp. 4454-4463). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English) In: Proceedings: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 4454-4463. Conference paper, Published paper (Refereed)
Abstract [en]

This paper is about effectively utilizing synthetic data for training deep neural networks for industrial parts classification, in particular by taking into account the domain gap against real-world images. To this end, we introduce a synthetic dataset that may serve as a preliminary testbed for the Sim-to-Real challenge; it contains 17 objects from six industrial use cases, including isolated and assembled parts. A few subsets of objects exhibit large similarities in shape and albedo, reflecting challenging cases of industrial parts. All the sample images come with and without random backgrounds and post-processing, for evaluating the importance of domain randomization. We call it the Synthetic Industrial Parts dataset (SIP-17). We study the usefulness of SIP-17 by benchmarking the performance of five state-of-the-art deep network models, supervised and self-supervised, trained only on the synthetic data while testing them on real data. By analyzing the results, we derive insights into the feasibility and challenges of using synthetic data for industrial parts classification and for further developing larger-scale synthetic datasets. Our dataset and code are publicly available.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-337847 (URN), 10.1109/CVPRW59228.2023.00468 (DOI), 2-s2.0-85170821045 (Scopus ID)
Conference
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023, Vancouver, Canada, Jun 18 2023 - Jun 22 2023
Note

Part of ISBN 9798350302493

QC 20231010

Available from: 2023-10-10. Created: 2023-10-10. Last updated: 2025-11-12. Bibliographically approved.
Maki, A., Kragic, D., Kjellström, H., Azizpour, H., Sullivan, J., Björkman, M., . . . Sundblad, Y. (2022). In Memoriam: Jan-Olof Eklundh. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 4488-4489
2022 (English) In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 44, no 9, p. 4488-4489. Article in journal (Refereed), Published
Place, publisher, year, edition, pages
IEEE Computer Society, 2022
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-316696 (URN), 10.1109/TPAMI.2022.3183266 (DOI), 000836666600005 ()
Note

QC 20220905

Available from: 2022-09-05. Created: 2022-09-05. Last updated: 2022-09-05. Bibliographically approved.
ORCID iD: orcid.org/0000-0002-4266-6746
