Enhancing Visual Domain Robustness in Behaviour Cloning via Saliency-Guided Augmentation
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0009-0008-7672-970X
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS.
Aalto University.
2024 (English). In: Proceedings of Machine Learning Research, ML Research Press, 2024, Vol. 270, p. 4314-4331. Conference paper, Published paper (Refereed)
Abstract [en]

In vision-based behaviour cloning (BC), conventional image augmentations like Random Crop and Colour Jitter often fall short when addressing substantial visual domain shifts, such as variations in shadow, distractors and backgrounds. Superimposition-based augmentations, which blend in-domain and out-of-domain images, have shown promise for improving model generalisation in the computer vision community, but their suitability for BC remains uncertain due to the need to preserve task-critical semantics, spatial-temporal relationships, and agent-target interactions. To address this, we introduce RoboSaGA, a Saliency-Guided Augmentation method within the superimposition family, tailored for vision-based BC. RoboSaGA dynamically adjusts augmentation intensity per pixel based on policy-driven saliency, enabling aggressive augmentation in task-trivial areas while preserving task-critical information. Moreover, it integrates seamlessly into existing architectures without requiring structural changes or additional learning objectives. Empirical evaluations in both simulated and real-world settings show that RoboSaGA maintains in-domain performance while significantly enhancing robustness to visual domain shifts, including distractors and background variations, as well as handling lighting and shadow variations. Code available at: https://github.com/Zheyu-Zhuang/RoboSaGA.
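The core idea in the abstract — superimposing an out-of-domain image onto the observation with a per-pixel intensity scaled down in salient regions — can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the paper's actual implementation; the function name, the `alpha_max` parameter, and the input conventions are assumptions.

```python
import numpy as np

def saliency_guided_blend(obs, distractor, saliency, alpha_max=0.7):
    """Illustrative sketch of saliency-guided superimposition.

    Blends an out-of-domain image into the observation with a per-pixel
    weight proportional to (1 - saliency): task-trivial (low-saliency)
    pixels are augmented aggressively, task-critical (high-saliency)
    pixels are largely preserved.

    obs, distractor: float arrays in [0, 1], shape (H, W, 3)
    saliency:        float array in [0, 1], shape (H, W); 1 = task-critical
    alpha_max:       maximum blend strength (assumed hyperparameter)
    """
    # Per-pixel blend weight, broadcast across the colour channels.
    alpha = alpha_max * (1.0 - saliency)[..., None]
    return (1.0 - alpha) * obs + alpha * distractor
```

In the paper the saliency map is policy-driven (derived from the cloned policy itself), whereas this sketch treats it as a given input; any saliency source with values in [0, 1] would plug in the same way.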

Place, publisher, year, edition, pages
ML Research Press, 2024. Vol. 270, p. 4314-4331
Keywords [en]
Behaviour Cloning, Data Augmentation, Visual Generalisation
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-361724
Scopus ID: 2-s2.0-86000793083
OAI: oai:DiVA.org:kth-361724
DiVA, id: diva2:1947991
Conference
8th Conference on Robot Learning, CoRL 2024, Munich, 6 November 2024
Note

QC 20250331

Available from: 2025-03-27 Created: 2025-03-27 Last updated: 2025-03-31 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Zhuang, Zheyu; Wang, Ruiyu; Ingelhag, Nils; Kragic, Danica

Search in DiVA

By author/editor
Zhuang, Zheyu; Wang, Ruiyu; Ingelhag, Nils; Kragic, Danica
By organisation
Robotics, Perception and Learning, RPL; Centre for Autonomous Systems, CAS
Computer graphics and computer vision

Search outside of DiVA

Google; Google Scholar
