2025 (English) Conference paper, Published paper (Refereed)
Abstract [en]
Preference-based reinforcement learning (PbRL) is a suitable approach for style adaptation of pre-trained robotic behavior: adapting the robot's policy to follow human user preferences while still being able to perform the original task. However, collecting preferences for the adaptation process in robotics is often challenging and time-consuming. In this work we explore the adaptation of pre-trained robots in the low-preference-data regime. We show that, in this regime, recent adaptation approaches suffer from catastrophic reward forgetting (CRF), where the updated reward model overfits to the new preferences, leading the agent to become unable to perform the original task. To mitigate CRF, we propose to enhance the original reward model with a small number of parameters (low-rank matrices) responsible for modeling the preference adaptation. Our evaluation shows that our method can efficiently and effectively adjust robotic behavior to human preferences across simulation benchmark tasks and multiple real-world robotic tasks. We provide videos of our results and source code at https://sites.google.com/view/preflora/
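The abstract describes augmenting a frozen pre-trained reward model with trainable low-rank matrices that absorb the preference update, so the original reward (and thus the original task competence) is preserved. Below is a minimal, illustrative sketch of that idea in PyTorch; the class names (LoRALinear, RewardMLP), the rank, and the Bradley-Terry-style preference loss are assumptions for the sake of a runnable example, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original reward weights stay fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


class RewardMLP(nn.Module):
    """Toy reward model r(s, a); only the low-rank adapters are trained on new preferences."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64, rank: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            LoRALinear(nn.Linear(obs_dim + act_dim, hidden), rank),
            nn.ReLU(),
            LoRALinear(nn.Linear(hidden, 1), rank),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


# One gradient step on a batch of (hypothetical) preference pairs,
# where segment A is preferred over segment B.
model = RewardMLP(obs_dim=8, act_dim=2)
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)

obs_a, act_a = torch.randn(16, 8), torch.randn(16, 2)
obs_b, act_b = torch.randn(16, 8), torch.randn(16, 2)
logits = model(obs_a, act_a) - model(obs_b, act_b)
loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.ones(16))
loss.backward()
opt.step()
```

Because the base weights are frozen and only the small A/B matrices receive gradients, the adapted model can always be reverted to the original reward by dropping the adapters, which is one way a low-rank scheme can guard against the catastrophic reward forgetting the abstract identifies.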
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
HSV category
Identifiers
urn:nbn:se:kth:diva-360980 (URN)
Conference
IEEE International Conference on Robotics and Automation (ICRA), Atlanta, USA, 19-23 May 2025
Note
QC 20250618
Available from: 2025-03-07 Created: 2025-03-07 Last updated: 2025-06-18 Bibliographically approved