A crucial part of traditional reinforcement learning (RL) is the initial exploration phase, in which the agent tries available actions at random. Because random behavior can be detrimental to a social interaction, this work proposes a novel paradigm for learning social robot behavior: the use of shielding to ensure socially appropriate behavior during exploration and learning. We explore how a data-driven approach to shielding could be used to generate listening behavior. In a video-based user study (N=110), we compare shielded exploration with two other exploration methods. We show that shielded exploration is perceived as more comforting and appropriate than a straightforward random approach. Based on our findings, we discuss the potential for future work using shielded and socially guided approaches for learning idiosyncratic social robot behaviors through RL.
Part of ISBN 979-8-3503-7502-2
QC 20250122