kth.se Publications
Publications (10 of 44)
Marta, D., Holk, S., Pek, C. & Leite, I. (2024). SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). Paper presented at IEEE International Conference on Robotics and Automation (ICRA) (pp. 9585-9592). Institute of Electrical and Electronics Engineers (IEEE)
SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation
2024 (English) In: 2024 IEEE International Conference on Robotics and Automation (ICRA), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 9585-9592. Conference paper, Published paper (Refereed)
Abstract [en]

Preference-based reinforcement learning (RL) is a recent research direction in robot learning that allows humans to teach robots through preferences over pairs of desired behaviours. Nonetheless, obtaining realistic robot policies requires humans to answer an arbitrarily large number of queries. In this work, we approach the sample-efficiency challenge by presenting a technique that synthesizes queries from a semi-supervised learning perspective. To achieve this, we leverage latent variational autoencoder (VAE) representations of trajectory segments (sequences of state-action pairs). Our approach produces queries that are closely aligned with those labeled by humans, while avoiding excessive uncertainty in the human preference predictions as determined by reward estimations. Additionally, by introducing variation without deviating from the human's original intent, more robust reward function representations are achieved. We compare our approach to recent state-of-the-art semi-supervised learning techniques for preference-based RL. Our experimental findings reveal that we can enhance the generalization of the estimated reward function without requiring additional human intervention. Lastly, to confirm the practical applicability of our approach, we conduct experiments involving actual human users in a simulated social navigation setting. Videos of the experiments can be found at https://sites.google.com/view/rl-sequel
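The central mechanism — interpolating between VAE latent codes of trajectory segments to synthesize new queries — can be sketched as follows. This is an illustrative toy under assumed names (`interpolate_latents`, the latent codes themselves), not the paper's implementation; the VAE encoder and decoder are omitted.

```python
import numpy as np

def interpolate_latents(z_a, z_b, alpha=0.5):
    """Linearly blend two latent codes; alpha=0.5 gives the midpoint.

    Here z_a and z_b stand in for VAE encodings of two trajectory
    segments. A decoder (omitted) would map the blend back to a
    synthetic segment, which can then be paired into a new query.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    return (1.0 - alpha) * z_a + alpha * z_b

# Toy latent codes standing in for encoded trajectory segments.
z_mid = interpolate_latents([1.0, 0.0, -2.0], [3.0, 4.0, 2.0])
# z_mid -> [2.0, 2.0, 0.0], halfway between the two codes
```

Because the blend stays between codes the VAE has seen, the decoded segment tends to remain plausible — the property the abstract relies on when it speaks of introducing variation without deviating from the human's intent.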

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-360978 (URN)10.1109/ICRA57147.2024.10610534 (DOI)001369728000064 ()2-s2.0-85199009127 (Scopus ID)
Conference
IEEE International Conference on Robotics and Automation (ICRA)
Note

Part of ISBN 9798350384574

QC 20250310

Available from: 2025-03-07 Created: 2025-03-07 Last updated: 2025-03-10Bibliographically approved
Marta, D., Holk, S., Pek, C., Tumova, J. & Leite, I. (2023). Aligning Human Preferences with Baseline Objectives in Reinforcement Learning. In: 2023 IEEE International Conference on Robotics and Automation (ICRA 2023). Paper presented at IEEE International Conference on Robotics and Automation (ICRA), May 29-Jun 02, 2023, London, England. Institute of Electrical and Electronics Engineers (IEEE)
Aligning Human Preferences with Baseline Objectives in Reinforcement Learning
2023 (English) In: 2023 IEEE International Conference on Robotics and Automation (ICRA 2023), Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Practical implementations of deep reinforcement learning (deep RL) have been challenging due to a multitude of factors, such as designing reward functions that cover every possible interaction. To address the heavy burden of robot reward engineering, we aim to leverage subjective human preferences gathered in the context of human-robot interaction, while taking advantage of a baseline reward function when available. By assuming that baseline objectives are designed beforehand, we are able to narrow down the policy space, requesting human attention only when their input matters most. To allow for control over the optimization of different objectives, our approach contemplates a multi-objective setting. We achieve human-compliant policies by sequentially training an optimal policy from a baseline specification and collecting queries on pairs of trajectories. These policies are obtained by training a reward estimator to generate Pareto-optimal policies that include human-preferred behaviours. Our approach ensures sample efficiency, and we conducted a user study to collect real human preferences, which we utilized to obtain a policy in a social navigation environment.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE International Conference on Robotics and Automation ICRA
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324924 (URN)10.1109/ICRA48891.2023.10161261 (DOI)001048371100079 ()2-s2.0-85164820716 (Scopus ID)
Conference
IEEE International Conference on Robotics and Automation (ICRA), May 29-Jun 02, 2023, London, England
Note

Part of ISBN 979-8-3503-2365-8

QC 20230328

Available from: 2023-03-21 Created: 2023-03-21 Last updated: 2025-05-19Bibliographically approved
van Waveren, S., Pek, C., Leite, I., Tumova, J. & Kragic, D. (2023). Generating Scenarios from High-Level Specifications for Object Rearrangement Tasks. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023. Paper presented at 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023 (pp. 11420-11427). Institute of Electrical and Electronics Engineers (IEEE)
Generating Scenarios from High-Level Specifications for Object Rearrangement Tasks
2023 (English) In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 11420-11427. Conference paper, Published paper (Refereed)
Abstract [en]

Rearranging objects is an essential skill for robots. To quickly teach robots new rearrangement tasks, we would like to generate training scenarios from high-level specifications that define the relative placement of objects for the task at hand. Ideally, to guide the robot's learning, we also want to be able to rank these scenarios according to their difficulty. Prior work has shown that generating diverse scenarios from specifications and providing the robot with easy-to-difficult samples can improve learning. Yet, existing scenario generation methods typically cannot generate diverse scenarios while controlling their difficulty. We address this challenge by conditioning generative models on spatial logic specifications to generate spatially structured scenarios that meet the specification and desired difficulty level. Our experiments showed that generative models are more effective and data-efficient than rejection sampling and that the spatially structured scenarios can improve training of downstream tasks by orders of magnitude.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-342642 (URN)10.1109/IROS55552.2023.10341369 (DOI)001136907804123 ()2-s2.0-85182525633 (Scopus ID)
Conference
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023
Note

Part of ISBN 9781665491907

QC 20240125

Available from: 2024-01-25 Created: 2024-01-25 Last updated: 2025-02-09Bibliographically approved
van Waveren, S., Rudling, R., Leite, I., Jensfelt, P. & Pek, C. (2023). Increasing perceived safety in motion planning for human-drone interaction. In: HRI 2023: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 18th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2023, Stockholm, Sweden, Mar 13 2023 - Mar 16 2023 (pp. 446-455). Association for Computing Machinery (ACM)
Increasing perceived safety in motion planning for human-drone interaction
2023 (English) In: HRI 2023: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery (ACM), 2023, p. 446-455. Conference paper, Published paper (Refereed)
Abstract [en]

Safety is crucial for autonomous drones operating close to humans. Besides avoiding unwanted or harmful contact, people should also perceive the drone as safe. Existing safe motion planning approaches for autonomous robots, such as drones, have primarily focused on ensuring physical safety, e.g., by imposing constraints on motion planners. However, studies indicate that ensuring physical safety does not necessarily lead to perceived safety. Prior work in Human-Drone Interaction (HDI) shows that factors such as the drone's speed and distance to the human are important for perceived safety. Building on these works, we propose a parameterized control barrier function (CBF) that constrains the drone's maximum deceleration and minimum distance to the human, and we update its parameters based on people's ratings of perceived safety. We describe an implementation and evaluation of our approach. Results of a within-subject user study (N = 15) show that we can improve the perceived safety of a drone by adjusting to people individually.
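The pairing of a minimum distance with a maximum deceleration can be illustrated with a simple braking-distance check. This is a minimal sketch with hypothetical names (`cbf_safe`, `d_min`, `a_max`), not the paper's actual CBF formulation:

```python
def cbf_safe(distance, speed, d_min, a_max):
    """Can the drone still stop outside the minimum distance d_min?

    Uses the kinematic braking distance v^2 / (2 * a_max). Raising
    d_min or lowering a_max yields more conservative, and plausibly
    more comfortable, behaviour for a given person.
    """
    braking_distance = speed ** 2 / (2.0 * a_max)
    return distance - braking_distance >= d_min

# 3 m away at 2 m/s with 2 m/s^2 braking: the drone stops 1 m later,
# still outside a 1.5 m personal zone.
print(cbf_safe(3.0, 2.0, d_min=1.5, a_max=2.0))  # True
print(cbf_safe(2.0, 3.0, d_min=1.5, a_max=2.0))  # False
```

Personalization then amounts to fitting `d_min` and `a_max` per user from their safety ratings, which is the spirit of the parameter update the abstract describes.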

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
control barrier functions, human-drone interaction, motion planning, perceived safety
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-333381 (URN)10.1145/3568162.3576966 (DOI)2-s2.0-85150349732 (Scopus ID)
Conference
18th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2023, Stockholm, Sweden, Mar 13 2023 - Mar 16 2023
Note

Part of ISBN 9781450399647

QC 20230801

Available from: 2023-08-01 Created: 2023-08-01 Last updated: 2023-08-01Bibliographically approved
Vahs, M., Pek, C. & Tumova, J. (2023). Risk-aware Spatio-temporal Logic Planning in Gaussian Belief Spaces. In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Paper presented at 2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023 (pp. 7879-7885). Institute of Electrical and Electronics Engineers (IEEE)
Risk-aware Spatio-temporal Logic Planning in Gaussian Belief Spaces
2023 (English) In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 7879-7885. Conference paper, Published paper (Other academic)
Abstract [en]

In many real-world robotic scenarios, we cannot assume exact knowledge about a robot’s state due to unmodeled dynamics or noisy sensors. Planning in belief space addresses this problem by tightly coupling perception and planning modules to obtain trajectories that take into account the environment’s stochasticity. However, existing works are often limited to tasks such as the classic reach-avoid problem and do not provide risk awareness. We propose a risk-aware planning strategy in belief space that minimizes the risk of violating a given specification and enables a robot to actively gather information about its state. We use Risk Signal Temporal Logic (RiSTL) as a specification language in belief space to express complex spatio-temporal missions including predicates over Gaussian beliefs. We synthesize trajectories for challenging scenarios that cannot be expressed through classical reach-avoid properties and show that risk-aware objectives improve the uncertainty reduction in a robot’s belief.
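A predicate over a Gaussian belief of the kind RiSTL evaluates can be sketched as a chance-constraint check: the probability, under the belief, that a distance falls below a threshold. The sketch assumes a scalar Gaussian belief; the function and parameter names are illustrative, not from the paper.

```python
import math

def violation_risk(mean_dist, std_dist, d_min):
    """P(distance < d_min) for a Gaussian belief N(mean, std^2),
    computed via the standard normal CDF (using erf)."""
    z = (d_min - mean_dist) / std_dist
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Belief: 2 m away on average, std 0.5 m; risk of being inside 1 m:
risk = violation_risk(2.0, 0.5, 1.0)  # ~0.023, i.e. about 2.3%
```

A risk-aware planner can then minimize such probabilities over a mission, and actively gathering information shrinks `std_dist`, which in turn shrinks the risk — the uncertainty-reduction effect the abstract reports.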

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324917 (URN)10.1109/ICRA48891.2023.10160973 (DOI)001048371101031 ()2-s2.0-85168678021 (Scopus ID)
Conference
2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023
Note

Part of ISBN 9798350323658

QC 20230328

Available from: 2023-03-20 Created: 2023-03-20 Last updated: 2025-02-09Bibliographically approved
Mitsioni, I., Tajvar, P., Kragic, D., Tumova, J. & Pek, C. (2023). Safe Data-Driven Model Predictive Control of Systems with Complex Dynamics. IEEE Transactions on Robotics, 39(4), 3242-3258
Safe Data-Driven Model Predictive Control of Systems with Complex Dynamics
2023 (English) In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 39, no 4, p. 3242-3258. Article in journal (Refereed) Published
Abstract [en]

In this article, we address the task and safety performance of data-driven model predictive controllers (DD-MPC) for systems with complex dynamics, i.e., temporally or spatially varying dynamics that may also be discontinuous. The three challenges we focus on are the accuracy of learned models, the receding horizon-induced myopic predictions of DD-MPC, and the active encouragement of safety. To learn accurate models for DD-MPC, we cautiously, yet effectively, explore the dynamical system with rapidly exploring random trees (RRT) to collect a uniform distribution of samples in the state-input space and overcome the common distribution shift in model learning. The learned model is further used to construct an RRT tree that estimates how close the model's predictions are to the desired target. This information is used in the cost function of the DD-MPC to minimize the short-sighted effect of its receding horizon nature. To promote safety, we approximate sets of safe states using demonstrations of exclusively safe trajectories, i.e., without unsafe examples, and encourage the controller to generate trajectories close to the sets. As a running example, we use a broken version of an inverted pendulum where the friction abruptly changes in certain regions. Furthermore, we showcase the adaptation of our method to a real-world robotic application with complex dynamics: robotic food-cutting. Our results show that our proposed control framework effectively avoids unsafe states with higher success rates than baseline controllers that employ models from controlled demonstrations and even random actions.
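The idea of encouraging trajectories close to demonstrated safe states can be illustrated as a nearest-neighbour penalty added to the MPC cost. This is a crude stand-in with assumed names (`safety_cost`), not the paper's learned safe-set approximation:

```python
import numpy as np

def safety_cost(state, safe_states, weight=1.0):
    """Penalize distance from a state to the nearest demonstrated
    safe state; adding this term to the MPC objective pulls the
    predicted trajectory toward the safe set."""
    safe_states = np.asarray(safe_states, dtype=float)
    dists = np.linalg.norm(safe_states - np.asarray(state, dtype=float), axis=1)
    return weight * float(dists.min())

# Safe demonstrations along the x-axis; a state 0.5 off the axis
# incurs a cost of 0.5.
demos = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
cost = safety_cost([1.0, 0.5], demos)  # 0.5
```

Crucially, only safe trajectories are needed to build `demos`, matching the abstract's point that the approach works without unsafe examples.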

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Data-driven modelling, dimensionality reduction, formal specifications, predictive control, sampling methods
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-349873 (URN)10.1109/TRO.2023.3266995 (DOI)000986623300001 ()2-s2.0-85159813435 (Scopus ID)
Note

QC 20240704

Available from: 2024-07-04 Created: 2024-07-04 Last updated: 2025-02-09Bibliographically approved
Pek, C., Schuppe, G. F., Esposito, F., Tumova, J. & Kragic, D. (2023). SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications. Autonomous Robots, 47(8), 1439-1462
SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications
2023 (English) In: Autonomous Robots, ISSN 0929-5593, E-ISSN 1573-7527, Vol. 47, no 8, p. 1439-1462. Article in journal (Refereed) Published
Abstract [en]

Many tasks require robots to manipulate objects while satisfying a complex interplay of spatial and temporal constraints. For instance, a table-setting robot first needs to place a mug and then fill it with coffee, while satisfying spatial relations such as forks needing to be placed to the left of plates. We propose the spatio-temporal framework SpaTiaL, which unifies the specification, monitoring, and planning of object-oriented robotic tasks in a robot-agnostic fashion. SpaTiaL can specify diverse spatial relations between objects and temporal task patterns. Our experiments with recorded data, simulations, and real robots demonstrate how SpaTiaL provides real-time monitoring and facilitates online planning. SpaTiaL is open source and easily extensible to new object relations and robotic applications.
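Spatial relations of this kind typically come with quantitative (robustness) semantics: positive values mean the relation holds, with magnitude as a margin. Below is a toy version of a "left of" relation over 2D object positions; the names and coordinate convention are assumptions for illustration, not the SpaTiaL library's API:

```python
def left_of(obj_a, obj_b):
    """Quantitative 'a is left of b': positive iff a's x-coordinate
    is smaller than b's, with the gap as a robustness margin."""
    return obj_b[0] - obj_a[0]

fork = (0.2, 0.5)   # (x, y) positions on the table
plate = (0.6, 0.5)
margin = left_of(fork, plate)  # 0.4 > 0: the relation holds
```

A monitor can evaluate such margins in real time as objects move, and a planner can maximize them, which is how quantitative semantics support both the monitoring and the planning roles the abstract describes.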

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Monitoring, Object-centric planning, Spatio-temporal logics, Task and motion planning
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-348211 (URN)10.1007/s10514-023-10145-1 (DOI)001092898500001 ()2-s2.0-85175646748 (Scopus ID)
Note

QC 20240624

Available from: 2024-06-24 Created: 2024-06-24 Last updated: 2025-02-09Bibliographically approved
Marta, D., Holk, S., Pek, C., Tumova, J. & Leite, I. (2023). VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning. Paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), October 1-5, 2023, Detroit, Michigan, USA.
VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Human-in-the-loop reinforcement learning (RL) methods actively integrate human knowledge to create reward functions for various robotic tasks. Learning from preferences shows promise, as it alleviates the need for demonstrations by querying humans about state-action sequences. However, the limited granularity of sequence-based approaches complicates temporal credit assignment. The amount of human querying is contingent on query quality, as redundant queries result in excessive human involvement. This paper addresses the often-overlooked aspect of query selection, which is closely related to active learning (AL). We propose a novel query selection approach that leverages variational autoencoder (VAE) representations of state sequences. In this manner, we formulate queries that are diverse in nature while simultaneously taking reward model estimations into account. We compare our approach to current state-of-the-art query selection methods in preference-based RL and, through extensive benchmarking on simulated environments relevant to robotics, find ours to be either on par or more sample efficient. Lastly, we conduct an online study to verify the effectiveness of our query selection approach with real human feedback and examine several metrics related to human effort.

National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-333948 (URN)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), October 1-5, 2023, Detroit, Michigan, USA
Note

QC 20230818

Available from: 2023-08-15 Created: 2023-08-15 Last updated: 2025-02-09Bibliographically approved
Marta, D., Holk, S., Pek, C., Tumova, J. & Leite, I. (2023). VARIQuery: VAE Segment-Based Active Learning for Query Selection in Preference-Based Reinforcement Learning. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023. Paper presented at 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023 (pp. 7878-7885). Institute of Electrical and Electronics Engineers (IEEE)
VARIQuery: VAE Segment-Based Active Learning for Query Selection in Preference-Based Reinforcement Learning
2023 (English) In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 7878-7885. Conference paper, Published paper (Refereed)
Abstract [en]

Human-in-the-loop reinforcement learning (RL) methods actively integrate human knowledge to create reward functions for various robotic tasks. Learning from preferences shows promise, as it alleviates the need for demonstrations by querying humans about state-action sequences. However, the limited granularity of sequence-based approaches complicates temporal credit assignment. The amount of human querying is contingent on query quality, as redundant queries result in excessive human involvement. This paper addresses the often-overlooked aspect of query selection, which is closely related to active learning (AL). We propose a novel query selection approach that leverages variational autoencoder (VAE) representations of state sequences. In this manner, we formulate queries that are diverse in nature while simultaneously taking reward model estimations into account. We compare our approach to current state-of-the-art query selection methods in preference-based RL and, through extensive benchmarking on simulated environments relevant to robotics, find ours to be either on par or more sample efficient. Lastly, we conduct an online study to verify the effectiveness of our query selection approach with real human feedback and examine several metrics related to human effort.
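Diversity-driven query selection in a latent space can be sketched with greedy farthest-point sampling: repeatedly pick the latent code farthest from everything chosen so far. This is one plausible reading of "diverse in nature", with assumed names, and deliberately omits the reward-model weighting the paper also uses:

```python
import numpy as np

def select_diverse(latents, k):
    """Greedy farthest-point selection of k mutually distant codes."""
    latents = np.asarray(latents, dtype=float)
    chosen = [0]  # seed with the first code
    while len(chosen) < k:
        # Distance from every code to its nearest already-chosen code.
        d = np.linalg.norm(latents[:, None] - latents[chosen], axis=-1).min(axis=1)
        chosen.append(int(d.argmax()))
    return chosen

# Codes 0 and 1 are near-duplicates; the selection skips one of them.
codes = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 0.0]]
picked = select_diverse(codes, 3)  # [0, 3, 2]
```

Skipping near-duplicate codes is exactly what avoids the redundant queries, and hence the excessive human involvement, that the abstract warns about.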

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Computer Sciences Robotics and automation
Identifiers
urn:nbn:se:kth:diva-342645 (URN)10.1109/IROS55552.2023.10341795 (DOI)001136907802029 ()2-s2.0-85182523595 (Scopus ID)
Conference
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023
Note

Part of ISBN 978-1-6654-9190-7

QC 20240126

Available from: 2024-01-25 Created: 2024-01-25 Last updated: 2025-05-19Bibliographically approved
van Waveren, S., Pek, C., Tumova, J. & Leite, I. (2022). Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies. In: Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at the 17th ACM/IEEE International Conference on Human-Robot Interaction, March 7-10, 2022 (pp. 493-501). Institute of Electrical and Electronics Engineers (IEEE)
Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies
2022 (English) In: Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 493-501. Conference paper, Published paper (Refereed)
Abstract [en]

Reinforcement learning has shown great potential for learning sequential decision-making tasks. Yet, it is difficult to anticipate all possible real-world scenarios during training, causing robots to inevitably fail in the long run. Many of these failures are due to variations in the robot's environment. Usually experts are called to correct the robot's behavior; however, some of these failures do not necessarily require an expert to solve them. In this work, we query non-experts online for help and explore 1) if/how non-experts can provide feedback to the robot after a failure and 2) how the robot can use this feedback to avoid such failures in the future by generating shields that restrict or correct its high-level actions. We demonstrate our approach on common daily scenarios of a simulated kitchen robot. The results indicate that non-experts can indeed understand and repair robot failures. Our generated shields accelerate learning and improve data-efficiency during retraining.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
ACM IEEE International Conference on Human-Robot Interaction, ISSN 2167-2121
Keywords
robot failure, policy repair, non-experts, shielded reinforcement learning
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-308441 (URN)10.1109/HRI53351.2022.9889604 (DOI)000869793600054 ()2-s2.0-85140707989 (Scopus ID)
Conference
Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction, March 7-10, 2022
Note

Part of proceedings: ISBN 978-1-6654-0731-1

QC 20220215

Available from: 2022-02-07 Created: 2022-02-07 Last updated: 2025-02-09Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-7461-920X
