Publications (10 of 44)
Marta, D., Holk, S., Pek, C. & Leite, I. (2024). SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation. In: 2024 IEEE International Conference on Robotics and Automation (ICRA): . Paper presented at IEEE International Conference on Robotics and Automation (ICRA) (pp. 9585-9592). Institute of Electrical and Electronics Engineers (IEEE)
SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation
2024 (English) In: 2024 IEEE International Conference on Robotics and Automation (ICRA), Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 9585-9592. Conference paper, published paper (Refereed)
Abstract [en]

Preference-based reinforcement learning (RL) is a recent research direction in robot learning that allows humans to teach robots through preferences over pairs of desired behaviours. Nonetheless, obtaining realistic robot policies requires humans to answer an arbitrarily large number of queries. In this work, we address the sample-efficiency challenge with a technique that synthesizes queries from a semi-supervised learning perspective. To achieve this, we leverage latent variational autoencoder (VAE) representations of trajectory segments (sequences of state-action pairs). Our approach produces queries that are closely aligned with those labeled by humans, while avoiding excessive uncertainty in the human preference predictions as determined by reward estimations. Additionally, by introducing variation without deviating from the original human intent, more robust reward function representations are achieved. We compare our approach to recent state-of-the-art semi-supervised learning techniques for preference-based RL. Our experimental findings reveal that we can enhance the generalization of the estimated reward function without requiring additional human intervention. Lastly, to confirm the practical applicability of our approach, we conduct experiments involving actual human users in a simulated social navigation setting. Videos of the experiments can be found at https://sites.google.com/view/rl-sequel
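
The core mechanism described above can be illustrated with a minimal sketch: encode two human-labeled trajectory segments with a VAE, interpolate in latent space, and decode the result as a synthesized query. The encoder/decoder below are random linear stand-ins so the sketch runs; the actual architecture, interpolation scheme, and uncertainty filtering are the paper's and are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a trained VAE over trajectory segments (state-action sequences).
# Random linear maps are used purely to make the sketch executable.
LATENT_DIM, SEGMENT_DIM = 8, 32
W_enc = rng.normal(size=(LATENT_DIM, SEGMENT_DIM))
W_dec = rng.normal(size=(SEGMENT_DIM, LATENT_DIM))

def encode(segment):
    return W_enc @ segment          # mean of q(z | segment), simplified

def decode(z):
    return W_dec @ z                # reconstructed segment

def synthesize_query(seg_a, seg_b, alpha=0.5):
    """Interpolate two labeled segments in latent space to create a new,
    unlabeled query segment that stays close to the human-labeled data."""
    z = (1.0 - alpha) * encode(seg_a) + alpha * encode(seg_b)
    return decode(z)

# Two human-labeled segments -> one synthesized segment for semi-supervised use.
seg_a, seg_b = rng.normal(size=SEGMENT_DIM), rng.normal(size=SEGMENT_DIM)
new_segment = synthesize_query(seg_a, seg_b, alpha=0.3)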

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
HSV category
Identifiers
urn:nbn:se:kth:diva-360978 (URN), 10.1109/ICRA57147.2024.10610534 (DOI), 001369728000064 (), 2-s2.0-85199009127 (Scopus ID)
Conference
IEEE International Conference on Robotics and Automation (ICRA)
Note

Part of ISBN 9798350384574

QC 20250310

Available from: 2025-03-07 Created: 2025-03-07 Last updated: 2025-03-10 Bibliographically approved
Marta, D., Holk, S., Pek, C., Tumova, J. & Leite, I. (2023). Aligning Human Preferences with Baseline Objectives in Reinforcement Learning. In: 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023): . Paper presented at IEEE International Conference on Robotics and Automation (ICRA), MAY 29-JUN 02, 2023, London, ENGLAND. Institute of Electrical and Electronics Engineers (IEEE)
Aligning Human Preferences with Baseline Objectives in Reinforcement Learning
2023 (English) In: 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, published paper (Refereed)
Abstract [en]

Practical implementations of deep reinforcement learning (deep RL) have been challenging due to a multitude of factors, such as designing reward functions that cover every possible interaction. To address the heavy burden of robot reward engineering, we aim to leverage subjective human preferences gathered in the context of human-robot interaction, while taking advantage of a baseline reward function when available. By considering baseline objectives to be designed beforehand, we are able to narrow down the policy space, requesting human attention only when their input matters most. To allow control over the optimization of different objectives, our approach considers a multi-objective setting. We achieve human-compliant policies by sequentially training an optimal policy from a baseline specification and collecting queries on pairs of trajectories. These policies are obtained by training a reward estimator to generate Pareto-optimal policies that include human-preferred behaviours. Our approach ensures sample efficiency, and we conducted a user study to collect real human preferences, which we used to obtain a policy in a social navigation environment.
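
As a rough illustration of the baseline-plus-preferences idea (not the paper's implementation), the sketch below scalarizes a hand-designed baseline reward and a preference-learned reward estimate; the reward terms, state fields, and weight w_pref are assumptions made for the example.

# Combine a pre-designed baseline reward with a preference-learned reward
# estimate as a weighted multi-objective signal (illustrative only).

def baseline_reward(state, action):
    # e.g., progress towards a goal minus a collision penalty
    return -abs(state["dist_to_goal"]) - 10.0 * state["collision"]

def learned_preference_reward(state, action):
    # stand-in for a reward estimator trained from pairwise human preferences
    return 1.0 if state["personal_space_ok"] else -1.0

def combined_reward(state, action, w_pref=0.5):
    # Scalarization of the two objectives; sweeping w_pref traces out
    # different trade-offs between the baseline and human preferences.
    return ((1.0 - w_pref) * baseline_reward(state, action)
            + w_pref * learned_preference_reward(state, action))

state = {"dist_to_goal": 2.0, "collision": 0, "personal_space_ok": True}
print(combined_reward(state, action=None, w_pref=0.3))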

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE International Conference on Robotics and Automation ICRA
HSV category
Identifiers
urn:nbn:se:kth:diva-324924 (URN), 10.1109/ICRA48891.2023.10161261 (DOI), 001048371100079 (), 2-s2.0-85164820716 (Scopus ID)
Conference
IEEE International Conference on Robotics and Automation (ICRA), MAY 29-JUN 02, 2023, London, ENGLAND
Note

Part of ISBN 979-8-3503-2365-8

QC 20230328

Available from: 2023-03-21 Created: 2023-03-21 Last updated: 2025-05-19 Bibliographically approved
van Waveren, S., Pek, C., Leite, I., Tumova, J. & Kragic, D. (2023). Generating Scenarios from High-Level Specifications for Object Rearrangement Tasks. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023: . Paper presented at 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023 (pp. 11420-11427). Institute of Electrical and Electronics Engineers (IEEE)
Generating Scenarios from High-Level Specifications for Object Rearrangement Tasks
2023 (English) In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 11420-11427. Conference paper, published paper (Refereed)
Abstract [en]

Rearranging objects is an essential skill for robots. To quickly teach robots new rearrangement tasks, we would like to generate training scenarios from high-level specifications that define the relative placement of objects for the task at hand. Ideally, to guide the robot's learning, we also want to be able to rank these scenarios according to their difficulty. Prior work has shown how generating diverse scenarios from specifications and providing the robot with easy-to-difficult samples can improve learning. Yet, existing scenario generation methods typically cannot generate diverse scenarios while controlling their difficulty. We address this challenge by conditioning generative models on spatial logic specifications to generate spatially-structured scenarios that meet the specification and desired difficulty level. Our experiments showed that generative models are more effective and data-efficient than rejection sampling and that the spatially-structured scenarios can drastically improve training of downstream tasks by orders of magnitude.
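
For contrast with the conditioned generative models used in the paper, the sketch below shows the rejection-sampling baseline on a toy spatial specification ("mug left of plate and within 0.5 m"); the predicate, scene encoding, and thresholds are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def satisfies_spec(scene, margin=0.1):
    """Toy spatial spec: the mug must be left of the plate and within 0.5 m.
    Real specs in the paper are spatial-logic formulas."""
    mug, plate = scene["mug"], scene["plate"]
    left_of = mug[0] < plate[0] - margin
    close = np.linalg.norm(mug - plate) < 0.5
    return left_of and close

def rejection_sample_scenario(max_tries=10_000):
    """Baseline the paper compares against: sample placements at random and
    reject until the specification holds (wasteful for tight specs)."""
    for _ in range(max_tries):
        scene = {"mug": rng.uniform(0, 1, size=2), "plate": rng.uniform(0, 1, size=2)}
        if satisfies_spec(scene):
            return scene
    raise RuntimeError("no satisfying scenario found")

print(rejection_sample_scenario())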

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
HSV category
Identifiers
urn:nbn:se:kth:diva-342642 (URN), 10.1109/IROS55552.2023.10341369 (DOI), 001136907804123 (), 2-s2.0-85182525633 (Scopus ID)
Conference
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023
Note

Part of ISBN 9781665491907

QC 20240125

Available from: 2024-01-25 Created: 2024-01-25 Last updated: 2025-02-09 Bibliographically approved
van Waveren, S., Rudling, R., Leite, I., Jensfelt, P. & Pek, C. (2023). Increasing perceived safety in motion planning for human-drone interaction. In: HRI 2023: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 18th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2023, Stockholm, Sweden, Mar 13 2023 - Mar 16 2023 (pp. 446-455). Association for Computing Machinery (ACM)
Increasing perceived safety in motion planning for human-drone interaction
2023 (English) In: HRI 2023: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery (ACM), 2023, pp. 446-455. Conference paper, published paper (Refereed)
Abstract [en]

Safety is crucial for autonomous drones to operate close to humans. Besides avoiding unwanted or harmful contact, people should also perceive the drone as safe. Existing safe motion planning approaches for autonomous robots, such as drones, have primarily focused on ensuring physical safety, e.g., by imposing constraints on motion planners. However, studies indicate that ensuring physical safety does not necessarily lead to perceived safety. Prior work in Human-Drone Interaction (HDI) shows that factors such as the drone's speed and distance to the human are important for perceived safety. Building on these works, we propose a parameterized control barrier function (CBF) that constrains the drone's maximum deceleration and minimum distance to the human, and we update its parameters based on people's ratings of perceived safety. We describe an implementation and evaluation of our approach. Results of a within-subject user study (N = 15) show that we can improve the perceived safety of a drone by adjusting to people individually.
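
A minimal 1-D sketch of the parameterized safety filter described above: a distance-based control barrier condition caps the allowed closing speed, while a deceleration limit keeps braking gentle. In the paper the parameters (minimum distance, maximum deceleration) are adapted from perceived-safety ratings; the values and names here are assumptions.

def cbf_safe_velocity(distance, d_min=1.5, gamma=0.8):
    """Barrier condition h = distance - d_min >= 0 with h_dot >= -gamma * h
    yields a cap on the allowed closing speed towards the person."""
    return max(0.0, gamma * (distance - d_min))

def filter_command(v_desired, distance, a_max=1.0, dt=0.1, v_current=0.0):
    """Clip the desired closing speed to the CBF bound, while never braking
    harder than the (perceived-safety) deceleration limit a_max."""
    v_safe = min(v_desired, cbf_safe_velocity(distance))
    return max(v_safe, v_current - a_max * dt)

print(filter_command(v_desired=2.0, distance=2.0, v_current=2.0))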

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
control barrier functions, human-drone interaction, motion planning, perceived safety
HSV category
Identifiers
urn:nbn:se:kth:diva-333381 (URN), 10.1145/3568162.3576966 (DOI), 2-s2.0-85150349732 (Scopus ID)
Conference
18th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2023, Stockholm, Sweden, Mar 13 2023 - Mar 16 2023
Note

Part of ISBN 9781450399647

QC 20230801

Available from: 2023-08-01 Created: 2023-08-01 Last updated: 2023-08-01 Bibliographically approved
Vahs, M., Pek, C. & Tumova, J. (2023). Risk-aware Spatio-temporal Logic Planning in Gaussian Belief Spaces. In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Paper presented at 2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023 (pp. 7879-7885). Institute of Electrical and Electronics Engineers (IEEE)
Risk-aware Spatio-temporal Logic Planning in Gaussian Belief Spaces
2023 (English) In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation, Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 7879-7885. Conference paper, published paper (Other academic)
Abstract [en]

In many real-world robotic scenarios, we cannot assume exact knowledge about a robot’s state due to unmodeled dynamics or noisy sensors. Planning in belief space addresses this problem by tightly coupling perception and planning modules to obtain trajectories that take into account the environment’s stochasticity. However, existing works are often limited to tasks such as the classic reach-avoid problem and do not provide risk awareness. We propose a risk-aware planning strategy in belief space that minimizes the risk of violating a given specification and enables a robot to actively gather information about its state. We use Risk Signal Temporal Logic (RiSTL) as a specification language in belief space to express complex spatio-temporal missions including predicates over Gaussian beliefs. We synthesize trajectories for challenging scenarios that cannot be expressed through classical reach-avoid properties and show that risk-aware objectives improve the uncertainty reduction in a robot’s belief.
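
The basic building block, evaluating a chance-constrained predicate over a Gaussian belief instead of a known state, can be sketched as follows; RiSTL's full quantitative semantics over spatio-temporal formulas is richer than this single-predicate illustration, and the risk budget below is an assumed value.

import math
import numpy as np

def prob_halfspace(mu, Sigma, a, b):
    """P(a^T x <= b) for x ~ N(mu, Sigma): evaluating a linear predicate over
    a Gaussian belief rather than a known state."""
    mean = float(a @ mu)
    std = math.sqrt(float(a @ Sigma @ a))
    return 0.5 * (1.0 + math.erf((b - mean) / (std * math.sqrt(2.0))))

def risk_predicate(mu, Sigma, a, b, delta=0.05):
    """Chance-constrained atomic predicate: satisfied if the probability of
    violating 'a^T x <= b' stays below the risk budget delta."""
    return 1.0 - prob_halfspace(mu, Sigma, a, b) <= delta

# Belief about the robot's position; stay left of the wall at x = 3 with >= 95% probability.
mu = np.array([2.0, 0.0])
Sigma = np.diag([0.2, 0.2])
print(risk_predicate(mu, Sigma, a=np.array([1.0, 0.0]), b=3.0))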

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
HSV category
Identifiers
urn:nbn:se:kth:diva-324917 (URN), 10.1109/ICRA48891.2023.10160973 (DOI), 001048371101031 (), 2-s2.0-85168678021 (Scopus ID)
Conference
2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023
Note

Part of ISBN 9798350323658

QC 20230328

Available from: 2023-03-20 Created: 2023-03-20 Last updated: 2025-02-09 Bibliographically approved
Mitsioni, I., Tajvar, P., Kragic, D., Tumova, J. & Pek, C. (2023). Safe Data-Driven Model Predictive Control of Systems with Complex Dynamics. IEEE Transactions on robotics, 39(4), 3242-3258
Safe Data-Driven Model Predictive Control of Systems with Complex Dynamics
2023 (English) In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 39, no. 4, pp. 3242-3258. Article in journal (Refereed) Published
Abstract [en]

In this article, we address the task and safety performance of data-driven model predictive controllers (DD-MPC) for systems with complex dynamics, i.e., temporally or spatially varying dynamics that may also be discontinuous. The three challenges we focus on are the accuracy of learned models, the receding horizon-induced myopic predictions of DD-MPC, and the active encouragement of safety. To learn accurate models for DD-MPC, we cautiously, yet effectively, explore the dynamical system with rapidly exploring random trees (RRT) to collect a uniform distribution of samples in the state-input space and overcome the common distribution shift in model learning. The learned model is further used to construct an RRT tree that estimates how close the model's predictions are to the desired target. This information is used in the cost function of the DD-MPC to minimize the short-sighted effect of its receding horizon nature. To promote safety, we approximate sets of safe states using demonstrations of exclusively safe trajectories, i.e., without unsafe examples, and encourage the controller to generate trajectories close to the sets. As a running example, we use a broken version of an inverted pendulum where the friction abruptly changes in certain regions. Furthermore, we showcase the adaptation of our method to a real-world robotic application with complex dynamics: robotic food-cutting. Our results show that our proposed control framework effectively avoids unsafe states with higher success rates than baseline controllers that employ models from controlled demonstrations and even random actions.
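
The "encourage safety" component can be illustrated with a toy stage cost that penalizes the distance of predicted states to a safe set approximated from exclusively safe demonstrations; the weights and 2-D state are assumptions, and the RRT-based exploration and cost-to-go terms of the paper are omitted.

import numpy as np

# Safe set approximated by states from exclusively safe demonstrations.
safe_demo_states = np.random.default_rng(2).uniform(-1, 1, size=(200, 2))

def distance_to_safe_set(state):
    # distance of a predicted state to its nearest safe demonstration state
    return float(np.min(np.linalg.norm(safe_demo_states - state, axis=1)))

def stage_cost(state, target, w_task=1.0, w_safe=5.0):
    # task term pulls towards the target; safety term pulls towards the safe set
    task_term = float(np.linalg.norm(state - target) ** 2)
    safety_term = distance_to_safe_set(state) ** 2
    return w_task * task_term + w_safe * safety_term

print(stage_cost(np.array([0.2, 0.1]), target=np.array([0.0, 0.0])))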

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Data-driven modelling, dimensionality reduction, formal specifications, predictive control, sampling methods
HSV category
Identifiers
urn:nbn:se:kth:diva-349873 (URN), 10.1109/TRO.2023.3266995 (DOI), 000986623300001 (), 2-s2.0-85159813435 (Scopus ID)
Note

QC 20240704

Available from: 2024-07-04 Created: 2024-07-04 Last updated: 2025-02-09 Bibliographically approved
Pek, C., Schuppe, G. F., Esposito, F., Tumova, J. & Kragic, D. (2023). SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications. Autonomous Robots, 47(8), 1439-1462
SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications
2023 (English) In: Autonomous Robots, ISSN 0929-5593, E-ISSN 1573-7527, Vol. 47, no. 8, pp. 1439-1462. Article in journal (Refereed) Published
Abstract [en]

Many tasks require robots to manipulate objects while satisfying a complex interplay of spatial and temporal constraints. For instance, a table-setting robot first needs to place a mug and then fill it with coffee, while satisfying spatial relations such as forks needing to be placed left of plates. We propose the spatio-temporal framework SpaTiaL that unifies the specification, monitoring, and planning of object-oriented robotic tasks in a robot-agnostic fashion. SpaTiaL is able to specify diverse spatial relations between objects and temporal task patterns. Our experiments with recorded data, simulations, and real robots demonstrate how SpaTiaL provides real-time monitoring and facilitates online planning. SpaTiaL is open source and easily expandable to new object relations and robotic applications.
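
In the spirit of SpaTiaL's object-centric relations with quantitative (robustness-style) semantics, a minimal sketch of two relations and their conjunction is given below; the concrete definitions are illustrative assumptions rather than the library's actual semantics, for which the paper's open-source implementation should be consulted.

import numpy as np

def left_of(obj_a, obj_b, margin=0.0):
    """Positive robustness iff obj_a lies left of obj_b (along x) by more than margin."""
    return (obj_b[0] - obj_a[0]) - margin

def close_to(obj_a, obj_b, dist=0.3):
    """Positive robustness iff the two objects are within 'dist' of each other."""
    return dist - float(np.linalg.norm(np.asarray(obj_a) - np.asarray(obj_b)))

# Conjunction of relations = min of robustness values (standard quantitative semantics).
fork, plate = np.array([0.10, 0.50]), np.array([0.30, 0.50])
robustness = min(left_of(fork, plate, margin=0.05), close_to(fork, plate))
print(robustness)   # > 0 means "fork left of plate and close to it" is satisfied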

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Monitoring, Object-centric planning, Spatio-temporal logics, Task and motion planning
HSV category
Identifiers
urn:nbn:se:kth:diva-348211 (URN), 10.1007/s10514-023-10145-1 (DOI), 001092898500001 (), 2-s2.0-85175646748 (Scopus ID)
Note

QC 20240624

Available from: 2024-06-24 Created: 2024-06-24 Last updated: 2025-02-09 Bibliographically approved
Marta, D., Holk, S., Pek, C., Tumova, J. & Leite, I. (2023). VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning. Paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), October 1-5, 2023, Detroit, Michigan, USA.
VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning
2023 (English) Conference paper, published paper (Refereed)
Abstract [en]

Human-in-the-loop reinforcement learning (RL) methods actively integrate human knowledge to create reward functions for various robotic tasks. Learning from preferences shows promise as it alleviates the requirement for demonstrations by querying humans on state-action sequences. However, the limited granularity of sequence-based approaches complicates temporal credit assignment. The amount of human querying is contingent on query quality, as redundant queries result in excessive human involvement. This paper addresses the often-overlooked aspect of query selection, which is closely related to active learning (AL). We propose a novel query selection approach that leverages variational autoencoder (VAE) representations of state sequences. In this manner, we formulate queries that are diverse in nature while simultaneously taking into account reward model estimations. We compare our approach to the current state-of-the-art query selection methods in preference-based RL, and find ours to be on par or more sample-efficient through extensive benchmarking on simulated environments relevant to robotics. Lastly, we conduct an online study to verify the effectiveness of our query selection approach with real human feedback and examine several metrics related to human effort.
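
The diversity part of the query selection can be sketched with greedy farthest-point sampling over VAE latent codes of segments; VARIQuery additionally folds in reward-model estimates, which this illustration omits, and the shapes and names are assumptions.

import numpy as np

def select_diverse(latents, k):
    """Pick k latent codes that are mutually far apart (greedy farthest-point)."""
    chosen = [0]                                       # start from an arbitrary segment
    dists = np.linalg.norm(latents - latents[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))                    # farthest from the chosen set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(latents - latents[nxt], axis=1))
    return chosen

latents = np.random.default_rng(3).normal(size=(500, 16))   # encoded state sequences
print(select_diverse(latents, k=5))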

HSV category
Identifiers
urn:nbn:se:kth:diva-333948 (URN)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), October 1-5, 2023, Detroit, Michigan, USA
Note

QC 20230818

Available from: 2023-08-15 Created: 2023-08-15 Last updated: 2025-02-09 Bibliographically approved
Marta, D., Holk, S., Pek, C., Tumova, J. & Leite, I. (2023). VARIQuery: VAE Segment-Based Active Learning for Query Selection in Preference-Based Reinforcement Learning. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023: . Paper presented at 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023 (pp. 7878-7885). Institute of Electrical and Electronics Engineers (IEEE)
VARIQuery: VAE Segment-Based Active Learning for Query Selection in Preference-Based Reinforcement Learning
2023 (English) In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 7878-7885. Conference paper, published paper (Refereed)
Abstract [en]

Human-in-the-loop reinforcement learning (RL) methods actively integrate human knowledge to create reward functions for various robotic tasks. Learning from preferences shows promise as it alleviates the requirement for demonstrations by querying humans on state-action sequences. However, the limited granularity of sequence-based approaches complicates temporal credit assignment. The amount of human querying is contingent on query quality, as redundant queries result in excessive human involvement. This paper addresses the often-overlooked aspect of query selection, which is closely related to active learning (AL). We propose a novel query selection approach that leverages variational autoencoder (VAE) representations of state sequences. In this manner, we formulate queries that are diverse in nature while simultaneously taking into account reward model estimations. We compare our approach to the current state-of-the-art query selection methods in preference-based RL, and find ours to be on par or more sample-efficient through extensive benchmarking on simulated environments relevant to robotics. Lastly, we conduct an online study to verify the effectiveness of our query selection approach with real human feedback and examine several metrics related to human effort.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
HSV category
Identifiers
urn:nbn:se:kth:diva-342645 (URN), 10.1109/IROS55552.2023.10341795 (DOI), 001136907802029 (), 2-s2.0-85182523595 (Scopus ID)
Conference
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023, Detroit, United States of America, Oct 1 2023 - Oct 5 2023
Note

Part of ISBN 978-1-6654-9190-7

QC 20240126

Available from: 2024-01-25 Created: 2024-01-25 Last updated: 2025-05-19 Bibliographically approved
van Waveren, S., Pek, C., Tumova, J. & Leite, I. (2022). Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies. In: Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction: . Paper presented at Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction, March 7-10, 2022 (pp. 493-501). Institute of Electrical and Electronics Engineers (IEEE)
Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies
2022 (English) In: Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 493-501. Conference paper, published paper (Refereed)
Abstract [en]

Reinforcement learning has shown great potential for learning sequential decision-making tasks. Yet, it is difficult to anticipate all possible real-world scenarios during training, causing robots to inevitably fail in the long run. Many of these failures are due to variations in the robot's environment. Usually, experts are called in to correct the robot's behavior; however, some of these failures do not necessarily require an expert to solve them. In this work, we query non-experts online for help and explore 1) if/how non-experts can provide feedback to the robot after a failure and 2) how the robot can use this feedback to avoid such failures in the future by generating shields that restrict or correct its high-level actions. We demonstrate our approach on common daily scenarios of a simulated kitchen robot. The results indicate that non-experts can indeed understand and repair robot failures. Our generated shields accelerate learning and improve data efficiency during retraining.
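
A minimal sketch of the shield idea: rules distilled from non-expert feedback veto or replace high-level actions in states that previously led to failures. The rule format, state/action names, and fallback behavior are illustrative assumptions, not the paper's implementation.

# Shield over high-level actions, with rules derived from non-expert feedback.
forbidden = {("holding_glass", "place_on_edge")}       # learned failure-prone pair
fallback = {"place_on_edge": "place_on_center"}        # learned correction

def shielded_action(state, proposed_action):
    """Let the RL policy propose an action, but override it when the shield's
    rules mark the (state, action) pair as failure-prone."""
    if (state, proposed_action) in forbidden:
        return fallback.get(proposed_action, "ask_for_help")
    return proposed_action

print(shielded_action("holding_glass", "place_on_edge"))   # -> place_on_center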

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
ACM/IEEE International Conference on Human-Robot Interaction, ISSN 2167-2121
Keywords
robot failure, policy repair, non-experts, shielded reinforcement learning
HSV category
Identifiers
urn:nbn:se:kth:diva-308441 (URN), 10.1109/HRI53351.2022.9889604 (DOI), 000869793600054 (), 2-s2.0-85140707989 (Scopus ID)
Conference
Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction, March 7-10, 2022
Note

Part of proceedings: ISBN 978-1-6654-0731-1

QC 20220215

Available from: 2022-02-07 Created: 2022-02-07 Last updated: 2025-02-09 Bibliographically approved
Organizations
Identifiers
ORCID iD: orcid.org/0000-0001-7461-920X