Interactive Reward Tuning: Interactive Visualization for Preference Elicitation
Aalto University, Finland.
Aalto University, Finland.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0002-1498-9062
Aalto University, Finland.
2024 (English). In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 9254-9261. Conference paper, Published paper (Refereed)
Abstract [en]

In reinforcement learning, tuning reward weights in the reward function is necessary to align behavior with user preferences. However, current approaches, which use pairwise comparisons for preference elicitation, are inefficient, because they miss much of the human ability to explore and judge groups of candidate solutions. The paper presents a novel visualization-based approach that better exploits the user's ability to quickly recognize interesting directions for reward tuning. It breaks down the tuning problem by using the visual information-seeking principle: overview first, zoom and filter, then details-on-demand. Following this principle, we built a visualization system comprising two interactively linked views: 1) an embedding view showing a contextual overview of all sampled behaviors and 2) a sample view displaying selected behaviors and visualizations of the detailed time-series data. A user can efficiently explore large sets of samples by iterating between these two views. The paper demonstrates that the proposed approach is capable of tuning rewards for challenging behaviors. The simulation-based evaluation shows that the system can reach optimal solutions with fewer queries relative to baselines.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 9254-9261
National Category
Computer Sciences, Robotics and automation, Human Computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-359876
DOI: 10.1109/IROS58592.2024.10801540
ISI: 001433985300209
Scopus ID: 2-s2.0-85216462218
OAI: oai:DiVA.org:kth-359876
DiVA, id: diva2:1937185
Conference
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Abu Dhabi, United Arab Emirates, Oct 14 2024 - Oct 18 2024
Note

Part of ISBN 9798350377705

QC 20250213

Available from: 2025-02-12. Created: 2025-02-12. Last updated: 2025-05-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Weinkauf, Tino
