Sensorimotor Robot Policy Training using Reinforcement Learning
Ghadirzadeh, Ali
KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, perception and learning, RPL.
ORCID iD: 0000-0001-6738-9872
2018 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Robots are becoming more ubiquitous in our society and are taking over many tasks that were previously considered human hallmarks. Many of these tasks, e.g., autonomously driving a car, collaborating with humans in dynamic and changing working conditions and performing household chores, require human-level intelligence to perceive the world and to act appropriately. In this thesis, we pursue a different approach compared to classical methods, which often construct a robot controller based on the perception-then-action paradigm. We devise robotic action-selection policies by considering the perception and action-selection processes as intertwined, emphasizing that perception comes prior to action and that action is key to perception. The main hypothesis is that complex robotic behaviors result from mastering sensorimotor contingencies (SMCs), i.e., regularities between motor actions and associated changes in sensory observations, where SMCs can be seen as building blocks of skillful behaviors. We elaborate and investigate this hypothesis through the deliberate design of frameworks which enable policy training based merely on data experienced by a robot, without intervention of human experts for analytical modeling or calibration. In such circumstances, action policies can be obtained through the reinforcement learning (RL) paradigm, by making exploratory action decisions and reinforcing patterns of SMCs that lead to reward events for a given task. However, the dimensionality of sensorimotor spaces, the complex dynamics of physical tasks, the sparseness of reward events, the limited amount of data from real-robot experiments, ambiguities in crediting past decisions, and safety issues arising from the exploratory actions of a physical robot pose challenges to obtaining a policy by data-driven methods alone. In this thesis, we introduce our contributions to dealing with the aforementioned issues by devising learning frameworks which endow a robot with the ability to integrate sensorimotor data into action-selection policies. The effectiveness of the proposed frameworks is demonstrated by evaluating the methods on a number of real robotic tasks, illustrating their suitability for acquiring different skills and for making sequential action decisions in high-dimensional sensorimotor spaces with limited data and sparse rewards.

Abstract [sv]

Robots are becoming increasingly common in today's society and are taking over many of the tasks that were previously regarded as reserved for humans. Several of these tasks, such as autonomously driving a car, collaborating with humans in dynamic and changing work environments, and performing household chores, require human-level intelligence for the robot to perceive the world and act appropriately. In this thesis we take a different approach compared to the classical methods for building robot systems, which have often been based on a so-called perception-then-action paradigm. We design strategies for selecting robot actions by assuming a mutual dependence between perception and action, where perception precedes action while action is at the same time necessary for perception. The main hypothesis is that complex robot behaviors arise as a result of the robot learning to master so-called sensorimotor contingencies (SMCs), i.e., regularities between motor actions and the corresponding changes in sensory observations, where SMCs can be seen as building blocks of complex behaviors. We elaborate and investigate this hypothesis by deliberately designing a handful of robot experiments in which a robot's skills are acquired entirely from sensorimotor data, without intervention of human experts for analytical modeling or calibration. Under such circumstances, so-called reinforcement learning (RL) is a suitable paradigm for action selection: a paradigm based entirely on sensory data and executed motor actions, with no need for handcrafted high-level representations of the world. This paradigm can be exploited to generate exploratory movement patterns and to reinforce the sensorimotor contingencies that lead to success in a given task. However, several factors complicate such purely data-driven learning of behaviors, such as the high dimensionality of the sensorimotor data, the complex dynamics of the physical task, the scarcity and ambiguity of experiments with positive outcomes, the limited number of experiments that can be performed on a real robot, and safety aspects. The contributions introduced in this thesis address the aforementioned problems by creating learning frameworks that enable a robot to integrate sensorimotor data to learn action-selection policies. The effectiveness of the proposed frameworks is demonstrated by evaluating the methods on a number of real robot tasks and illustrating their suitability for learning different skills that require sequences of actions based on high-dimensional sensorimotor data, despite a limited number of experiments with positive outcomes.
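To make the learning paradigm described in the abstracts concrete, the following is a minimal sketch (an editorial illustration, not code from the thesis) of reinforcing patterns of sensorimotor data that lead to reward, using a plain REINFORCE-style update; the environment interface, dimensions and step sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2                   # hypothetical dimensions
W = np.zeros((act_dim, obs_dim))          # linear Gaussian policy: a ~ N(W s, sigma^2 I)
sigma, lr = 0.1, 1e-2

def rollout(env, horizon=50):
    """Run one exploratory episode; env is an assumed interface with
    reset() -> state and step(a) -> (state, reward, done)."""
    s, traj, reward = env.reset(), [], 0.0
    for _ in range(horizon):
        a = W @ s + sigma * rng.standard_normal(act_dim)  # exploratory action
        traj.append((s, a))
        s, reward, done = env.step(a)
        if done:
            break
    return traj, reward

def reinforce_update(traj, reward):
    """REINFORCE: move the policy toward action patterns that preceded reward."""
    global W
    for s, a in traj:
        # gradient of log N(a | W s, sigma^2 I) with respect to W
        W += lr * reward * np.outer((a - W @ s) / sigma**2, s)
```

Under these assumptions, repeatedly calling rollout followed by reinforce_update shifts the policy toward the sensorimotor patterns that were rewarded.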

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2018, p. 80
Series
TRITA-EECS-AVL ; 2018:47
Keywords [en]
Reinforcement Learning, Artificial Intelligence, Robot Learning, Sensorimotor, Policy Training
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-228295
ISBN: 978-91-7729-825-0 (print)
OAI: oai:DiVA.org:kth-228295
DiVA, id: diva2:1208897
Public defence
2018-06-11, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20180521

Available from: 2018-05-21 Created: 2018-05-21 Last updated: 2018-05-21. Bibliographically approved
List of papers
1. Learning visual forward models to compensate for self-induced image motion
2014 (English) In: 23rd IEEE International Conference on Robot and Human Interactive Communication: IEEE RO-MAN, IEEE, 2014, p. 1110-1115. Conference paper, Published paper (Refereed)
Abstract [en]

Predicting the sensory consequences of an agent's own actions is considered an important skill for intelligent behavior. In terms of vision, so-called visual forward models can be applied to learn such predictions. This is no trivial task given the high dimensionality of sensory data and complex action spaces. In this work, we propose to learn the visual consequences of changes in pan and tilt of a robotic head using a visual forward model based on Gaussian processes and SURF correspondences. This is done without any assumptions on the kinematics of the system or requirements for calibration. The proposed method is compared to earlier work using accumulator-based correspondences and radial basis function networks. We also show the feasibility of the proposed method for detecting independent motion using a moving camera system. By comparing the predicted and actually captured images, image motion due to the robot's own actions can be distinguished from motion caused by moving external objects. Results show the proposed method to be preferable to the earlier method in terms of both prediction errors and the ability to detect independent motion.
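As a rough illustration of this forward-model idea (a sketch under stated assumptions, not the authors' implementation): regress feature displacement against head motion with a Gaussian process, then flag observed motion that the model cannot explain as independent motion. Feature correspondences (SURF in the paper) are assumed to be computed elsewhere; the file names and threshold below are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Training data collected by the robot: pan/tilt commands and the mean
# feature displacement (dx, dy) each command induced (hypothetical files).
X = np.load("head_motions.npy")       # shape (N, 2): (d_pan, d_tilt)
Y = np.load("feature_motions.npy")    # shape (N, 2): (dx, dy)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, Y)

def independent_motion(d_pan, d_tilt, observed_flow, thresh=3.0):
    """Mask over observed feature displacements, shape (M, 2), marking
    those the forward model cannot attribute to the robot's own motion."""
    pred, std = gp.predict([[d_pan, d_tilt]], return_std=True)
    residual = np.linalg.norm(observed_flow - pred, axis=1)
    return residual > thresh * std.mean()
```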

Place, publisher, year, edition, pages
IEEE, 2014
National Category
Robotics
Identifiers
urn:nbn:se:kth:diva-158120 (URN)
10.1109/ROMAN.2014.6926400 (DOI)
2-s2.0-84937605949 (Scopus ID)
978-1-4799-6763-6 (ISBN)
Conference
23rd IEEE International Conference on Robot and Human Interactive Communication: IEEE RO-MAN, August 25-29, 2014, Edinburgh, Scotland, UK
Note

QC 20150407

Available from: 2014-12-22 Created: 2014-12-22 Last updated: 2018-05-21. Bibliographically approved
2. A Sensorimotor Approach for Self-Learning of Hand-Eye Coordination
2015 (English) In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg, September 28 - October 02, 2015, IEEE conference proceedings, 2015, p. 4969-4975. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a sensorimotor contingencies (SMC) based method for fully autonomously learning to perform hand-eye coordination. We divide the task into two visuomotor subtasks, visual fixation and reaching, and implement these on a PR2 robot, assuming no prior information on its kinematic model. Our contributions are three-fold: i) grounding a robot in the environment by exploiting SMCs in the action planning system, which eliminates the need for prior knowledge of the robot's kinematic or dynamic models; ii) using a forward model to search for proper actions that solve the task by minimizing a cost function, instead of training a separate inverse model, to speed up training; iii) encoding 3D spatial positions of a target object based on the robot's joint positions, thus avoiding calibration with respect to an external coordinate system. The method is capable of learning the task of hand-eye coordination from scratch with fewer than 20 sensorimotor pairs that are iteratively generated at real-time speed. To examine the robustness of the method against nonlinear image distortions, we apply a so-called retinal-mapping image deformation to the input images. Experimental results show the success of the method even under considerable image deformations.
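Contribution (ii), selecting actions through a forward model rather than an inverse model, can be sketched as a small optimization problem. This is a simplified illustration, not the paper's code: forward_model is an assumed learned regressor from joint angles to the hand's image position, and bounds is a per-joint list of (low, high) increment limits.

```python
import numpy as np
from scipy.optimize import minimize

def select_action(forward_model, q_now, target, bounds):
    """Pick joint increments dq whose *predicted* visual outcome brings
    the hand closest to the fixation target; no inverse model is needed."""
    def cost(dq):
        predicted = forward_model(q_now + dq)       # predicted image position
        return float(np.sum((predicted - target) ** 2))  # squared pixel error
    res = minimize(cost, x0=np.zeros_like(q_now), bounds=bounds)
    return res.x
```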

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
Keywords
Reactive and Sensor-Based Planning, Robot Learning, Visual Servoing
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-179834 (URN)
10.1109/IROS.2015.7354076 (DOI)
000371885405012 ()
2-s2.0-84958153652 (Scopus ID)
9781479999941 (ISBN)
Conference
Intelligent Robots and Systems (IROS), Hamburg, September 28 - October 02, 2015
Projects
eSMCs
Note

QC 20160212

Available from: 2015-12-29 Created: 2015-12-29 Last updated: 2018-05-21. Bibliographically approved
3. Self-learning and adaptation in a sensorimotor framework
2016 (English) In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2016, p. 551-558. Conference paper, Published paper (Refereed)
Abstract [en]

We present a general framework for autonomously finding a sequence of actions that results in a desired state. Autonomy is acquired by learning the sensorimotor patterns of a robot while it interacts with its environment. Gaussian processes (GP) with automatic relevance determination are used to learn the sensorimotor mapping. In this way, relevant sensory and motor components can be systematically identified in high-dimensional sensory and motor spaces. We propose an incremental GP learning strategy which discerns between situations where an update or an adaptation must be implemented. The Rapidly-exploring Random Tree (RRT*) algorithm is exploited to enable long-term planning and to generate a sequence of states that lead to a given goal, while a gradient-based search finds the optimum action to steer to a neighbouring state in a single time step. Our experimental results demonstrate the suitability of the proposed framework for learning a joint-space controller with high data dimensionality (10×15). The framework exhibits a short training phase (less than 12 seconds), real-time performance and rapid adaptation capabilities.
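The automatic relevance determination mentioned above can be illustrated with a per-dimension length-scale RBF kernel (a toy sketch on synthetic data, not the paper's implementation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.random.rand(200, 10)              # 10-D sensorimotor input (toy data)
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 3]  # only dimensions 0 and 3 matter

kernel = RBF(length_scale=np.ones(10))   # ARD: one length scale per dimension
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

# Fitting shrinks the length scales of relevant dimensions, so their
# inverse serves as a relevance score for each input component.
relevance = 1.0 / gp.kernel_.length_scale
print(np.round(relevance, 2))            # dimensions 0 and 3 stand out
```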

Place, publisher, year, edition, pages
IEEE conference proceedings, 2016
National Category
Robotics
Identifiers
urn:nbn:se:kth:diva-197241 (URN)
10.1109/ICRA.2016.7487178 (DOI)
000389516200069 ()
2-s2.0-84977498692 (Scopus ID)
9781467380263 (ISBN)
Conference
2016 IEEE International Conference on Robotics and Automation, ICRA 2016, 16 May 2016 through 21 May 2016
Note

QC 20161207

Available from: 2016-12-07 Created: 2016-11-30 Last updated: 2019-08-16. Bibliographically approved
4. A sensorimotor reinforcement learning framework for physical human-robot interaction
2016 (English) In: IEEE International Conference on Intelligent Robots and Systems, IEEE, 2016, p. 2682-2688. Conference paper, Published paper (Refereed)
Abstract [en]

Modeling physical human-robot collaboration is generally a challenging problem due to the unpredictable nature of human behavior. To address this issue, we present a data-efficient reinforcement learning framework which enables a robot to learn how to collaborate with a human partner. The robot learns the task from its own sensorimotor experiences in an unsupervised manner. The uncertainty in the interaction is modeled using Gaussian processes (GP) to implement a forward model and an action-value function. Optimal action selection given the uncertain GP model is ensured by Bayesian optimization. We apply the framework to a scenario in which a human and a PR2 robot jointly control the position of a ball on a plank based on vision and force/torque data. Our experimental results show the suitability of the proposed method in terms of fast and data-efficient model learning, optimal action selection under uncertainty and equal role sharing between the partners.
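The action-selection step can be sketched schematically (hypothetical names and constants; the paper's exact acquisition rule may differ). Given a GP q_gp already fitted on (state, action) -> value pairs, an upper-confidence-bound criterion trades off predicted value against model uncertainty, so the uncertainty in the learned model itself drives exploration:

```python
import numpy as np

def select_action(q_gp, state, candidates, kappa=2.0):
    """UCB choice among candidate actions: prefer actions with high
    predicted value or high model uncertainty (exploration).
    q_gp is an assumed fitted sklearn GaussianProcessRegressor."""
    X = np.hstack([np.tile(state, (len(candidates), 1)), candidates])
    mean, std = q_gp.predict(X, return_std=True)
    return candidates[np.argmax(mean + kappa * std)]
```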

Place, publisher, year, edition, pages
IEEE, 2016
Keywords
Behavioral research, Intelligent robots, Reinforcement learning, Robots, Bayesian optimization, Forward modeling, Gaussian process, Human behaviors, Human-robot collaboration, Model learning, Optimal actions, Physical human-robot interactions, Human robot interaction
National Category
Robotics
Identifiers
urn:nbn:se:kth:diva-202121 (URN)
10.1109/IROS.2016.7759417 (DOI)
000391921702127 ()
2-s2.0-85006367922 (Scopus ID)
9781509037629 (ISBN)
Conference
2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016, 9 October 2016 through 14 October 2016
Note

QC 20170228

Available from: 2017-02-28 Created: 2017-02-28 Last updated: 2019-08-16. Bibliographically approved
5. Deep predictive policy training using reinforcement learning
2017 (English) In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 2351-2358, article id 8206046. Conference paper, Published paper (Refereed)
Abstract [en]

Skilled robot task learning is best implemented by predictive action policies due to the inherent latency of sensorimotor processes. However, training such predictive policies is challenging, as it involves finding a trajectory of motor activations for the full duration of the action. We propose a data-efficient deep predictive policy training (DPPT) framework with a deep neural network policy architecture which maps an image observation to a sequence of motor activations. The architecture consists of three sub-networks referred to as the perception, policy and behavior super-layers. The perception and behavior super-layers force an abstraction of visual and motor data, trained with synthetic and simulated training samples, respectively. The policy super-layer is a small sub-network with few parameters that maps data between the abstracted manifolds. It is trained for each task using policy-search reinforcement learning methods. We demonstrate the suitability of the proposed architecture and learning framework by training predictive policies for skilled object grasping and ball throwing on a PR2 robot. The effectiveness of the method is illustrated by the fact that these tasks are trained using only about 180 real-robot attempts with qualitative terminal rewards.
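The three-super-layer architecture can be sketched structurally in PyTorch (a sketch of the described structure only; all layer sizes, the 8-D abstractions and the 20×7 trajectory shape are hypothetical):

```python
import torch.nn as nn

class DPPTSketch(nn.Module):
    """Perception (image -> low-dim state), policy (small, RL-trained),
    behavior (latent action -> motor trajectory)."""
    def __init__(self, motor_steps=20, motor_dim=7):
        super().__init__()
        self.motor_steps, self.motor_dim = motor_steps, motor_dim
        self.perception = nn.Sequential(               # visual abstraction
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(8))
        self.policy = nn.Sequential(                   # the only RL-trained part
            nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
        self.behavior = nn.Linear(8, motor_steps * motor_dim)  # trajectory decoder

    def forward(self, image):                          # image: (B, 3, H, W)
        z = self.perception(image)
        u = self.policy(z)
        return self.behavior(u).view(-1, self.motor_steps, self.motor_dim)
```

Since only the small policy sub-network is trained on the physical robot, relatively few real-robot trials are needed, consistent with the roughly 180 attempts reported above.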

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
National Category
Robotics
Identifiers
urn:nbn:se:kth:diva-224269 (URN)
10.1109/IROS.2017.8206046 (DOI)
2-s2.0-85041944294 (Scopus ID)
9781538626825 (ISBN)
Conference
2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, Canada, 24 September 2017 through 28 September 2017
Funder
Swedish Research Council; EU, Horizon 2020
Note

QC 20180315

Available from: 2018-03-15 Created: 2018-03-15 Last updated: 2018-05-21. Bibliographically approved

Open Access in DiVA

fulltext (1265 kB)
File information
File name: FULLTEXT01.pdf
File size: 1265 kB
Checksum (SHA-512): 3fa12a417072659b3c3ef9f90316765426f277ddaf8d00dc32342a410b509f8b8b59aca289b5a15b7a4ac46779c139b2ca2003bd277901986140b3a9a8c40acf
Type: fulltext
Mimetype: application/pdf
