Reward from Demonstration in Interactive Reinforcement Learning
Raza, Syed Ali (University of Technology, Sydney) | Johnston, Benjamin (University of Technology, Sydney) | Williams, Mary-Anne (University of Technology, Sydney)
In reinforcement learning (RL), reward shaping is used to convey the desired behavior by assigning a positive or negative reward to the learner’s preceding action. For reward shaping through human-generated rewards, however, an important consideration is making the process approachable for humans. Typically, the human teacher must watch the agent’s actions closely and assign judgmental feedback based on prior knowledge. This can be mentally taxing and unpleasant, especially in lengthy teaching sessions. We present a method, Shaping from Interactive Demonstrations (SfID), which takes an action label from the human instead of a judgmental reward. It thereby simplifies the teacher’s role to demonstrating the action to select in a given state. We compare SfID with a standard reward shaping approach on the Sokoban domain. The results show that SfID is competitive with standard reward shaping.
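To make the idea of replacing a judgmental reward with an action label concrete, the following is a minimal sketch and not the authors’ implementation. It assumes, purely for illustration, that a demonstrated action is turned into a shaping signal by comparing it with the agent’s own action, and that this signal is added to the environment reward in a tabular Q-learning update. The names label_to_reward and q_update, the +1/-1 values, and the state names are all hypothetical.

```python
from collections import defaultdict

def label_to_reward(agent_action, demonstrated_action, bonus=1.0, penalty=-1.0):
    """Assumed mapping: reward agreement with the teacher's action label."""
    return bonus if agent_action == demonstrated_action else penalty

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

actions = ["up", "down", "left", "right"]
q = defaultdict(float)  # all Q-values start at 0.0

# The agent moved "left" in state "s0"; the teacher demonstrates "right".
env_reward = 0.0
shaping = label_to_reward("left", "right")
new_value = q_update(q, "s0", "left", env_reward + shaping, "s1", actions)
print(new_value)  # -0.5
```

In this sketch the teacher only names the action they would take; the conversion to a numeric signal happens inside the learner, which is the simplification of the teacher’s role that the abstract describes.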
May-8-2016