Simplifying Reward Design through Divide-and-Conquer

Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan

arXiv.org Artificial Intelligence 

While significant advances have been made in planning and reinforcement learning for robots, these algorithms require access to a reward (or cost) function in order to succeed. Unfortunately, designing a good reward function by hand remains challenging in many tasks. When designing the reward, the goal is to choose a function that guides the robot to accomplish the task in any test environment it might encounter. Typically, the designer considers a representative set of training environments and finds a reward function that induces desirable behavior across all of them, as in Figure 1 (Top). In practice, this can be both challenging and frustrating for the reward designer. The process often takes many iterations of tuning, in which changing the reward function corrects the behavior in one environment but breaks it in another, and so on. We posit that designing a good reward function for a single environment at a time is easier than designing one for all training environments under consideration simultaneously.

Imagine the task of motion planning in the home. The reward function provided to the planner must correctly encode the desired tradeoffs: the robot must stay away from static objects, it should give a wider berth to fragile objects (as in Figure 1 (Bottom)), and it needs to keep a comfortable distance from the person, prioritizing more sensitive areas, such as the head [9].
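
To make the tradeoffs in the home motion-planning example concrete, the following is a minimal sketch of a hand-designed reward, assuming a weighted sum of proximity penalties over a 2D trajectory. The obstacle kinds, the weight values, the influence radius, and the names `Obstacle`, `proximity_cost`, and `trajectory_reward` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obstacle:
    x: float
    y: float
    kind: str  # hypothetical categories: "static", "fragile", "person", "head"


# Hypothetical weights: a larger weight means the robot should keep a wider berth.
# These are the numbers a designer would tune by hand.
WEIGHTS = {"static": 1.0, "fragile": 3.0, "person": 5.0, "head": 10.0}


def proximity_cost(point, obstacle, radius=1.0):
    """Penalty that grows linearly as the point moves inside `radius` of the obstacle."""
    distance = math.hypot(point[0] - obstacle.x, point[1] - obstacle.y)
    return max(0.0, radius - distance)


def trajectory_reward(trajectory, obstacles, weights=WEIGHTS):
    """Negative weighted sum of proximity penalties over all waypoints."""
    cost = 0.0
    for point in trajectory:
        for obstacle in obstacles:
            cost += weights[obstacle.kind] * proximity_cost(point, obstacle)
    return -cost


if __name__ == "__main__":
    path = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
    vase = Obstacle(1.0, 0.5, "fragile")
    shelf = Obstacle(1.0, 0.5, "static")
    # The same path is penalized more heavily near a fragile object.
    print(trajectory_reward(path, [vase]), trajectory_reward(path, [shelf]))
```

In a sketch like this, the tuning difficulty described above shows up directly: a single weight dictionary has to produce acceptable paths in every training environment at once, so raising one weight to fix behavior in one room can degrade behavior in another.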
