Program Synthesis Guided Reinforcement Learning
Yichen Yang, Jeevana Priya Inala, Osbert Bastani, Yewen Pu, Armando Solar-Lezama, Martin Rinard
arXiv.org Artificial Intelligence
A key challenge for reinforcement learning is solving long-horizon planning and control problems. Recent work has proposed leveraging programs to help guide the learning algorithm in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task they seek to achieve. We propose an approach that leverages program synthesis to automatically generate the guiding program. A key challenge is how to handle partially observable environments. We propose model predictive program synthesis, which trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. We evaluate our approach on a set of challenging benchmarks, including a 2D Minecraft-inspired "craft" environment where the agent must perform a complex sequence of subtasks to achieve its goal, a box-world environment that requires abstract reasoning, and a variant of the craft environment where the agent is a MuJoCo Ant. Our approach significantly outperforms several baselines, and performs essentially as well as an oracle that is given an effective program.
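The idea of synthesizing a program from samples of the unobserved world can be illustrated with a toy sketch. This is not the authors' implementation: the uniform random sampler stands in for the trained generative model, brute-force scoring of a fixed candidate list stands in for the program synthesizer, and every name here (`sample_worlds`, `succeeds`, `robust_score`, `synthesize`, the cell and resource labels) is hypothetical. The sketch only shows one plausible reading of "robust to uncertainty": prefer the program that succeeds in the largest fraction of sampled world completions.

```python
import random

RESOURCES = ["wood", "iron", "empty"]


def sample_worlds(observed, unknown_cells, n_samples, rng):
    """Stand-in for a generative model: complete the partial observation
    by filling each unobserved cell uniformly at random (an assumption)."""
    worlds = []
    for _ in range(n_samples):
        world = dict(observed)
        for cell in unknown_cells:
            world[cell] = rng.choice(RESOURCES)
        worlds.append(world)
    return worlds


def succeeds(program, world):
    """A program is a tuple of (cell, resource) subtasks. It succeeds in a
    world if every visited cell actually holds the expected resource."""
    return all(world.get(cell) == resource for cell, resource in program)


def robust_score(program, worlds):
    """Fraction of sampled worlds in which the program succeeds."""
    return sum(succeeds(program, w) for w in worlds) / len(worlds)


def synthesize(candidates, worlds):
    """Return the candidate with the highest robust score
    (ties go to the earliest candidate)."""
    return max(candidates, key=lambda p: robust_score(p, worlds))


# Usage: cell "A" is observed to hold wood; cell "B" is unobserved.
rng = random.Random(0)
worlds = sample_worlds({"A": "wood"}, ["B"], n_samples=50, rng=rng)
candidates = [
    (("A", "wood"),),  # rely only on what has been observed
    (("B", "wood"),),  # gamble on the unobserved cell
]
best = synthesize(candidates, worlds)
```

In this toy setting the program that relies only on the observed cell succeeds in every sampled world, so it is selected over the one that depends on the unobserved cell. In a model predictive loop, the sampling and synthesis steps would be repeated as the agent gathers new observations.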
Feb-22-2021
- Country:
- North America > United States (0.68)
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment (0.66)
- Technology: