Reviews: Imagination-Augmented Agents for Deep Reinforcement Learning

Neural Information Processing Systems 

This paper presents an approach to model-based reinforcement learning in which, instead of directly estimating the value of actions in a learned model, a neural network processes the model's predictions and combines them with model-free features to produce a policy and/or value function. The idea is that, since the learned model is likely to be flawed, the network may be able to extract useful information from its predictions while ignoring the unreliable parts. The approach, the imagination-augmented agent (I2A), is studied in procedurally generated Sokoban puzzles and a synthetic Pac-Man-like environment, and is shown to outperform both purely model-free learning and Monte Carlo tree search (MCTS) on the learned model. The experiments are thorough and carefully designed to tease issues apart and to clearly answer well-stated questions about the approach. I found that the experiments provide convincing evidence that I2A takes advantage of the learned model, is robust to model flaws, and can leverage the learned model for multiple tasks.
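To make the summarized architecture concrete, here is a minimal sketch of the data flow as I understand it: roll the learned model forward once per candidate action, encode each imagined trajectory, concatenate the encodings with model-free features, and feed the result to a policy head. Everything in it is an illustrative assumption rather than the authors' implementation: the one-dimensional `toy_model`, the fixed `rollout_policy`, the hand-written `encode_rollout` summary (standing in for a learned encoder), and the untrained linear policy head are all hypothetical.

```python
import math
import random

def toy_model(state, action):
    """Hypothetical learned model on a 1-D line with the goal at 0.
    Returns (predicted next state, predicted reward)."""
    next_state = state + action
    return next_state, -abs(next_state)

def rollout(model, state, first_action, depth, rollout_policy):
    """Imagine `depth` steps: take `first_action`, then follow `rollout_policy`."""
    traj, action = [], first_action
    for _ in range(depth):
        state, reward = model(state, action)
        traj.append((state, reward))
        action = rollout_policy(state)
    return traj

def encode_rollout(traj, gamma=0.9):
    """Stand-in for a learned rollout encoder: summarize the imagined
    trajectory as [discounted predicted return, final predicted state]."""
    ret = sum((gamma ** t) * r for t, (_, r) in enumerate(traj))
    return [ret, traj[-1][0]]

def model_free_features(state):
    """Stand-in for the model-free path: features of the real state only."""
    return [state, 1.0]

def i2a_features(state, model, actions, depth, rollout_policy):
    """Concatenate one rollout encoding per action with model-free features."""
    feats = []
    for a in actions:
        feats.extend(encode_rollout(rollout(model, state, a, depth, rollout_policy)))
    feats.extend(model_free_features(state))
    return feats

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(features, weights):
    """Linear policy head: one row of weights per action, then softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)

# Example: three actions, rollouts of depth 3, randomly initialized head.
ACTIONS = [-1.0, 0.0, 1.0]
stay_put = lambda s: 0.0  # hypothetical fixed rollout policy
features = i2a_features(2.0, toy_model, ACTIONS, 3, stay_put)
rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in features] for _ in ACTIONS]
probs = policy(features, weights)
```

In this sketch the policy head sees the model's predictions only as input features, so training could in principle learn to down-weight them where the model is unreliable, which is the property the paper's experiments are designed to probe.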