Value Prediction Network

Junhyuk Oh, Satinder Singh, Honglak Lee

Feb-14-2020, 18:28:34 GMT–Neural Information Processing Systems

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation. Published at the Neural Information Processing Systems Conference.
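The abstract describes planning over learned abstract states, with option-conditional predictions of rewards and values standing in for an observation model. The following is a minimal, hypothetical sketch of what short-lookahead planning over such a model could look like. It is not the authors' implementation: `transition`, `outcome`, and `value` are assumed stand-ins for learned network modules, the toy model is invented for illustration, and the depth-weighted blend of the direct value prediction with the best deeper Q-estimate is an assumption modeled on the VPN paper's description of d-step planning rather than something stated in this abstract.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-ins for learned VPN-style modules (not the authors' code):
#   transition(s, o) -> next abstract state after executing option o
#   outcome(s, o)    -> (predicted reward, predicted discount) for option o
#   value(s)         -> predicted value of abstract state s
Transition = Callable[[float, int], float]
Outcome = Callable[[float, int], Tuple[float, float]]
Value = Callable[[float], float]


def q_plan(s: float, o: int, depth: int, transition: Transition,
           outcome: Outcome, value: Value, options: Sequence[int]) -> float:
    """Q estimate: predicted reward plus discounted depth-limited value of the next state."""
    reward, discount = outcome(s, o)
    s_next = transition(s, o)
    return reward + discount * v_plan(s_next, depth, transition, outcome, value, options)


def v_plan(s: float, depth: int, transition: Transition,
           outcome: Outcome, value: Value, options: Sequence[int]) -> float:
    """Value estimate: the direct prediction at depth 1; otherwise an assumed
    weighted blend of the direct prediction and the best (depth - 1)-step Q estimate."""
    if depth <= 1:
        return value(s)
    best = max(q_plan(s, o, depth - 1, transition, outcome, value, options)
               for o in options)
    return value(s) / depth + (depth - 1) / depth * best


def select_option(s: float, depth: int, transition: Transition,
                  outcome: Outcome, value: Value, options: Sequence[int]) -> int:
    """Greedy option choice from the planned Q estimates."""
    return max(options,
               key=lambda o: q_plan(s, o, depth, transition, outcome, value, options))


# Toy model for illustration only: option 1 moves right and pays reward 1,
# option 0 moves left and pays nothing; every step has discount 0.5.
def toy_transition(s: float, o: int) -> float:
    return s + (1.0 if o == 1 else -1.0)


def toy_outcome(s: float, o: int) -> Tuple[float, float]:
    return (1.0 if o == 1 else 0.0), 0.5


def toy_value(s: float) -> float:
    return 0.0


if __name__ == "__main__":
    opts = [0, 1]
    print(q_plan(0.0, 1, 2, toy_transition, toy_outcome, toy_value, opts))  # 1.25
    print(select_option(0.0, 2, toy_transition, toy_outcome, toy_value, opts))  # 1
```

A larger `depth` buys more lookahead at the cost of exponentially more model calls in the number of options. In this sketch the direct value prediction always keeps some weight, which is one way to limit the impact of compounding errors in the learned transition model; those weights belong to the assumed recursion, not to anything the abstract specifies.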

model-based RL method, value prediction network
