Policy-Aware Model Learning for Policy Gradient Methods

Abachi, Romina, Ghavamzadeh, Mohammad, Farahmand, Amir-massoud

Feb-28-2020–arXiv.org Artificial Intelligence

A model-based reinforcement learning (MBRL) agent gradually learns a model of the environment as it interacts with it, and uses the learned model to plan and find a good policy. Planning can use samples from the model instead of, or in addition to, samples from the environment, e.g., Sutton (1990); Peng & Williams (1993); Sutton et al. (2008); Deisenroth et al. (2015); Talvitie (2017); Ha & Schmidhuber (2018). If learning a model is easier than learning the policy or value function in a model-free manner, MBRL reduces the number of required interactions with the real world and improves the sample complexity of the agent. However, this is contingent on the agent's ability to learn an accurate model of the real environment, so learning a good model of the environment is central to the success of MBRL. This paper addresses the question of how to approach the problem of learning a model of the environment, and proposes a method called policy-aware model learning (PAML). The conventional approach to model learning in MBRL is to learn a model that is a good predictor of the environment. If the learned model is accurate enough, this leads to a value function or a policy that is close to the optimal one. Learning a good predictive model can be achieved by minimizing some form of a probabilistic loss.
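The conventional approach described above, fitting a predictive model by minimizing a probabilistic loss, can be illustrated with a minimal sketch. It is not the paper's PAML method. It assumes a hypothetical one-dimensional linear environment with Gaussian noise, for which minimizing the negative log-likelihood reduces to least squares. All names and constants (`collect_transitions`, `fit_gaussian_mle`, `a_true`, `b_true`, `noise_std`) are illustrative assumptions and do not come from the paper.

```python
import random


def collect_transitions(n, a_true=0.9, b_true=0.5, noise_std=0.05, seed=0):
    """Roll out a hypothetical 1-D environment s' = a*s + b*u + noise."""
    rng = random.Random(seed)
    s = 0.0
    data = []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)  # random exploratory action
        s_next = a_true * s + b_true * u + rng.gauss(0.0, noise_std)
        data.append((s, u, s_next))
        s = s_next
    return data


def fit_gaussian_mle(data):
    """Maximum-likelihood fit of the model s' ~ N(a*s + b*u, sigma^2).

    With fixed-variance Gaussian noise, minimizing the negative
    log-likelihood is equivalent to least squares, solved here in
    closed form through the 2x2 normal equations.
    """
    sss = sum(s * s for s, u, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    suu = sum(u * u for s, u, _ in data)
    ssy = sum(s * y for s, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sss * suu - ssu * ssu
    a_hat = (ssy * suu - suy * ssu) / det
    b_hat = (suy * sss - ssy * ssu) / det
    return a_hat, b_hat


def mean_squared_error(data, a, b):
    """Average one-step prediction error of the model (a, b) on the data."""
    return sum((y - (a * s + b * u)) ** 2 for s, u, y in data) / len(data)


data = collect_transitions(2000)
a_hat, b_hat = fit_gaussian_mle(data)
print("estimated a, b:", a_hat, b_hat)
print("one-step MSE:", mean_squared_error(data, a_hat, b_hat))
```

The loss here scores the model only by how well it predicts the next state. It does not take into account how the model will later be used for planning.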
