Reinforcement Learning under Model Mismatch

Aurko Roy, Huan Xu, Sebastian Pokutta

Nov-8-2017, arXiv.org Machine Learning

Reinforcement learning is concerned with learning a good policy for sequential decision-making problems modeled as a Markov Decision Process (MDP) by interacting with the environment [22, 20]. In this work we address the problem of reinforcement learning from a misspecified model. As a motivating example, consider a scenario where the problem of interest is not directly accessible, but the agent can interact with a simulator whose dynamics are reasonably close to those of the true problem. Another plausible application arises when the parameters of the model evolve over time but can still be reasonably approximated by an MDP.

To address this problem we use the framework of robust MDPs, proposed by [2, 17, 13] to solve the planning problem under model misspecification. The robust MDP framework considers a class of models and finds the robust optimal policy, i.e., the policy that performs best under the worst-case model in that class. It was shown by [2, 17, 13] that the robust optimal policy satisfies the robust Bellman equation, which naturally leads to exact dynamic programming algorithms for finding an optimal policy. However, this approach is model-dependent and does not immediately generalize to the model-free case, where the parameters of the model are unknown. This gap matters because reinforcement learning is, in essence, a model-free framework for solving the Bellman equation from samples.
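To make the robust Bellman equation and the exact dynamic programming it enables more concrete, here is a minimal sketch of robust value iteration. It assumes one simple kind of uncertainty set: for every state-action pair the adversary may pick any of K candidate next-state distributions, independently per pair (an (s,a)-rectangular set). Under that assumption the equation reads V(s) = max_a [ r(s,a) + γ · min over p in P(s,a) of Σ_s' p(s') V(s') ]. The function name, the array layout, and the toy MDP are hypothetical illustrations, not the construction used in [2, 17, 13] or in this paper.

```python
import numpy as np


def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite, (s,a)-rectangular uncertainty set.

    Hypothetical helper for illustration only.
    P_set : array (K, S, A, S); P_set[k, s, a] is the k-th candidate
            next-state distribution for the pair (s, a). The adversary may
            pick a different k for every (s, a) independently.
    R     : array (S, A) of expected immediate rewards.
    Returns the robust value function V (shape (S,)) and a greedy policy.
    """
    V = np.zeros(P_set.shape[1])
    for _ in range(max_iter):
        # Expected next-state value under every candidate model: (K, S, A).
        expected = P_set @ V
        # Worst case over the uncertainty set, then the robust Bellman backup.
        Q = R + gamma * expected.min(axis=0)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the last computed robust Q-values.
    return V, Q.argmax(axis=1)


# Hypothetical toy MDP: state 1 is absorbing with zero reward. In state 0,
# action 0 ("safe") earns 0.5 and stays put; action 1 ("risky") earns 1.0
# but may fall into state 1.
P_nominal = np.zeros((2, 2, 2))
P_nominal[0, 0] = [1.0, 0.0]   # safe: always stay in state 0
P_nominal[0, 1] = [0.9, 0.1]   # risky, nominal model: 10% chance of failure
P_nominal[1, :] = [0.0, 1.0]   # absorbing state, for both actions
P_pessimistic = P_nominal.copy()
P_pessimistic[0, 1] = [0.5, 0.5]  # risky, pessimistic model: 50% failure
R = np.array([[0.5, 1.0],
              [0.0, 0.0]])

# K = 1 recovers ordinary value iteration on the nominal model.
V_nom, pi_nom = robust_value_iteration(P_nominal[None], R)
V_rob, pi_rob = robust_value_iteration(np.stack([P_nominal, P_pessimistic]), R)
print("nominal:", V_nom, pi_nom)
print("robust: ", V_rob, pi_rob)
```

With these numbers the nominal solution prefers the risky action in state 0 (value 1/0.19 ≈ 5.26), while the robust solution switches to the safe action (value 0.5/(1 − 0.9) = 5). The sketch also needs the candidate models written out explicitly, which is exactly the model dependence that keeps this approach from carrying over directly to the model-free setting.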
