Operator Augmentation for Model-based Policy Evaluation

Xun Tang, Lexing Ying, Yuhua Zhu

arXiv.org Machine Learning 

Reinforcement learning (RL) has received much attention following recent successes such as AlphaGo and AlphaZero [25, 26]. One of the fundamental problems of RL is policy evaluation [29]. When the transition dynamics are unknown, model-based RL learns a dynamics model from observed data. However, even if the learned model is an unbiased estimate of the true dynamics, policy evaluation under the learned model is biased, because the value function depends nonlinearly on the transition dynamics. The question of interest in this paper is whether one can increase the accuracy of policy evaluation given an estimated dynamics model. We consider a discounted Markov decision process (MDP) M = (S, A, P, r, γ) with discrete state space S and discrete action space A, where P is the transition kernel, r is the reward, and γ ∈ (0, 1) is the discount factor. We write S = |S| and A = |A| for the sizes of the state and action spaces, respectively.
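The bias mentioned above can be seen in a small numerical experiment. Below is a minimal sketch (not code from the paper; the MDP instance, sample sizes, and helper name `policy_value` are hypothetical) that evaluates a fixed policy exactly via v = (I − γ P_π)^{-1} r_π, then repeats the evaluation with empirical transition estimates that are unbiased for P_π. Averaging the plug-in values over many trials shows a systematic gap from the true value, since the matrix inverse is a nonlinear function of the transition probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_value(P_pi, r_pi, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

# Small synthetic MDP under a fixed policy (hypothetical instance for illustration).
n_states = 5
gamma = 0.9
P_pi = rng.dirichlet(np.ones(n_states), size=n_states)  # true row-stochastic transitions
r_pi = rng.uniform(size=n_states)                        # per-state rewards under the policy

v_true = policy_value(P_pi, r_pi, gamma)

# Estimate each row of P_pi from a finite number of observed transitions.
# The empirical model P_hat is unbiased: E[P_hat] = P_pi.
n_samples = 20
n_trials = 10_000
v_hat_mean = np.zeros(n_states)
for _ in range(n_trials):
    counts = np.stack([rng.multinomial(n_samples, P_pi[s]) for s in range(n_states)])
    P_hat = counts / n_samples
    v_hat_mean += policy_value(P_hat, r_pi, gamma)
v_hat_mean /= n_trials

# The plug-in value estimate is nevertheless biased, because v depends on the
# transitions nonlinearly through (I - gamma * P_pi)^{-1}.
print("bias per state:", v_hat_mean - v_true)
```

Running the sketch shows a nonzero average gap between the plug-in estimates and v_true, which shrinks as n_samples grows; this is the phenomenon the paper seeks to correct given only the estimated model.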