Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

Cowan, Wesley, Katehakis, Michael N., Pirutinsky, Daniel

Sep-12-2019–arXiv.org Artificial Intelligence

In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markov decision processes with unknown transition probabilities. We provide a brief survey of the state of the art in the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis (1997) with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).

artificial intelligence, machine learning, reinforcement learning
