Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Jun-6-2019–arXiv.org Artificial Intelligence

Exploration is one of the central challenges in reinforcement learning (RL). A large theoretical literature treats exploration in simple finite state and action MDPs, showing that it is possible to efficiently learn a near optimal policy through interaction alone [5, 8, 10, 11, 13-16, 24, 25]. Overwhelmingly, this literature focuses on optimistic algorithms, with most algorithms explicitly maintaining uncertainty sets that are likely to contain the true MDP. It has been difficult to adapt these exploration algorithms to the more complex problems investigated in the applied RL literature. Most applied papers seem to generate exploration through ǫ-greedy or Boltzmann exploration. Those simple methods are compatible with practical value function learning algorithms, which use parametric approximations to value functions to generalize across high dimensional state spaces. Unfortunately, such exploration algorithms can fail catastrophically in simple finite state MDPs [See e.g.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

Jun-6-2019

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom
  - England
    - Oxfordshire > Oxford (0.04)
    - Greater London > London (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found