On Optimism in Model-Based Reinforcement Learning
Aldo Pacchiano, Philip Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL), often coming with strong theoretical guarantees. However, it remains a challenge to scale these approaches to the deep RL paradigm, which has attracted a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise augmented Markov Decision Processes (MDPs), which we show can obtain a competitive regret bound: $\tilde{\mathcal{O}}( |\mathcal{S}|H\sqrt{|\mathcal{S}||\mathcal{A}| T } )$ when augmenting using Gaussian noise, where $T$ is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors for the success of optimistic model-based RL algorithms, bridging the gap between theory and practice.
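The abstract only names the mechanism, so the following is an illustrative sketch (not the paper's algorithm) of what optimism via noise-augmented MDPs can look like in the tabular, finite-horizon case: perturb the estimated rewards with Gaussian noise, solve each perturbed MDP by backward induction, and act greedily with respect to the most optimistic solution. All names, the number of samples, and the noise scale `sigma` are assumptions made for this example.

```python
import numpy as np

def noisy_value_iteration(P, R, H, sigma, n_samples=5, rng=None):
    """Illustrative optimism via noise augmentation (a sketch, not the
    paper's method): sample several MDPs with Gaussian-perturbed rewards,
    solve each one, and keep the most optimistic policy.

    P: (S, A, S) transition probabilities; R: (S, A) rewards; H: horizon."""
    rng = np.random.default_rng(rng)
    S, A = R.shape
    best_V, best_Q = None, None
    for _ in range(n_samples):
        # Noise-augmented reward model: optimism comes from the max below.
        R_noisy = R + sigma * rng.standard_normal(R.shape)
        V = np.zeros(S)
        for _ in range(H):  # finite-horizon backward induction
            Q = R_noisy + P @ V  # (S, A) one-step lookahead values
            V = Q.max(axis=1)
        if best_V is None or V.sum() > best_V.sum():
            best_V, best_Q = V, Q
    # Greedy policy with respect to the most optimistic sampled MDP.
    return best_Q.argmax(axis=1), best_V
```

With `sigma=0` and `n_samples=1` this reduces to standard finite-horizon value iteration on the point estimate; increasing `sigma` trades off exploration (optimistic value inflation) against fidelity to the learned model.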
Jun-21-2020