On Optimism in Model-Based Reinforcement Learning
Aldo Pacchiano, Philip Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision-making problems such as multi-armed bandits and reinforcement learning (RL), often coming with strong theoretical guarantees. However, it remains a challenge to scale these approaches to the deep RL paradigm, which has attracted a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise-augmented Markov Decision Processes (MDPs), which we show can obtain a competitive regret bound of $\tilde{\mathcal{O}}\left( |\mathcal{S}| H \sqrt{|\mathcal{S}||\mathcal{A}| T} \right)$ when augmenting with Gaussian noise, where $T$ is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors for the success of optimistic model-based RL algorithms, bridging the gap between theory and practice.
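The abstract does not spell out how the MDP is augmented, so the following is only a hypothetical sketch of the general idea of planning on a noise-perturbed learned model in the tabular, finite-horizon setting. It perturbs the empirical mean rewards with Gaussian noise whose scale shrinks as 1/sqrt(n(s, a)) and then runs backward value iteration over horizon H. The function name `plan_on_noisy_model`, the noise scale `sigma`, the 1/sqrt(n) scaling and the uniform next-state fallback for unvisited pairs are all assumptions for illustration, not the paper's algorithm.

```python
import numpy as np


def plan_on_noisy_model(counts, reward_sums, trans_counts, H, sigma=1.0, rng=None):
    """Hypothetical sketch: finite-horizon value iteration on an empirical
    tabular MDP whose rewards are perturbed with Gaussian noise.

    counts:       (S, A) visit counts n(s, a)
    reward_sums:  (S, A) sum of observed rewards for each (s, a)
    trans_counts: (S, A, S) observed next-state counts
    Returns Q of shape (H, S, A) and a greedy policy of shape (H, S).
    """
    rng = np.random.default_rng() if rng is None else rng
    S, A = counts.shape
    n = np.maximum(counts, 1)  # avoid division by zero for unvisited pairs
    r_hat = reward_sums / n    # empirical mean reward

    # Empirical transition model; unvisited pairs fall back to a uniform
    # next-state distribution (an assumption made for this sketch).
    p_hat = np.where(counts[..., None] > 0, trans_counts / n[..., None], 1.0 / S)

    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)  # value beyond the final step is zero
    for h in reversed(range(H)):
        # Assumed noise model: fresh Gaussian noise per (h, s, a),
        # shrinking as 1 / sqrt(n(s, a)) so rarely visited pairs get more.
        noise = sigma * rng.standard_normal((S, A)) / np.sqrt(n)
        # (S, A, S) @ (S,) -> (S, A): expected next-step value.
        Q[h] = r_hat + noise + p_hat @ V_next
        V_next = Q[h].max(axis=1)

    policy = Q.argmax(axis=2)
    return Q, policy
```

Calling it with `sigma=0.0` reduces to plain value iteration on the empirical model, which is a convenient sanity check; a larger `sigma` injects more noise on rarely visited state-action pairs.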
Jun-21-2020