On Optimism in Model-Based Reinforcement Learning
Aldo Pacchiano, Philip Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL), often coming with strong theoretical guarantees. However, it remains a challenge to scale these approaches to the deep RL paradigm, which has attracted a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise augmented Markov Decision Processes (MDPs), which we show can obtain a competitive regret bound: $\tilde{\mathcal{O}}( |\mathcal{S}|H\sqrt{|\mathcal{S}||\mathcal{A}| T } )$ when augmenting using Gaussian noise, where $T$ is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors for the success of optimistic model-based RL algorithms, bridging the gap between theory and practice.
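The abstract only names the mechanism, so the following is an illustrative sketch (not the paper's algorithm) of what optimism via noise-augmented MDPs can look like in the tabular, finite-horizon case: perturb the estimated rewards with Gaussian noise, solve each perturbed MDP by backward induction, and act greedily with respect to the most optimistic solution. All names, the number of samples, and the noise scale `sigma` are assumptions made for this example.

```python
import numpy as np

def noisy_value_iteration(P, R, H, sigma, n_samples=5, rng=None):
    """Illustrative optimism via noise augmentation (a sketch, not the
    paper's method): sample several MDPs with Gaussian-perturbed rewards,
    solve each one, and keep the most optimistic policy.

    P: (S, A, S) transition probabilities; R: (S, A) rewards; H: horizon."""
    rng = np.random.default_rng(rng)
    S, A = R.shape
    best_V, best_Q = None, None
    for _ in range(n_samples):
        # Noise-augmented reward model: optimism comes from the max below.
        R_noisy = R + sigma * rng.standard_normal(R.shape)
        V = np.zeros(S)
        for _ in range(H):  # finite-horizon backward induction
            Q = R_noisy + P @ V  # (S, A) one-step lookahead values
            V = Q.max(axis=1)
        if best_V is None or V.sum() > best_V.sum():
            best_V, best_Q = V, Q
    # Greedy policy with respect to the most optimistic sampled MDP.
    return best_Q.argmax(axis=1), best_V
```

With `sigma=0` and `n_samples=1` this reduces to standard finite-horizon value iteration on the point estimate; increasing `sigma` trades off exploration (optimistic value inflation) against fidelity to the learned model.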
Jun-21-2020