The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions
Patel, Nishil; Lee, Sebastian; Mannelli, Stefano Sarao; Goldt, Sebastian; Saxe, Andrew
arXiv.org Artificial Intelligence
Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty - analogous to annealing schemes and curricula during training in RL - and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.
Sep-2-2023
- Genre:
- Research Report (0.82)
- Industry:
- Education (1.00)
- Technology: