The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

Patel, Nishil, Lee, Sebastian, Mannelli, Stefano Sarao, Goldt, Sebastian, Saxe, Andrew

Sep-2-2023–arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty - analogous to annealing schemes and curricula during training in RL - and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.

agent, annual conference, proceedings, (12 more...)

arXiv.org Artificial Intelligence

Sep-2-2023

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Nevada (0.04)
    - New York (0.04)
    - Maryland > Baltimore (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Italy
    - Sicily > Palermo (0.04)
    - Friuli Venezia Giulia > Trieste Province
      - Trieste (0.04)
  - Austria > Styria
    - Graz (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Japan > Honshū
    - Chūbu > Nagano Prefecture > Nagano (0.04)

Genre:
- Research Report (0.82)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Reinforcement Learning (1.00)
  - Neural Networks > Perceptrons (0.42)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found