Time-Efficient Reinforcement Learning with Stochastic Stateful Policies

Al-Hafez, Firas; Zhao, Guoping; Peters, Jan; Tateo, Davide

Nov-7-2023–arXiv.org Artificial Intelligence

Stateful policies play an important role in reinforcement learning, for example in handling partially observable environments, enhancing robustness, or imposing an inductive bias directly on the policy structure. The conventional method for training stateful policies is Backpropagation Through Time (BPTT), which has significant drawbacks: training is slow because gradients must be propagated sequentially, and gradients can vanish or explode. The gradient is often truncated to address these issues, which results in a biased policy update. We present a novel approach for training stateful policies by decomposing them into a stochastic internal state kernel and a stateless policy, jointly optimized by following the stateful policy gradient. We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning and imitation learning algorithms. Furthermore, we provide a theoretical analysis of our new gradient estimator and compare it with BPTT. We evaluate our approach on complex continuous control tasks, e.g., humanoid locomotion, and demonstrate that our gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT.
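The key idea above, making the next internal state a sampled quantity so that the policy gradient can be estimated without backpropagating through time, can be illustrated with a minimal sketch. It assumes a hypothetical linear-Gaussian parameterization in which the stateful policy factorizes into a stateless action policy and an internal state kernel, pi(a, z' | s, z) = pi_a(a | s, z) * pi_z(z' | s, z), and it uses a plain single-trajectory REINFORCE estimator with discounted rewards-to-go on a toy point-mass environment. All names (`StochasticStatefulPolicy`, `stateful_reinforce_gradient`, `toy_env_step`) and the noise scales are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of an isotropic Gaussian N(mean, std^2 I), summed over dimensions."""
    return float(np.sum(-0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)))


class StochasticStatefulPolicy:
    """Hypothetical linear-Gaussian stateful policy.

    Factorizes as pi(a, z' | s, z) = pi_a(a | s, z) * pi_z(z' | s, z), where pi_z is
    the stochastic internal state kernel and pi_a is a stateless policy on (s, z).
    """

    def __init__(self, obs_dim, act_dim, state_dim, std_a=0.5, std_z=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        in_dim = obs_dim + state_dim
        self.W_a = 0.1 * self.rng.standard_normal((act_dim, in_dim))
        self.W_z = 0.1 * self.rng.standard_normal((state_dim, in_dim))
        self.std_a, self.std_z = std_a, std_z
        self.state_dim = state_dim

    def sample(self, s, z):
        """Sample an action and the next internal state from the current (s, z)."""
        x = np.concatenate([s, z])
        a = self.W_a @ x + self.std_a * self.rng.standard_normal(self.W_a.shape[0])
        z_next = self.W_z @ x + self.std_z * self.rng.standard_normal(self.state_dim)
        return a, z_next

    def log_prob(self, s, z, a, z_next, W_a=None, W_z=None):
        """log pi(a, z' | s, z); optional weights allow finite-difference checks."""
        W_a = self.W_a if W_a is None else W_a
        W_z = self.W_z if W_z is None else W_z
        x = np.concatenate([s, z])
        return gaussian_logpdf(a, W_a @ x, self.std_a) + gaussian_logpdf(z_next, W_z @ x, self.std_z)

    def score(self, s, z, a, z_next):
        """Gradient of log pi(a, z' | s, z) w.r.t. (W_a, W_z).

        z is treated as a fixed input, so nothing is propagated back through earlier steps.
        """
        x = np.concatenate([s, z])
        g_a = np.outer((a - self.W_a @ x) / self.std_a ** 2, x)
        g_z = np.outer((z_next - self.W_z @ x) / self.std_z ** 2, x)
        return g_a, g_z


def toy_env_step(s, a):
    """Hypothetical point-mass environment: reward penalizes squared distance from the origin."""
    s_next = s + 0.1 * a
    return s_next, -float(s_next @ s_next)


def stateful_reinforce_gradient(policy, env_step, s0, horizon, gamma=0.99):
    """Single-trajectory REINFORCE estimate treating (a_t, z_{t+1}) as the sampled 'action'."""
    s, z = s0, np.zeros(policy.state_dim)  # assumption: internal state starts at zero
    scores, rewards = [], []
    for _ in range(horizon):
        a, z_next = policy.sample(s, z)
        scores.append(policy.score(s, z, a, z_next))  # local score term, no BPTT
        s, r = env_step(s, a)
        rewards.append(r)
        z = z_next  # carry the sampled internal state forward
    # Discounted rewards-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    grad_a = np.zeros_like(policy.W_a)
    grad_z = np.zeros_like(policy.W_z)
    for t, ((g_a, g_z), G_t) in enumerate(zip(scores, returns)):
        grad_a += gamma ** t * G_t * g_a
        grad_z += gamma ** t * G_t * g_z
    return grad_a, grad_z


policy = StochasticStatefulPolicy(obs_dim=2, act_dim=2, state_dim=3)
grad_a, grad_z = stateful_reinforce_gradient(policy, toy_env_step, np.ones(2), horizon=20)
print(grad_a.shape, grad_z.shape)  # (2, 5) (3, 5)
```

Because each score term depends only on the current (s_t, z_t) and the sampled (a_t, z_{t+1}), the estimate accumulates step by step with no backward pass through the trajectory, unlike BPTT. The trade-off is the extra variance introduced by sampling z_{t+1}; a practical version would add a baseline or critic, which this sketch omits.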

algorithm, gradient, variance

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
  - Queensland > Brisbane (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Colorado > Denver County
      - Denver (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia (0.04)
- Europe
  - United Kingdom > England
    - Lincolnshire > Lincoln (0.04)
    - Greater London > London (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Spain
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
    - Andalusia > Granada Province
      - Granada (0.04)
  - Germany > Hesse
    - Darmstadt Region > Darmstadt (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia
  - Middle East > Qatar
    - Ad-Dawhah > Doha (0.04)
  - China > Beijing
    - Beijing (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.46)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.47)
