Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate
Fan-Ming Luo 1,2 Zuolin Tu
Neural Information Processing Systems
Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks.
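The architecture described above, together with the per-module learning rate named in the title, can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Elman-style recurrence, the one-hidden-layer policy, the plain SGD update, all dimensions, and the names `ENCODER_LR`, `POLICY_LR`, `encode_context`, `policy_action`, and `sgd_step` are hypothetical. The two learning-rate values are placeholders; the excerpt does not state actual values or which of the two should be larger.

```python
import math
import random

# Hypothetical learning rates; the excerpt gives no values for either module.
ENCODER_LR = 1e-5
POLICY_LR = 3e-4


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def encode_context(params, observations):
    """Elman-style RNN: h_t = tanh(W_in x_t + W_rec h_{t-1}), starting from h_0 = 0."""
    hidden = [0.0] * len(params["W_rec"])
    for obs in observations:
        pre = [a + b for a, b in zip(matvec(params["W_in"], obs),
                                     matvec(params["W_rec"], hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden


def policy_action(params, context):
    """One-hidden-layer MLP mapping the context embedding to a bounded action."""
    hidden = [max(0.0, z) for z in matvec(params["W1"], context)]  # ReLU
    return [math.tanh(z) for z in matvec(params["W2"], hidden)]


def sgd_step(param_groups, grads):
    """Plain SGD in which each parameter group carries its own learning rate."""
    for name, group in param_groups.items():
        lr = group["lr"]
        for key, matrix in group["params"].items():
            grad = grads[name][key]
            for i, row in enumerate(matrix):
                for j in range(len(row)):
                    row[j] -= lr * grad[i][j]


rng = random.Random(0)
obs_dim, ctx_dim, hid_dim, act_dim = 3, 4, 8, 2

# The encoder and the policy live in separate groups so their step sizes can differ.
param_groups = {
    "encoder": {"lr": ENCODER_LR,
                "params": {"W_in": init_matrix(ctx_dim, obs_dim, rng),
                           "W_rec": init_matrix(ctx_dim, ctx_dim, rng)}},
    "policy": {"lr": POLICY_LR,
               "params": {"W1": init_matrix(hid_dim, ctx_dim, rng),
                          "W2": init_matrix(act_dim, hid_dim, rng)}},
}

# Forward pass: summarize a short observation history, then act on the summary.
history = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(5)]
context = encode_context(param_groups["encoder"]["params"], history)
action = policy_action(param_groups["policy"]["params"], context)

# Placeholder gradients of all ones, used only to make the two step sizes visible.
unit_grads = {name: {key: [[1.0] * len(row) for row in matrix]
                     for key, matrix in group["params"].items()}
              for name, group in param_groups.items()}
sgd_step(param_groups, unit_grads)
```

With identical gradients, each encoder weight moves by `ENCODER_LR` and each policy weight by `POLICY_LR`, which is the whole point of keeping the two modules in separate parameter groups. In a real agent the gradients would come from the off-policy RL loss rather than from a constant.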
Oct-10-2025, 03:06:44 GMT
- Country:
  - Africa
    - Ethiopia > Addis Ababa
      - Addis Ababa (0.04)
    - Rwanda > Kigali
      - Kigali (0.04)
  - Asia
    - China > Jiangsu Province
      - Nanjing (0.04)
    - Middle East > Jordan (0.04)
  - Europe
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - Puerto Rico > San Juan
      - San Juan (0.04)
    - United States
      - California > Los Angeles County
        - Long Beach (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Maryland > Baltimore (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - Michigan > Wayne County
        - Detroit (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
    - Queensland > Brisbane (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (1.00)
- Industry:
  - Information Technology (0.92)
- Technology: