Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate

Fan-Ming Luo 1,2, Zuolin Tu

Neural Information Processing Systems 

Recent progress has demonstrated that recurrent reinforcement learning (RL), which employs a context encoder based on recurrent neural networks (RNNs) to infer unobservable states and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks.
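To make the described architecture concrete, the following is a minimal NumPy sketch of a recurrent agent: a hand-rolled GRU cell encodes the observation-action history into a context vector, and an MLP head maps the current observation plus that context to an action. All dimensions, weight initialisations, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
OBS_DIM, ACT_DIM, HID_DIM = 4, 2, 8
IN_DIM = OBS_DIM + ACT_DIM  # encoder input: [observation, previous action]

def gru_cell(x, h, W, U, b):
    """One GRU step: gates computed from input x and previous hidden state h."""
    z = 1 / (1 + np.exp(-(W[0] @ x + U[0] @ h + b[0])))  # update gate
    r = 1 / (1 + np.exp(-(W[1] @ x + U[1] @ h + b[1])))  # reset gate
    n = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])        # candidate state
    return (1 - z) * h + z * n

# Randomly initialised parameters, standing in for learned weights.
W = rng.normal(0, 0.1, (3, HID_DIM, IN_DIM))   # input-to-hidden, per gate
U = rng.normal(0, 0.1, (3, HID_DIM, HID_DIM))  # hidden-to-hidden, per gate
b = np.zeros((3, HID_DIM))
# MLP policy head: maps [observation, context] -> action.
W1 = rng.normal(0, 0.1, (16, OBS_DIM + HID_DIM))
W2 = rng.normal(0, 0.1, (ACT_DIM, 16))

def act(history, obs):
    """Encode the (obs, action) history into a context, then select an action."""
    h = np.zeros(HID_DIM)
    for o, a in history:  # unroll the RNN context encoder over the history
        h = gru_cell(np.concatenate([o, a]), h, W, U, b)
    hidden = np.tanh(W1 @ np.concatenate([obs, h]))
    return np.tanh(W2 @ hidden)  # bounded continuous action in (-1, 1)

history = [(rng.normal(size=OBS_DIM), rng.normal(size=ACT_DIM)) for _ in range(5)]
action = act(history, rng.normal(size=OBS_DIM))
print(action.shape)
```

The key structural point this sketch illustrates is the split the paper's title refers to: the GRU parameters (`W`, `U`, `b`) belong to the context encoder, while `W1` and `W2` belong to the MLP policy, so the two groups can in principle be optimised with different learning rates.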