A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

Jan-18-2025, 21:03:28 GMT–Neural Information Processing Systems

Gradient-based Meta-RL (GMRL) refers to methods that maintain a two-level optimisation procedure in which the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptation. In this paper, we develop a unified framework that describes variations of GMRL algorithms and show that the existing stochastic meta-gradient estimators adopted by GMRL are biased. We study tabular MDPs empirically and offer quantitative evidence that supports our theoretical findings on existing stochastic meta-gradient estimators. Furthermore, we conduct experiments on the Iterated Prisoner's Dilemma and Atari games to show how other methods, such as off-policy learning and low-bias estimators, can help fix the gradient bias for GMRL algorithms in general.
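To make the notion of a biased meta-gradient estimator concrete, the following is a minimal sketch of one generic way such bias can arise in a two-level procedure; it is a toy setup and is not taken from the paper. It assumes a scalar parameter `theta`, a hypothetical inner objective 0.5 * theta^2 whose gradient is observed with zero-mean noise of +sigma or -sigma, a single inner step with step size `alpha`, and a hypothetical outer objective theta'^4. The inner gradient is unbiased, but because the outer gradient is nonlinear in the adapted parameter, the expected meta-gradient estimate differs from the exact meta-gradient.

```python
# Toy illustration under assumed objectives (hypothetical, not the paper's setting).

def inner_update(theta, alpha, noise):
    """One inner-loop gradient step on L_in(theta) = 0.5 * theta**2 with a noisy gradient."""
    stochastic_grad = theta + noise  # unbiased whenever the noise has zero mean
    return theta - alpha * stochastic_grad


def meta_gradient(theta, alpha, noise):
    """Derivative w.r.t. theta of the outer objective J(theta') = theta'**4 (chain rule)."""
    theta_adapted = inner_update(theta, alpha, noise)
    d_adapted_d_theta = 1.0 - alpha  # derivative of the inner update w.r.t. theta
    return 4.0 * theta_adapted ** 3 * d_adapted_d_theta


def estimator_bias(theta, alpha, sigma):
    """Exact bias when the noise is +sigma or -sigma with equal probability."""
    expected_estimate = 0.5 * (
        meta_gradient(theta, alpha, sigma) + meta_gradient(theta, alpha, -sigma)
    )
    exact = meta_gradient(theta, alpha, 0.0)  # meta-gradient with the true inner gradient
    return expected_estimate - exact


if __name__ == "__main__":
    print(estimator_bias(theta=1.0, alpha=0.1, sigma=1.0))
```

In this toy case the bias works out in closed form to 12 * (1 - alpha)^2 * theta * alpha^2 * sigma^2, so it vanishes only when the inner gradient is noise-free or the inner step size is zero.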

gradient bias, meta-reinforcement learning, stochastic meta-gradient estimator, (3 more...)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.61)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)