Reconciling λ-Returns with Experience Replay

Dec-25-2025, 19:31:54 GMT–Neural Information Processing Systems

Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values.

electronic proceedings, experience replay, name change, (2 more...)

Neural Information Processing Systems

Dec-25-2025, 19:31:54 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)