Reconciling λ-Returns with Experience Replay
Neural Information Processing Systems
Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values.
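The efficiency argument above can be illustrated with a minimal sketch. It assumes the standard backward recursion for the λ-return, R_t = r_t + γ[(1 − λ) max_a Q(s_{t+1}, a) + λ R_{t+1}], applied to one short contiguous block of transitions; this may differ in detail from the paper's exact formulation. The function name `lambda_returns` and its parameters are hypothetical, and `next_values[t]` is assumed to already hold max_a Q(s_{t+1}, a) for each transition in the block.

```python
def lambda_returns(rewards, next_values, dones, gamma=0.99, lam=0.8):
    """Compute lambda-returns for one contiguous block of transitions.

    Hypothetical helper for illustration only (not the paper's code).
    rewards[t]     : reward of transition t
    next_values[t] : assumed to be max_a Q(s_{t+1}, a), computed once per block
    dones[t]       : True if transition t ended its episode
    """
    n = len(rewards)
    returns = [0.0] * n
    next_return = None  # lambda-return of transition t+1, once known
    # Sweep backwards so each return reuses its successor's return.
    for t in reversed(range(n)):
        if dones[t]:
            # Terminal transition: nothing to bootstrap from.
            returns[t] = rewards[t]
        elif next_return is None:
            # Last transition in the block: fall back to a 1-step target.
            returns[t] = rewards[t] + gamma * next_values[t]
        else:
            # Blend the 1-step bootstrap with the successor's lambda-return.
            returns[t] = rewards[t] + gamma * (
                (1.0 - lam) * next_values[t] + lam * next_return
            )
        next_return = returns[t]
    return returns
```

Because the sweep runs once over the block, each Q-value is evaluated once and shared by every return that depends on it, which is the kind of sharing the abstract describes. Setting `lam=0` recovers ordinary 1-step targets, and `lam=1` gives a bootstrapped n-step return truncated at the end of the block.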
Dec-25-2025, 19:31:54 GMT