Reliability-Adjusted Prioritized Experience Replay
Pleiss, Leonard S., Sutter, Tobias, Schiffer, Maximilian
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
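The baseline the paper extends is standard proportional Prioritized Experience Replay, in which transitions are sampled with probability proportional to their absolute TD error raised to a power α. Below is a minimal sketch of that baseline; the abstract does not define ReaPER's reliability measure, so only the standard PER mechanism is shown, and the class and parameter names are illustrative.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional Prioritized Experience Replay sketch.

    Samples transition i with probability P(i) = p_i / sum_k p_k,
    where p_i = (|TD error_i| + eps)^alpha. ReaPER would additionally
    adjust priorities by a TD-error reliability measure, which is
    defined in the paper and not reproduced here.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps  # keeps zero-error transitions sampleable
        self.transitions = []
        self.priorities = []

    def add(self, transition, td_error):
        # Priority grows with the magnitude of the TD error.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.transitions)),
                              weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias introduced
        # by non-uniform sampling; normalized by the maximum weight.
        n = len(self.transitions)
        weights = [(n * probs[i]) ** -1.0 for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.transitions[i] for i in idxs], idxs, weights
```

In practice PER implementations use a sum-tree for O(log n) sampling; the list-based version above trades efficiency for clarity.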
Jul-4-2025