Reliability-Adjusted Prioritized Experience Replay
Pleiss, Leonard S., Sutter, Tobias, Schiffer, Maximilian
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
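The baseline the paper extends is standard proportional Prioritized Experience Replay, in which transitions are sampled with probability proportional to their absolute TD error raised to a power α. Below is a minimal sketch of that baseline; the abstract does not define ReaPER's reliability measure, so only the standard PER mechanism is shown, and the class and parameter names are illustrative.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional Prioritized Experience Replay sketch.

    Samples transition i with probability P(i) = p_i / sum_k p_k,
    where p_i = (|TD error_i| + eps)^alpha. ReaPER would additionally
    adjust priorities by a TD-error reliability measure, which is
    defined in the paper and not reproduced here.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps  # keeps zero-error transitions sampleable
        self.transitions = []
        self.priorities = []

    def add(self, transition, td_error):
        # Priority grows with the magnitude of the TD error.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.transitions)),
                              weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias introduced
        # by non-uniform sampling; normalized by the maximum weight.
        n = len(self.transitions)
        weights = [(n * probs[i]) ** -1.0 for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.transitions[i] for i in idxs], idxs, weights
```

In practice PER implementations use a sum-tree for O(log n) sampling; the list-based version above trades efficiency for clarity.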
Jul-4-2025