Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
–Neural Information Processing Systems
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. As a result, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases.
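The collapse described above mirrors the classical action-gap phenomenon: when an action only influences the trajectory over a single time slice of length dt, the difference between action values shrinks as dt shrinks. A toy numerical sketch (assumptions: a one-step decision with hypothetical reward rates r1, r2 and a shared continuation value V; not the paper's construction) makes this concrete:

```python
import numpy as np

# Toy illustration (hypothetical setup, not the paper's model):
# each action affects only the reward accrued over the next time
# slice of length dt, after which both actions reach the same
# continuation value V. The action gap Q(s,a1) - Q(s,a2) then
# scales linearly with dt, vanishing at high decision frequency.

gamma = 0.99        # per-unit-time discount factor
r1, r2 = 1.0, 0.5   # reward rates for two candidate actions
V = 10.0            # common continuation value after the slice

for dt in [1.0, 0.1, 0.01]:
    q1 = r1 * dt + gamma**dt * V
    q2 = r2 * dt + gamma**dt * V
    # gap = (r1 - r2) * dt, so it shrinks proportionally to dt
    print(f"dt={dt:5.2f}  action gap={q1 - q2:.4f}")
```

As dt decreases tenfold, the action gap decreases tenfold as well, so value estimates for different actions become increasingly indistinguishable, the regime in which the abstract's collapse result applies.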
Dec-25-2025, 22:48:48 GMT