Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Neural Information Processing Systems 

When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. As a result, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases.
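
A minimal Monte Carlo sketch of this collapse, under assumptions that are not taken from the paper: a toy two-action problem with decision period h, per-step rewards that scale with h, exponential discounting at rate beta, and a uniform policy. Conditioning on the first action shifts the return distribution by O(h), so the 1-Wasserstein gap between the two action-conditioned return distributions shrinks as h decreases. All names here (`returns`, `w1`, `beta`, `horizon`) are illustrative, not the paper's notation.

```python
# Illustrative toy experiment (assumed setup, not the paper's): as the decision
# period h shrinks, return distributions conditioned on the first action
# collapse onto the policy's (marginal) return distribution.
import numpy as np

rng = np.random.default_rng(0)

def returns(h, first_action, n_samples=20_000, horizon=5.0, beta=1.0):
    """Sample discounted returns with the first action fixed and all later
    actions drawn from a uniform policy over {0, 1}. Action a earns a noisy
    reward (a + noise) * h per step; the per-step discount is exp(-beta * h)."""
    n_steps = int(round(horizon / h))
    discounts = np.exp(-beta * h) ** np.arange(n_steps)
    actions = rng.integers(0, 2, size=(n_samples, n_steps)).astype(float)
    actions[:, 0] = first_action                # condition on the first action
    noise = rng.normal(0.0, 0.5, size=(n_samples, n_steps))
    rewards = (actions + noise) * h             # reward mass scales with h
    return (rewards * discounts).sum(axis=1)    # discounted return per sample

def w1(x, y):
    """1-Wasserstein distance between two equal-size empirical samples."""
    return np.abs(np.sort(x) - np.sort(y)).mean()

for h in (0.5, 0.1, 0.02):
    gap = w1(returns(h, first_action=0), returns(h, first_action=1))
    print(f"h = {h:<4}  W1 gap between action-conditioned returns = {gap:.4f}")
```

In this toy setting the conditioned distributions differ only by the deterministic first-step reward, a shift of size h, so the printed gap tracks h and vanishes in the high-frequency limit, mirroring the collapse the abstract describes.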