Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL

Dec-26-2025, 16:06:34 GMT–Neural Information Processing Systems

Divergence of Q-value estimates has been a prominent issue in offline reinforcement learning (offline RL), where the agent cannot interact with the real environment dynamics. Conventional wisdom attributes this instability to querying out-of-distribution actions when bootstrapping value targets. Although the issue can be alleviated with policy constraints or conservative Q estimation, the mechanism that actually drives the divergence has lacked a theoretical explanation. In this work, we aim to understand this mechanism thoroughly and use that understanding to obtain an improved solution. We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
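
The idea behind self-excitation can be illustrated with a toy model: when the Q-function shares parameters across state-action pairs, updating Q at a dataset pair also moves the bootstrapped target Q(s', a'), which can feed back into the next update. The sketch below is a minimal illustration under stated assumptions, in the spirit of the classic linear TD "w versus 2w" divergence example; it is not the authors' analysis or method. The one-parameter model, the feature values PHI_SA = 1 and PHI_NEXT = 2, the zero reward and the helper names td_self_excitation and growth_factor are all hypothetical choices made for illustration.

```python
from typing import List

# Hypothetical one-parameter linear Q-function (illustration only):
#   Q(s, a)   = PHI_SA * w    -> the in-dataset pair that receives updates
#   Q(s', a') = PHI_NEXT * w  -> the bootstrapped pair, never updated directly
PHI_SA = 1.0
PHI_NEXT = 2.0
REWARD = 0.0  # zero reward, so the true Q-values are all 0


def growth_factor(gamma: float, alpha: float) -> float:
    """Per-update multiplier on w implied by the TD(0) update below."""
    return 1.0 + alpha * PHI_SA * (gamma * PHI_NEXT - PHI_SA)


def td_self_excitation(gamma: float, alpha: float = 0.1,
                       w0: float = 1.0, steps: int = 100) -> List[float]:
    """Run semi-gradient TD(0) on the toy model and return the history of w."""
    w = w0
    history = [w]
    for _ in range(steps):
        # The target depends on the same parameter w that is being updated.
        target = REWARD + gamma * PHI_NEXT * w
        td_error = target - PHI_SA * w
        # Semi-gradient step: the target is treated as a constant.
        w = w + alpha * td_error * PHI_SA
        history.append(w)
    return history


if __name__ == "__main__":
    for gamma in (0.4, 0.99):
        print(f"gamma={gamma}: factor={growth_factor(gamma, 0.1):.3f}, "
              f"final w={td_self_excitation(gamma)[-1]:.4g}")
```

With these numbers each update multiplies w by 1 + α(2γ − 1): 1.098 for γ = 0.99, so w grows without bound even though every true Q-value is 0, versus 0.98 for γ = 0.4, where w decays toward 0. Raising the value at the dataset pair raises its own bootstrapped target, which is the feedback loop the abstract calls self-excitation.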
