On Gap-dependent Bounds for Offline Reinforcement Learning
–Neural Information Processing Systems
A policy is -optimal if Suboptimal ( ),V 0 V 0 . Assumptionµco choose policy Aclosely by the assumption under Assumption (Optimal.
Feb-9-2026, 08:34:25 GMT