Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
–Neural Information Processing Systems
Function Approximation. Throughout the paper, we assume access to two function classes Q ⊂ (S × A → R) and W ⊂ (S × A → R). To develop intuition, they are supposed to model Q^π and w_{π/µ}, respectively, though most of our main results are stated without assuming any kind of realizability.
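As a rough illustration of what such function classes and the realizability notion could look like, the sketch below uses a small tabular setting. Everything in it is an assumption made for illustration and is not taken from the paper: the sizes `n_states` and `n_actions`, the randomly generated finite classes `Q_class` and `W_class`, and the helper `is_realizable`.

```python
import numpy as np

# Hypothetical tabular problem sizes (illustrative only).
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# In the tabular case, a function S x A -> R is an array of shape
# (n_states, n_actions). A finite function class is a list of such arrays.
Q_class = [rng.normal(size=(n_states, n_actions)) for _ in range(4)]
W_class = [rng.uniform(0.0, 2.0, size=(n_states, n_actions)) for _ in range(4)]


def is_realizable(target, function_class, tol=1e-8):
    """Return True if some member of the class matches target on every (s, a)."""
    return any(np.allclose(f, target, atol=tol) for f in function_class)


# A stand-in for the true Q^pi; nothing forces it to lie in Q_class.
q_pi = rng.normal(size=(n_states, n_actions))
print(is_realizable(q_pi, Q_class))

# A class trivially realizes any of its own members.
print(is_realizable(Q_class[0], Q_class))
```

Realizability here means only that the target function is a member of the class. The source paragraph stresses that most of its main results do not require this to hold for either class.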
Feb-7-2026, 17:04:19 GMT