Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Neural Information Processing Systems 

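Function Approximation. Throughout the paper, we assume access to two function classes $\mathcal{Q} \subset (\mathcal{S} \times \mathcal{A} \to \mathbb{R})$ and $\mathcal{W} \subset (\mathcal{S} \times \mathcal{A} \to \mathbb{R})$. To develop intuition, these classes are intended to model $Q^\pi$ and $w_{\pi/\mu}$, respectively, though most of our main results are stated without assuming any kind of realizability (i.e., without requiring $Q^\pi \in \mathcal{Q}$ or $w_{\pi/\mu} \in \mathcal{W}$).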
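To make the two modeling targets concrete, the following is a minimal sketch on a small tabular MDP. It computes $Q^\pi$ by solving the Bellman equation, and $w_{\pi/\mu}$ as the ratio $d_\pi / \mu$. This assumes $d_\pi$ is the normalized discounted state-action occupancy of $\pi$ and $\mu$ is the data distribution; that definition is not stated in this excerpt. It then checks whether $Q^\pi$ is realizable by two example linear classes. The MDP numbers, the feature matrices, and the helper names (`sa_transition`, `q_pi`, `occupancy`, `linear_realizable`) are all hypothetical illustrations, not the paper's method.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
gamma = 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s'] = P(s' | s, a)
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])       # R[s, a]: expected reward
pi = np.array([[0.7, 0.3], [0.2, 0.8]])      # target policy: pi[s, a] = pi(a | s)
d0 = np.array([0.5, 0.5])                    # initial state distribution
mu = np.full((2, 2), 0.25)                   # data distribution over (s, a), assumed > 0


def sa_transition(P, pi):
    """Return P_pi with P_pi[(s,a), (s',a')] = P(s'|s,a) * pi(a'|s')."""
    n_s, n_a, _ = P.shape
    return (P[:, :, :, None] * pi[None, None, :, :]).reshape(n_s * n_a, n_s * n_a)


def q_pi(P, R, pi, gamma):
    """Solve Q = R + gamma * P_pi Q for Q^pi, flattened over (s, a)."""
    P_pi = sa_transition(P, pi)
    return np.linalg.solve(np.eye(len(P_pi)) - gamma * P_pi, R.ravel())


def occupancy(P, pi, d0, gamma):
    """Assumed definition: d_pi(s,a) = (1 - gamma) * sum_t gamma^t Pr(s_t = s, a_t = a)."""
    P_pi = sa_transition(P, pi)
    rho0 = (d0[:, None] * pi).ravel()        # Pr(s_0 = s, a_0 = a)
    # Row-vector recursion d^T (I - gamma P_pi) = (1 - gamma) rho0^T, solved via the transpose.
    return np.linalg.solve((np.eye(len(P_pi)) - gamma * P_pi).T, (1 - gamma) * rho0)


def linear_realizable(phi, target, tol=1e-8):
    """True if target lies in the column span of phi, i.e. in the linear class {phi @ theta}."""
    theta = np.linalg.lstsq(phi, target, rcond=None)[0]
    return np.linalg.norm(phi @ theta - target) <= tol * max(1.0, np.linalg.norm(target))


Q = q_pi(P, R, pi, gamma)
d_pi = occupancy(P, pi, d0, gamma)
w = d_pi / mu.ravel()                        # w_{pi/mu}(s, a) = d_pi(s, a) / mu(s, a)

# Sanity check: E_mu[w * r] equals the normalized return (1 - gamma) * E_{s0 ~ d0}[V^pi(s0)].
rho0 = (d0[:, None] * pi).ravel()
print(np.isclose(mu.ravel() @ (w * R.ravel()), (1 - gamma) * rho0 @ Q))  # True

# Realizability depends on the class: a tabular class contains Q^pi, a constant class does not.
print(linear_realizable(np.eye(4), Q), linear_realizable(np.ones((4, 1)), Q))  # True False
```

The constant class fails here because a constant $Q$ would force $R = (1-\gamma)c$ to be constant, which this toy reward is not. This illustrates a case where the realizability assumption is violated for $\mathcal{Q}$.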
