488e4104520c6aab692863cc1dba45af-AuthorFeedback.pdf
Neural Information Processing Systems
The constraints in the theorem characterize the dual and primal quantities ($d^\pi$ and $Q^\pi$), which can be used to estimate the policy value, either alone or combined (lines 171-173, with a change of variable $\zeta = d^\pi / d^D$). The theorem is thus a natural starting point for OPE, which we will make explicit in the final version.
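To illustrate how the two quantities yield the same policy value, here is a minimal tabular sketch. It assumes a discounted MDP with a normalized discounted occupancy $d^\pi$; the 2-state, 2-action MDP, the policy, the data distribution, and every function name are hypothetical choices made for the example and do not come from the paper. It compares the primal estimate $(1-\gamma)\,\mathbb{E}_{\mu_0,\pi}[Q^\pi]$, the dual estimate $\mathbb{E}_{d^D}[\zeta\, r]$, and a combined (Lagrangian-style) estimate that adds the $\zeta$-weighted Bellman residual to the primal term.

```python
# Illustrative sketch only: a hypothetical 2-state, 2-action MDP.
# All numbers below are assumptions made for this example.
GAMMA = 0.9
S, A = 2, 2
P = [[[0.8, 0.2], [0.1, 0.9]],            # P[s][a][s2]: transition probabilities
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]              # r(s, a)
PI = [[0.6, 0.4], [0.2, 0.8]]             # target policy pi(a | s)
MU0 = [1.0, 0.0]                          # initial state distribution
D_DATA = [[0.25, 0.25], [0.25, 0.25]]     # data distribution d^D(s, a)


def state_values(q):
    """V(s) = sum_a pi(a|s) Q(s, a)."""
    return [sum(PI[s][a] * q[s][a] for a in range(A)) for s in range(S)]


def solve_q(iters=2000):
    """Primal constraint: Q = r + gamma * P^pi Q, solved by fixed-point iteration."""
    q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        v = state_values(q)
        q = [[R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(S))
              for a in range(A)] for s in range(S)]
    return q


def solve_d(iters=2000):
    """Dual constraint: d = (1 - gamma) * mu0 x pi + gamma * (flow of d under P, pi)."""
    d = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        inflow = [sum(d[s][a] * P[s][a][s2] for s in range(S) for a in range(A))
                  for s2 in range(S)]
        d = [[PI[s][a] * ((1 - GAMMA) * MU0[s] + GAMMA * inflow[s])
              for a in range(A)] for s in range(S)]
    return d


def primal_value(q):
    """(1 - gamma) * E_{s0 ~ mu0, a0 ~ pi}[Q(s0, a0)]."""
    return (1 - GAMMA) * sum(MU0[s] * PI[s][a] * q[s][a]
                             for s in range(S) for a in range(A))


def dual_value(zeta):
    """E_{(s, a) ~ d^D}[zeta(s, a) * r(s, a)]."""
    return sum(D_DATA[s][a] * zeta[s][a] * R[s][a]
               for s in range(S) for a in range(A))


def lagrangian_value(q, zeta):
    """Primal term plus the zeta-weighted Bellman residual of q under d^D."""
    v = state_values(q)
    residual = sum(
        D_DATA[s][a] * zeta[s][a]
        * (R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(S)) - q[s][a])
        for s in range(S) for a in range(A))
    return primal_value(q) + residual


q = solve_q()
d = solve_d()
# Change of variable: zeta = d^pi / d^D (requires d^D > 0 everywhere).
zeta = [[d[s][a] / D_DATA[s][a] for a in range(A)] for s in range(S)]

print(primal_value(q), dual_value(zeta), lagrangian_value(q, zeta))
```

In this sketch the three printed numbers agree up to numerical tolerance. The combined form also returns the dual value when the exact `zeta` is paired with an arbitrary, incorrect `q`, because the residual term then cancels the error in the primal term.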
Feb-8-2026, 07:36:29 GMT