488e4104520c6aab692863cc1dba45af-AuthorFeedback.pdf

Feb-8-2026, 07:36:29 GMT–Neural Information Processing Systems

The constraints in the theorem characterize the dual and primal quantities (d^π and Q^π), which can be used to estimate the policy value, either alone or in combination (lines 171-173, with a change of variable ζ = d^π/d^D). The theorem is thus a natural starting point for off-policy evaluation (OPE), which we will make explicit in the final version.
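The claim that d^π and Q^π each give the policy value, alone or combined, can be checked numerically. Below is a minimal tabular sketch. It assumes a known toy MDP, the normalized discounted occupancy d^π(s,a) = (1-γ) Σ_t γ^t Pr(s_t = s, a_t = a), and a data distribution d^D with full support. All numbers and helper names (q_pi, d_pi, rho_primal, rho_dual, rho_combined) are hypothetical. The combined estimator is a standard doubly robust form chosen for illustration, not necessarily the estimator at lines 171-173.

```python
# Minimal tabular sketch: estimate the policy value rho(pi) from the primal
# quantity Q^pi, the dual quantity d^pi (via zeta = d^pi / d^D), or both.
# The MDP, policy, and data distribution below are hypothetical toy values.

GAMMA = 0.9
STATES = range(2)
ACTIONS = range(2)

P = [[[0.8, 0.2], [0.1, 0.9]],   # P[s][a][s2] = Pr(s2 | s, a)
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.5, 2.0]]     # R[s][a] = expected reward
MU0 = [0.6, 0.4]                 # initial-state distribution mu_0(s)
PI = [[0.7, 0.3], [0.2, 0.8]]    # target policy pi(a | s)
D_D = [[0.3, 0.2], [0.1, 0.4]]   # data distribution d^D(s, a), full support


def next_value(q, s, a):
    """E_{s' ~ P(.|s,a), a' ~ pi(.|s')}[q(s', a')]."""
    return sum(P[s][a][s2] * sum(PI[s2][a2] * q[s2][a2] for a2 in ACTIONS)
               for s2 in STATES)


def q_pi(iters=2000):
    """Primal quantity: Q^pi by repeated Bellman backups (a gamma-contraction)."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * next_value(q, s, a) for a in ACTIONS] for s in STATES]
    return q


def d_pi(iters=2000):
    """Dual quantity: normalized discounted occupancy d^pi(s, a)."""
    d = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(iters):
        # Flow equation: d(s2, a2) = pi(a2|s2) [(1-gamma) mu0(s2) + gamma * inflow(s2)]
        inflow = [sum(d[s][a] * P[s][a][s2] for s in STATES for a in ACTIONS)
                  for s2 in STATES]
        d = [[((1 - GAMMA) * MU0[s] + GAMMA * inflow[s]) * PI[s][a] for a in ACTIONS]
             for s in STATES]
    return d


def rho_primal(q):
    """rho = (1 - gamma) E_{s0 ~ mu0, a0 ~ pi}[Q(s0, a0)]."""
    return (1 - GAMMA) * sum(MU0[s] * PI[s][a] * q[s][a] for s in STATES for a in ACTIONS)


def rho_dual(zeta):
    """rho = E_{(s,a) ~ d^D}[zeta(s, a) r(s, a)]."""
    return sum(D_D[s][a] * zeta[s][a] * R[s][a] for s in STATES for a in ACTIONS)


def rho_combined(q, zeta):
    """Doubly robust form: exact if either q = Q^pi or zeta = d^pi / d^D."""
    residual = sum(D_D[s][a] * zeta[s][a] * (R[s][a] + GAMMA * next_value(q, s, a) - q[s][a])
                   for s in STATES for a in ACTIONS)
    return rho_primal(q) + residual


Q = q_pi()
D = d_pi()
ZETA = [[D[s][a] / D_D[s][a] for a in ACTIONS] for s in STATES]  # change of variable

if __name__ == "__main__":
    wrong_q = [[0.0, 0.0], [0.0, 0.0]]
    wrong_zeta = [[1.0, 1.0], [1.0, 1.0]]
    print(rho_primal(Q), rho_dual(ZETA))
    print(rho_combined(wrong_q, ZETA), rho_combined(Q, wrong_zeta))
```

Passing a deliberately wrong Q (all zeros) with the true ζ, or the true Q with a wrong ζ (all ones), still recovers the same value. This is the sense in which combining the two quantities can hedge against error in either one.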
