488e4104520c6aab692863cc1dba45af-AuthorFeedback.pdf

Neural Information Processing Systems 

The constraints in the theorem characterize the dual and primal quantities (dπ and Qπ), which can be used to estimate policy value, either alone or combined (lines 171-173, with a change of variable ζ = dπ/dD). It is thus a natural starting point for OPE, which we will make explicit in the final version.
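
For concreteness, the three ways of estimating the policy value can be sketched as below. This is only an illustration under standard assumptions that the feedback does not state: the symbols ρ(π) (policy value), γ (discount factor), μ0 (initial-state distribution), R (reward) and dD (data distribution) are our notation, dπ is taken to be the normalized discounted occupancy of π, and the exact expressions at lines 171-173 of the paper may differ.

```latex
% Sketch only; notation (rho, gamma, mu_0, R, d^D) is assumed, not taken from the paper.
% Using Q^pi alone:
\rho(\pi) = (1-\gamma)\,\mathbb{E}_{s_0\sim\mu_0,\;a_0\sim\pi(\cdot\mid s_0)}\!\left[Q^{\pi}(s_0,a_0)\right]

% Using d^pi alone, via the change of variable \zeta = d^{\pi}/d^{D}:
\rho(\pi) = \mathbb{E}_{(s,a)\sim d^{D}}\!\left[\zeta(s,a)\,R(s,a)\right]

% Combined form: the Q-based value plus a \zeta-weighted Bellman residual,
% with s' the next state and a' \sim \pi(\cdot\mid s'):
\rho(\pi) = (1-\gamma)\,\mathbb{E}_{s_0,a_0}\!\left[Q(s_0,a_0)\right]
  + \mathbb{E}_{(s,a,s')\sim d^{D},\;a'\sim\pi(\cdot\mid s')}\!\left[\zeta(s,a)\left(R(s,a)+\gamma\,Q(s',a')-Q(s,a)\right)\right]
```

Under these assumptions the combined expression equals ρ(π) whenever either Q = Qπ or ζ = dπ/dD holds, which is why the two quantities can be used alone or together.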
