9308b0d6e5898366a4a986bc33f3d3e7-AuthorFeedback.pdf
Neural Information Processing Systems
Our finite-sample guarantees for Jπ actually hold for any distribution satisfying the constraint (see Remark 1) as long as the MDP is linear, and we can also use the simple estimate in Equation (10). More generally, maximizing entropy is equivalent to minimizing KL-divergence to the uniform distribution. We will add a reference to GenDICE. In the paper, we refer to "RL via Fenchel-Rockafellar Duality" by Nachum and Dai (2020), which provides a unified view of the DICE papers, including GenDICE.
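The entropy/KL equivalence mentioned above can be checked numerically. The sketch below is only an illustration and is not taken from the paper: it assumes a discrete distribution p on a finite set of size n, for which KL(p || u) = log n - H(p) when u is uniform. The function names and the example distribution are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_to_uniform(p):
    """KL(p || u), where u is uniform on the same finite support as p."""
    n = len(p)
    # log(x / (1/n)) = log(x * n)
    return sum(x * math.log(x * n) for x in p if x > 0)


# Hypothetical example distribution on four outcomes.
p = [0.5, 0.25, 0.125, 0.125]

# The identity KL(p || u) = log(n) - H(p) implies this gap is zero
# up to floating-point error.
gap = kl_to_uniform(p) + entropy(p) - math.log(len(p))
print(abs(gap) < 1e-12)
```

Since log n is a constant, any p that increases H(p) decreases KL(p || u) by the same amount, which is why the two objectives share the same optimizer in this finite setting.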
Feb-19-2026, 04:31:16 GMT