9308b0d6e5898366a4a986bc33f3d3e7-AuthorFeedback.pdf

Feb-19-2026, 04:31:16 GMT–Neural Information Processing Systems

Our finite-sample guarantees for Jπ actually hold for any distribution satisfying the constraint (see Remark 1) as long as the MDP is linear, and we can also use the simple estimate in Equation (10). More generally, maximizing entropy is equivalent to minimizing KL-divergence to the uniform distribution. We will add a reference to GenDICE. In the paper, we refer to "RL via Fenchel-Rockafellar Duality" by Nachum and Dai (2020), which provides a unified view of the DICE papers, including GenDICE.
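The entropy/KL equivalence can be checked numerically. The following is a minimal sketch, assuming a distribution p over a finite set of n outcomes (the finite support, the function names, and the example values are illustrative assumptions, not taken from the paper). In that setting KL(p || uniform) = log n - H(p), so the two objectives differ only by a sign and a constant.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_to_uniform(p):
    """KL(p || u), where u is uniform over the len(p) outcomes."""
    n = len(p)
    return sum(x * math.log(x * n) for x in p if x > 0)

# Hypothetical example distribution over n = 4 outcomes.
p = [0.5, 0.25, 0.125, 0.125]

# Identity: KL(p || u) = log(n) - H(p), so the gap should be ~0.
gap = kl_to_uniform(p) - (math.log(len(p)) - entropy(p))
```

Because log n does not depend on p, any p that maximizes `entropy` also minimizes `kl_to_uniform`; the uniform distribution itself attains KL equal to zero.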

divergence, entropy, gendice

