Author Contributions
Neural Information Processing Systems
A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective

In this appendix, we will derive Eq. 4. Analogously to Eq. 3, we optimize the following objective:

max
Oct-9-2025, 04:01:13 GMT