Author Contributions
Neural Information Processing Systems
A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective

In this appendix, we will derive Eq. 4. Analogously to Eq. 3, we optimize the following objective: max
Feb-16-2026, 09:56:10 GMT