Review for NeurIPS paper: Munchausen Reinforcement Learning
Neural Information Processing Systems
This submission proposes a new bootstrapping optimization technique based on the idea of adding the log-policy to the immediate reward. The technique is shown to bring strong empirical gains, and the theoretical analysis helps explain why. Although the reviewers remained divided even after an active discussion period (scores of 7, 7, 5, 5), I believe this paper is worth publishing at NeurIPS. Simple ideas that bring significant improvements, like this one, are typically the most impactful. I also appreciate the effort made to understand the theoretical properties of the proposed algorithm beyond the basic intuition.
Jan-23-2025, 00:35:15 GMT