Review for NeurIPS paper: Munchausen Reinforcement Learning

Jan-23-2025, 00:35:15 GMT–Neural Information Processing Systems

This submission proposes a new bootstrapping technique based on the idea of adding the log-policy to the immediate reward. This is shown to bring strong empirical gains, and the theoretical analysis helps explain why. Although the reviewers remained divided even after an active discussion period (scores of 7, 7, 5, 5), I believe this paper is worth publishing at NeurIPS. Simple ideas that bring significant improvements, like this one, are typically the most impactful. I also appreciate the effort made to understand the theoretical properties of the proposed algorithm beyond the basic intuition.
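For readers unfamiliar with the core idea, the following is a minimal sketch of what "adding the log-policy to the immediate reward" could look like for a single transition with discrete actions. It is an illustration only, not the authors' implementation. It assumes a softmax policy over Q-values with temperature `tau`, a scaling factor `alpha` on the log-policy bonus, a lower clip `clip_min` on that bonus for numerical stability, and an entropy-regularized (soft) bootstrap term. The function names, argument names, and default values are assumptions chosen for this example.

```python
import math


def log_softmax(values, temperature):
    """Numerically stable log of softmax(values / temperature)."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_target(reward, q_s, action, q_next, gamma=0.99, tau=0.03,
                      alpha=0.9, clip_min=-1.0, done=False):
    """Regression target for one transition (s, action, reward, s').

    q_s and q_next are lists of Q-value estimates for every action in
    s and s'. All hyperparameter defaults are illustrative assumptions.
    """
    # Log-policy bonus: scaled log-probability of the action actually taken.
    # It is never positive, and it is clipped from below because the
    # log-probability is unbounded when the policy is near-deterministic.
    log_pi_s = log_softmax(q_s, tau)
    bonus = alpha * max(tau * log_pi_s[action], clip_min)

    if done:
        return reward + bonus

    # Soft bootstrap: expectation under the softmax policy at s' of
    # q(s', a') - tau * log pi(a' | s').
    log_pi_next = log_softmax(q_next, tau)
    soft_value = sum(
        math.exp(lp) * (q - tau * lp) for q, lp in zip(q_next, log_pi_next)
    )
    return reward + bonus + gamma * soft_value
```

Setting `alpha=0` in this sketch removes the bonus and leaves only the entropy-regularized bootstrap, which makes it easy to see that the log-policy term is the one modification to the reward.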



