We will add a series of numerical experiments to demonstrate the minimax optimality of the model-
–Neural Information Processing Systems
We thank all reviewers for their very helpful comments. This letter addresses several major questions raised by the reviewers. Indeed, reward perturbation is introduced merely to facilitate analysis. Take Section 4.3 of the arXiv version ... We will elucidate the motivation and intuition of reward perturbation earlier in the revised paper. We understand from the reviewer's comment that there might be confusion in our ... This will be made clear in the final paper.
Aug-22-2025, 00:28:22 GMT