Review for NeurIPS paper: Near-Optimal Reinforcement Learning with Self-Play
Neural Information Processing Systems
Additional Feedback:
*) Is there a reason to present Algorithm 1? Algorithm 2 appears to give improved performance relative to it; if so, why present both algorithms rather than Algorithm 2 alone?
*) Although equation 9 can be thought of as a set of nm linear constraints, why is the optimization problem always feasible? Although the authors devote half a page to this procedure, I feel it is not well explained: most of the discussion is not devoted to the policy certification procedure itself. Why, for a fixed \mu, is the best response not Markovian?
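To make the feasibility question concrete: if equation 9 encodes coarse-correlated-equilibrium-style constraints over the nm joint actions (my reading of the certification procedure, which the authors should confirm), then the program is feasible because any Nash equilibrium induces a joint distribution satisfying all constraints. A minimal numerical sketch on matching pennies, where the uniform joint distribution (the product of the Nash marginals) satisfies every no-unilateral-deviation constraint:

```python
import numpy as np

# Matching pennies: row player's payoffs; the column player gets the negative.
U1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
U2 = -U1

# Joint distribution over the n x m action pairs; uniform is the product of
# the unique Nash equilibrium marginals in this game.
P = np.full((2, 2), 0.25)

def cce_violation(P, U1, U2):
    """Largest violation of the CCE-style constraints: no player should
    gain by unilaterally deviating to a fixed action."""
    ev1 = np.sum(P * U1)          # row player's expected payoff under P
    ev2 = np.sum(P * U2)          # column player's expected payoff under P
    col_marg = P.sum(axis=0)      # marginal over the column player's actions
    row_marg = P.sum(axis=1)      # marginal over the row player's actions
    v = 0.0
    for a in range(U1.shape[0]):  # row player deviates to row a
        v = max(v, U1[a] @ col_marg - ev1)
    for b in range(U1.shape[1]):  # column player deviates to column b
        v = max(v, row_marg @ U2[:, b] - ev2)
    return v

max_violation = cce_violation(P, U1, U2)
print(max_violation)  # 0.0: every constraint holds, so the set is nonempty
```

This is only an illustration of why such constraint sets are always nonempty in matrix games; whether it answers the feasibility question for equation 9 as stated depends on the exact form of the constraints in the paper.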
Jan-22-2025, 01:41:51 GMT