Review for NeurIPS paper: Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Feb-6-2025, 13:51:42 GMT–Neural Information Processing Systems

Weaknesses: I'm concerned that the comparison to DipNet, the prior state of the art, is misleading, because the authors initialize their algorithm by effectively computing a best response to DipNet. Because their agent beats DipNet, the authors describe it as "stronger" than DipNet. However, beating DipNet is exactly what one would expect from a best response to DipNet, even if that best response is a "weaker" policy in general. To illustrate the problem, imagine a game like Rock-Paper-Scissors in which DipNet is biased toward playing Rock; the techniques introduced in this paper would then effectively learn to always play Paper. Paper beats Rock, but it is not a "stronger" strategy than Rock: it is in turn beaten by Scissors, and no pure strategy in the game dominates another.
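The Rock-Paper-Scissors analogy can be made concrete with a small sketch. This is an illustration of the argument only, not the paper's method. The 60/20/20 Rock-biased mix is a hypothetical stand-in for DipNet, and the helper names (`best_response`, `worst_case_payoff`, etc.) are illustrative. The sketch shows two things:

- The best response to the biased policy (Paper) wins against that policy in expectation, yet it can be exploited for a payoff of -1.
- The uniform (Nash) policy gains nothing against the biased policy but cannot be exploited at all.

```python
from typing import List

Strategy = List[float]

# Row player's payoff in Rock-Paper-Scissors; actions ordered (Rock, Paper, Scissors).
PAYOFF = [
    [0, -1, 1],   # Rock: ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper: beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]
ACTIONS = ["Rock", "Paper", "Scissors"]


def action_values(q: Strategy) -> List[float]:
    """Expected payoff of each pure action against a fixed opponent mix q."""
    return [sum(PAYOFF[i][j] * q[j] for j in range(3)) for i in range(3)]


def expected_payoff(p: Strategy, q: Strategy) -> float:
    """Expected payoff to a player using mix p against an opponent using mix q."""
    return sum(p_i * v for p_i, v in zip(p, action_values(q)))


def best_response(q: Strategy) -> Strategy:
    """Pure best response to q (ties broken toward the lowest action index)."""
    values = action_values(q)
    best = values.index(max(values))
    return [1.0 if i == best else 0.0 for i in range(3)]


def worst_case_payoff(p: Strategy) -> float:
    """Lowest payoff p can receive against any opponent action, i.e. when p is exploited."""
    return min(sum(p[i] * PAYOFF[i][j] for i in range(3)) for j in range(3))


# Hypothetical Rock-biased policy standing in for DipNet in the analogy.
biased = [0.6, 0.2, 0.2]
br = best_response(biased)
uniform = [1 / 3] * 3  # the Nash equilibrium of Rock-Paper-Scissors

if __name__ == "__main__":
    print("best response to biased policy:", ACTIONS[br.index(1.0)])
    print(f"best response vs biased:   {expected_payoff(br, biased):+.2f}")
    print(f"best response worst case:  {worst_case_payoff(br):+.2f}")
    print(f"uniform vs biased:         {expected_payoff(uniform, biased):+.2f}")
    print(f"uniform worst case:        {worst_case_payoff(uniform):+.2f}")
```

Under these assumptions, a head-to-head win against one fixed opponent says little about general strength. A measure such as worst-case payoff (exploitability) is one reasonable way to capture the distinction the review is drawing.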
