Review for NeurIPS paper: Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Neural Information Processing Systems
Weaknesses: I'm concerned that the comparison to DipNet, the prior state of the art, is misleading, because the authors initialize their algorithm by effectively computing a best response to DipNet. Because their agent beats DipNet, the authors describe it as "stronger" than DipNet. However, beating DipNet is expected once one computes a best response to it, even if that best response is a "weaker" policy overall. To illustrate the problem, imagine a game like Rock-Paper-Scissors in which DipNet is biased toward playing Rock; the techniques introduced in this paper would then effectively learn to always play Paper. Paper beats Rock, but that does not make Paper "stronger" than Rock in general, since Scissors in turn beats Paper.
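To make the Rock-Paper-Scissors analogy concrete, here is a toy sketch in Python. It is not the paper's method. The Rock-biased policy (50% Rock, 25% Paper, 25% Scissors) is a hypothetical stand-in for DipNet, with made-up numbers, and "exploitability" (the payoff a best-responding opponent earns) is my assumed yardstick for "strength". The sketch computes the best response to the biased policy and checks how exploitable each policy is.

```python
# Toy illustration of the Rock-Paper-Scissors analogy (not the paper's method).
# Strategies are probability vectors over (Rock, Paper, Scissors).
PAYOFF = [
    [0, -1, 1],   # Rock     vs (Rock, Paper, Scissors)
    [1, 0, -1],   # Paper    vs (Rock, Paper, Scissors)
    [-1, 1, 0],   # Scissors vs (Rock, Paper, Scissors)
]

def expected_payoff(p, q):
    """Expected payoff to a player using p against an opponent using q."""
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

def best_response(q):
    """Pure strategy (one-hot vector) that maximizes expected payoff against q."""
    values = [sum(q[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)]
    best = max(range(3), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]

def exploitability(p):
    """Payoff a best-responding opponent earns against p (0 at the uniform Nash equilibrium).

    RPS is symmetric and zero-sum, so the opponent's best payoff against p equals
    expected_payoff(best_response(p), p).
    """
    return expected_payoff(best_response(p), p)

# Hypothetical Rock-biased policy standing in for DipNet (numbers are made up).
ROCK_BIASED = [0.5, 0.25, 0.25]
br = best_response(ROCK_BIASED)          # [0.0, 1.0, 0.0], i.e. always Paper
print(expected_payoff(br, ROCK_BIASED))  # 0.25: the best response beats the biased policy
print(exploitability(ROCK_BIASED))       # 0.25
print(exploitability(br))                # 1.0: yet it is far more exploitable
```

The best response (always Paper) wins 0.25 per game against the biased policy, yet an opponent who switches to Scissors wins 1.0 per game against it, compared with only 0.25 against the biased policy itself. A head-to-head win against the policy one best-responded to therefore does not by itself establish greater strength.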
Feb-6-2025, 13:51:42 GMT