AlphaZeroES: Direct score maximization outperforms planning loss minimization
