AlphaZeroES: Direct score maximization outperforms planning loss minimization