[R] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm • r/MachineLearning
One thing I was curious about is whether AlphaZero can play endgames. For example, a friend asked whether AlphaZero could learn to play Nim. For anybody who isn't familiar with it (https://en.wikipedia.org/wiki/Nim): the optimal strategy for Nim involves computing the XOR of all the heap sizes. My first thought was no, largely because of the lack of gradient information, the lack of structure, and MCTS not being a good heuristic for the quality of a move. However, Nim doesn't seem that different from, say, a knight-and-bishop mating endgame in chess.
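For anyone who wants to see the XOR rule concretely, here is a minimal sketch in Python. It assumes the normal-play convention (whoever takes the last object wins), and the names `nim_sum` and `winning_move` are just my own for illustration, not from the paper or any library.

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """XOR of all heap sizes; zero means the player to move is losing."""
    return reduce(xor, heaps, 0)


def winning_move(heaps):
    """Return (heap_index, new_size) for a winning move, or None if none exists.

    Assumes normal play: the player who takes the last object wins.
    """
    s = nim_sum(heaps)
    if s == 0:
        return None  # every move hands the opponent a nonzero nim-sum
    for i, h in enumerate(heaps):
        target = h ^ s
        if target < h:
            # Shrinking heap i to `target` makes the overall nim-sum zero.
            return i, target
    return None  # unreachable when s != 0, kept for safety


print(winning_move([3, 4, 5]))  # (0, 1): reduce the first heap from 3 to 1
print(winning_move([1, 2, 3]))  # None: nim-sum is already 0
```

The point for the AlphaZero question is that the correct move depends on a bitwise parity over all heaps, so changing a single heap by one object can flip a won position into a lost one.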
Dec-6-2017, 10:00:38 GMT