From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex?

Keller, Yannik, Blüml, Jannis, Sudhakaran, Gopika, Kersting, Kristian

arXiv.org Artificial Intelligence 

The gameplay of strategic board games such as chess, Go, and Hex is often characterized by combinatorial, relational structures -- capturing distinct interactions and non-local patterns -- and not just images. A key feature of CNNs is their relational inductive bias towards locality and translational invariance. Hence, we investigate a crucial question: can GNNs, with their ability to encode complex connections, replace CNNs in self-play reinforcement learning? To this end, we compare the two architectures on Hex -- an abstract yet strategically rich board game -- which serves as our experimental platform. Our findings reveal that GNNs excel at handling long-range dependencies in game states and are less prone to overfitting, but are less proficient at discerning local patterns. This suggests a potential paradigm shift: exploiting game-specific structures to reshape self-play reinforcement learning.

In 2016, AlphaGo (Silver et al., 2016) became the first AI to beat professional Go players, combining Monte-Carlo tree search (MCTS) with neural-network policy and value approximation and self-play training. Its approach has since been transferred to various other board games: AlphaZero (Silver et al., 2017) achieved superhuman strength in chess and shogi, while Mohex-3HNN (Gao et al., 2018) has become one of the strongest AIs for Hex. Behind these accomplishments lies a crucial observation: despite the diversity of these board games, all these programs use convolutional neural networks (CNNs) as the foundational architecture to approximate policy and value functions. The inductive biases of CNNs towards locality and translational invariance align harmoniously with most positions in the game of Go, where spatially local patterns dominate.

Figure 1: Non-local dependencies exist in many prominent board games. If A is blue, playing at B is a wasted move.
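To make the contrast between the two input encodings concrete, the following is a minimal sketch -- an illustrative assumption, not the paper's actual implementation -- of how a Hex board can be exposed to a GNN as an explicit graph (cells as nodes, the six-neighbor Hex adjacency as edges) rather than as a 2-D image plane for a CNN:

```python
# Illustrative sketch only: encode a Hex board's connectivity as a graph
# (node indices + edge list), the kind of input a GNN consumes instead of
# the image-like tensor a CNN would receive. Axial offsets give each Hex
# cell up to six neighbors on a rhombus-shaped board.

def hex_neighbors(r, c, size):
    """Return the in-bounds six-neighborhood of cell (r, c)."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, 1), (1, -1)]
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < size and 0 <= c + dc < size]

def board_to_edge_list(size):
    """Undirected edge list over flat cell indices (index = r * size + c)."""
    edges = set()
    for r in range(size):
        for c in range(size):
            u = r * size + c
            for nr, nc in hex_neighbors(r, c, size):
                v = nr * size + nc
                edges.add((min(u, v), max(u, v)))  # deduplicate (u, v)/(v, u)
    return sorted(edges)

edges = board_to_edge_list(3)  # 3x3 board: 6 horizontal + 6 vertical + 4 diagonal edges
```

A GNN's message passing then follows exactly these edges, so the board's relational structure is built into the architecture rather than approximated through stacked local convolutions.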