This post is about implementing a quite basic Neural Network that is able to play the game Tic-Tac-Toe. Admittedly, there is no real need for a Neural Network or any other Machine Learning model to implement a good (well, basically perfect) computer player for this game; that could easily be achieved with a brute-force approach. But as this is the author's first excursion into the world of Machine Learning, opting for something simple seems to be a good idea. The motivation to start working on this post and the related project can be summed up in one word: AlphaGo. The game of Go is definitely the queen of competitive games.
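To make the brute-force remark concrete, here is a minimal sketch of an exhaustive minimax search for Tic-Tac-Toe. It is not the network this post goes on to build; the board encoding (a 9-character string, read row by row) and the names `LINES`, `winner`, and `solve` are assumptions made for this illustration.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board whose cells are indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board, player):
    """Search every continuation from `board` with `player` to move.

    Returns (value, move). The value is from X's point of view:
    +1 if X wins with best play, 0 for a draw, -1 if O wins.
    The move is None when the game is already over.
    """
    won = winner(board)
    if won is not None:
        return (1 if won == 'X' else -1), None
    if ' ' not in board:
        return 0, None

    other = 'O' if player == 'X' else 'X'
    best_value, best_move = None, None
    for i, cell in enumerate(board):
        if cell != ' ':
            continue
        child = board[:i] + player + board[i + 1:]
        value, _ = solve(child, other)
        # X maximizes the value, O minimizes it.
        better = (best_value is None
                  or (player == 'X' and value > best_value)
                  or (player == 'O' and value < best_value))
        if better:
            best_value, best_move = value, i
    return best_value, best_move
```

Calling `solve(' ' * 9, 'X')` searches the whole game tree from the empty board. The cache keeps this fast, because many move orders lead to the same position.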
My dad taught me when I was young, but I guess he was one of those dads who always let their kid win. To compensate for this lack of skill in one of the world's most popular games, I did what any data science lover would do: build an AI to beat the people I couldn't beat. But I wanted to see how a chess engine would do without reinforcement learning, and also to learn how to deploy a deep learning model to the web. FICS has a database of 300 million games, including the individual moves made, the results, and the ratings of the players involved. I downloaded all the games from 2012 in which at least one player was rated above 2000 Elo.
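The rating filter described above could look roughly like the following sketch. It assumes the games are available as PGN text carrying the standard `WhiteElo` and `BlackElo` tags; the helper names (`iter_games`, `rating`, `has_strong_player`) are hypothetical, and the parser is deliberately naive (it does not handle games that have headers but no moves).

```python
import re

# Matches a PGN tag pair such as: [WhiteElo "2105"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_games(pgn_text):
    """Yield (headers, movetext) for each game in a PGN string.

    A tag line that appears after movetext is treated as the start
    of the next game.
    """
    headers, moves = {}, []
    for line in pgn_text.splitlines():
        line = line.strip()
        match = TAG_RE.match(line)
        if match:
            if moves:
                yield headers, ' '.join(moves)
                headers, moves = {}, []
            headers[match.group(1)] = match.group(2)
        elif line:
            moves.append(line)
    if headers or moves:
        yield headers, ' '.join(moves)


def rating(headers, key):
    """Return the rating stored under `key`, or 0 if missing or non-numeric."""
    try:
        return int(headers.get(key, ''))
    except ValueError:
        return 0


def has_strong_player(headers, min_elo=2000):
    """True if at least one player is rated strictly above `min_elo`."""
    best = max(rating(headers, 'WhiteElo'), rating(headers, 'BlackElo'))
    return best > min_elo
```

With these helpers, `[g for g in iter_games(text) if has_strong_player(g[0])]` keeps only the games that pass the filter. For a real archive of this size, a dedicated PGN library and streaming file reads would be the more robust choice.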
An important feature of many problem domains in machine learning is their geometry. For example, adjacency relationships, symmetries, and Cartesian coordinates are essential to any complete description of board games, visual recognition, or vehicle control. Yet many approaches to learning ignore such information in their representations, instead inputting flat parameter vectors with no indication of how those parameters are situated geometrically. This paper argues that such geometric information is critical to the ability of any machine learning approach to effectively generalize; even a small shift in the configuration of the task in space from what was experienced in training can go wholly unrecognized unless the algorithm is able to learn the regularities in decision-making across the problem geometry. To demonstrate the importance of learning from geometry, three variants of the same evolutionary learning algorithm (NeuroEvolution of Augmenting Topologies), whose representations vary in their capacity to encode geometry, are compared in checkers. The result is that the variant that can learn geometric regularities produces a significantly more general solution. The conclusion is that it is important to enable machine learning to detect and thereby learn from the geometry of its problems.
The longstanding challenge in artificial intelligence of playing Go at professional human level has been successfully tackled in recent works [5, 7, 6], where software tools (AlphaGo, AlphaGo Zero, AlphaZero) combining neural networks and Monte Carlo tree search reached superhuman level. A recent development was Leela Zero, an open-source program whose neural network is trained over millions of games played in a distributed fashion, thus bringing improvements within reach of the resources of the academic community. However, all these programs suffer from a relevant limitation: it is impossible to target their victory margin. They are trained with a fixed komi of 7.5 and they are built to maximize just the winning probability, without considering the score difference. This has several negative consequences for these programs: when they are ahead, they choose suboptimal moves and often win by a small margin; they cannot be used with komi 6.5, which is also common in professional games; and they play badly in handicap games, since the winning probability is not a relevant attribute in such situations. In principle all these problems could be overcome by replacing the binary reward (win 1, lose 0) with the game score difference, but the latter is known to be less robust [3, 8], and in general the strongest programs have used the former since the seminal works [1, 3, 2].
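The difference between the two reward signals can be illustrated with a small sketch. It is written from Black's perspective and assumes the final point counts are already known; the function names and the example numbers are made up for illustration and are not taken from any of the programs mentioned above.

```python
def score_difference(black_points, white_points, komi=7.5):
    """Final margin from Black's point of view; komi is added to White's score."""
    return black_points - (white_points + komi)


def binary_reward(black_points, white_points, komi=7.5):
    """Win/lose reward for Black: 1 for a win, 0 otherwise.

    With a half-integer komi such as 7.5 or 6.5 the margin is never zero,
    so no tie-breaking rule is needed in this sketch.
    """
    return 1 if score_difference(black_points, white_points, komi) > 0 else 0


# The binary reward cannot tell a half-point win from a large one ...
narrow = binary_reward(45, 37)   # margin +0.5
wide = binary_reward(70, 12)     # margin +50.5

# ... and the same final position can flip from a loss to a win when komi changes.
at_komi_75 = binary_reward(45, 38, komi=7.5)   # margin -0.5
at_komi_65 = binary_reward(45, 38, komi=6.5)   # margin +0.5
```

Here `narrow` and `wide` receive the same reward even though the margins differ by 50 points. This is why a program trained only on the binary signal has no incentive to prefer the larger win, and why its value estimates are tied to the komi it was trained with.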