Reinforcement Learning for Tic-Tac-Toe: A Python Implementation