Reinforcement learning for Quantum Tiq-Taq-Toe