Reinforcement Learning for Hanabi