Contrastive Reinforcement Learning of Symbolic Reasoning Domains