A Meta-Reinforcement Learning Algorithm for Causal Discovery