Reinforcement Learning as Iterative and Amortised Inference