SPO: Sequential Monte Carlo Policy Optimisation

Neural Information Processing Systems 

Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents.