How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning

Open in new window