An Optimisation Framework for Unsupervised Environment Design
Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster
arXiv.org Artificial Intelligence
For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One approach for improving robustness is unsupervised environment design (UED), a suite of methods that aim to maximise an agent's generalisability by training it on a wide variety of environment configurations. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas the guarantees of previous methods hold only if those methods reach convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method, which outperforms prior methods on two of three environments of varying difficulty.
Jul-10-2025