CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Azad, Abdus Salam, Gur, Izzeddin, Emhoff, Jasper, Alexis, Nathaniel, Faust, Aleksandra, Abbeel, Pieter, Stoica, Ion

arXiv.org Artificial Intelligence 

Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process in which the task distribution evolves along with the agent policies, creating instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR, a novel unsupervised curriculum learning algorithm.

Deep Reinforcement Learning (RL) has shown exciting progress in the past decade in many challenging domains, including Atari (Mnih et al., 2015), Dota (Berner et al., 2019), and Go (Silver et al., 2016). However, deep RL is also known for its sample inefficiency and difficult generalization: agents perform poorly on unseen tasks or fail altogether with the slightest change (Cobbe et al., 2019; Azad et al., 2022; Zhang et al., 2018). While Curriculum Learning (CL) algorithms have been shown to improve RL sample efficiency by adapting the training task distribution, i.e., the curriculum (Portelas et al., 2020; Narvekar et al., 2020), a class of unsupervised CL algorithms called Unsupervised Environment Design (UED) (Dennis et al., 2020; Jiang et al., 2021a) has recently shown promising zero-shot generalization by automatically generating the training tasks and adapting the curriculum simultaneously.
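
The co-evolving teacher-student loop described above can be made concrete with a short sketch. The following Python snippet is purely illustrative: the class names, the random task parameterization, and the scalar regret proxy are placeholder assumptions, not the actual implementation of CLUTR or any particular UED method.

    # Illustrative sketch of a generic UED-style loop: a teacher proposes tasks,
    # a student policy trains on them, and the teacher adapts the task
    # distribution based on student feedback. All names here (Teacher, Student,
    # propose_task, ...) are hypothetical placeholders.
    import random

    class Teacher:
        """Maintains and adapts the distribution over training tasks."""
        def propose_task(self):
            # Sample a task parameterization (e.g., a maze layout seed).
            return [random.random() for _ in range(4)]

        def update(self, task, regret):
            # A real UED teacher would shift probability mass toward
            # high-regret tasks; omitted here for brevity.
            pass

    class Student:
        """Stand-in for the RL agent trained on the generated tasks."""
        def train_on(self, task):
            # Return a scalar regret proxy (optimal minus achieved return).
            return random.random()

    def ued_loop(num_iterations=10):
        teacher, student = Teacher(), Student()
        for step in range(num_iterations):
            task = teacher.propose_task()     # task distribution evolves...
            regret = student.train_on(task)   # ...together with the policy,
            teacher.update(task, regret)      # making the process non-stationary
            print("iteration %d: regret=%.3f" % (step, regret))

    if __name__ == "__main__":
        ued_loop()

Because the teacher's updates depend on the student's current policy and vice versa, neither side trains against a fixed target, which is the source of the instability noted in the abstract.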
