Unsupervised skill discovery with contrastive intrinsic control
Unsupervised Reinforcement Learning (RL), in which agents pre-train with self-supervised rewards, is an emerging paradigm for developing RL agents that are capable of generalization. Recently, we released the Unsupervised RL Benchmark (URLB), which we covered in a previous post. A surprising finding was that competence-based algorithms significantly underperformed the other categories of algorithms. In this post, we will demystify what has been holding back competence-based methods and introduce Contrastive Intrinsic Control (CIC), a new competence-based algorithm that is the first to achieve leading results on URLB. To recap, competence-based methods (which we will cover in detail) maximize the mutual information between states and skills (e.g.
Apr-1-2022, 14:00:00 GMT