Reviews: Unsupervised Curricula for Visual Meta-Reinforcement Learning
–Neural Information Processing Systems
This paper presents a method for learning a distribution of tasks to feed to an agent that is learning via meta-RL, while simultaneously optimizing the agent to perform better, and more quickly, on tasks sampled from this distribution. The task distribution is trained with an objective that maximizes the mutual information between a latent task variable and the trajectories produced by the meta-RL agent. The meta-RL agent is, more or less, trained to maximize this same mutual information. The overall optimization relies on variational lower bounds on mutual information and on the RL^2 algorithm for meta-RL. The experiments show that task distributions and meta-RL agents trained in this co-adaptive manner exhibit some potentially useful behaviors, e.g., an improved ability to quickly solve new tasks sampled from an "actual" task distribution, i.e., a task distribution that is not the one co-adapted with the agent.
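To make the "variational lower bound on mutual information" concrete, here is a minimal sketch on a toy discrete problem. It assumes the standard Barber-Agakov form of the bound, I(Z; T) >= H(Z) + E[log q(z | t)], which may differ from the exact bounds used in the paper. The 2x2 joint table, the decoder `q`, and all function names are hypothetical; the two-valued variable `t` is only a stand-in for a trajectory.

```python
import math


def mutual_information(joint):
    """Exact I(Z; T) in nats for a joint probability table joint[z][t]."""
    p_z = [sum(row) for row in joint]
    p_t = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log(p / (p_z[i] * p_t[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def variational_lower_bound(joint, q):
    """H(Z) + E_{p(z,t)}[log q(z | t)], where q[t][z] is any decoder."""
    p_z = [sum(row) for row in joint]
    entropy_z = -sum(p * math.log(p) for p in p_z if p > 0)
    expected_log_q = sum(
        p * math.log(q[j][i])
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return entropy_z + expected_log_q


def true_posterior(joint):
    """Exact p(z | t), returned as a table indexed [t][z]."""
    p_t = [sum(col) for col in zip(*joint)]
    return [
        [joint[i][j] / p_t[j] for i in range(len(joint))]
        for j in range(len(p_t))
    ]


# Toy joint over (latent task z, "trajectory" t); values are illustrative only.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
rough_decoder = [[0.7, 0.3],
                 [0.3, 0.7]]  # an imperfect q[t][z]

exact_mi = mutual_information(joint)
loose_bound = variational_lower_bound(joint, rough_decoder)
tight_bound = variational_lower_bound(joint, true_posterior(joint))
```

With the imperfect decoder the bound sits strictly below the exact mutual information, and it becomes tight when the decoder equals the true posterior p(z | t). That gap is why a learned decoder (discriminator) is typically trained alongside the agent in objectives of this kind.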
Jan-27-2025, 08:39:40 GMT