
 Reinforcement Learning



e92381dba235a8309f08ce46376189a9-Paper-Conference.pdf

Neural Information Processing Systems

The transition dynamics simply mix an action with a randomly sampled latent. An exponential moving average is then applied for temporal persistence, and the resulting latent is decoded into an image using a pretrained generator.
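As a rough illustration of the transition described above, the following is a minimal sketch, not the authors' implementation. The snippet does not give the mixing rule, the smoothing coefficient, or the generator, so everything here is an assumption: the action is taken to have the same dimension as the latent, mixing is a convex combination controlled by a hypothetical `mix` weight, `decay` is a hypothetical EMA coefficient, and `decode` is a stand-in for the pretrained generator.

```python
import numpy as np


def step_latent(prev_latent, action, rng, mix=0.5, decay=0.9):
    """One hypothetical transition step in latent space."""
    # Randomly sampled latent (assumed standard normal).
    noise = rng.standard_normal(prev_latent.shape)
    # Assumed mixing rule: convex combination of action and random latent.
    target = mix * action + (1.0 - mix) * noise
    # Exponential moving average keeps consecutive latents close together.
    return decay * prev_latent + (1.0 - decay) * target


def decode(latent):
    """Stand-in for the pretrained generator (assumption, not a real model)."""
    return np.tanh(latent)


rng = np.random.default_rng(0)
latent = np.zeros(8)
action = np.ones(8)
for _ in range(5):
    latent = step_latent(latent, action, rng)
image = decode(latent)
```

A `decay` close to 1 makes the latent change slowly between steps, which is one way to obtain the temporal persistence the text mentions.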








Contrastive Learning as Goal-Conditioned Reinforcement Learning

Neural Information Processing Systems

We use this idea to reinterpret a prior RL method as performing contrastive learning, and then use the idea to propose a much simpler method that achieves similar performance. Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL methods achieve higher success rates than prior non-contrastive methods, including in the offline RL setting.
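To make the connection between contrastive learning and goal-conditioned RL concrete, here is a minimal sketch of one common contrastive objective (InfoNCE-style). The abstract does not state which loss the paper uses, so this is an assumption for illustration only. The names `sa_repr` (state-action embeddings) and `goal_repr` (goal embeddings) are hypothetical, and row `i` of each array is assumed to form a positive pair, for example a goal actually reached after that state-action pair.

```python
import numpy as np


def contrastive_loss(sa_repr, goal_repr):
    """InfoNCE-style loss over a batch of (state-action, goal) embeddings."""
    # logits[i, j] scores state-action i against goal j via an inner product.
    logits = sa_repr @ goal_repr.T
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; other goals in the batch are negatives.
    return -np.mean(np.diag(log_probs))


# Toy batch: aligned pairs should score better than misaligned ones.
sa = 5.0 * np.eye(4)
goals = 5.0 * np.eye(4)
aligned = contrastive_loss(sa, goals)
shuffled = contrastive_loss(sa, np.roll(goals, 1, axis=0))
```

Under this reading, the learned inner product plays the role of a critic: a high score means the goal is likely to be reached from that state-action pair.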