Goto

Collaborating Authors

 Nam, Taewook


LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

arXiv.org Artificial Intelligence

We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by providing reward feedback. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment while prior unsupervised skill discovery methods struggle. Additionally, we discuss observed challenges of using off-the-shelf foundation models as teachers and our efforts to address them.


Meta Dropout: Learning to Perturb Features for Generalization

arXiv.org Machine Learning

A machine learning model that generalizes well should obtain low errors on the unseen test examples. Test examples could be understood as perturbations of training examples, which means that if we know how to optimally perturb training examples to simulate test examples, we could achieve better generalization at test time. However, obtaining such perturbation is not possible in standard machine learning frameworks as the distribution of the test data is unknown. To tackle this challenge, we propose a meta-learning framework that learns to perturb the latent features of training examples for generalization. Specifically, we meta-learn a noise generator that will output the optimal noise distribution for latent features across all network layers to obtain low error on the test instances, in an input-dependent manner. Then, the learned noise generator will perturb the training examples of unseen tasks at the meta-test time. We show that our method, Meta-dropout, could be also understood as meta-learning of the variational inference framework for a specific graphical model, and describe its connection to existing regularizers. Finally, we validate Meta-dropout on multiple benchmark datasets for few-shot classification, whose results show that it not only significantly improves the generalization performance of meta-learners but also allows them to obtain fast converegence.