Review for NeurIPS paper: Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems 

Summary and Contributions: This paper proposes a framework for goal-conditioned RL in which the structure of the goal representation is learned from weak human supervision. Most goal-conditioned RL methods use either the raw image as the goal or an encoding learned with an unsupervised method such as a VAE. This paper instead takes as input a (relatively small) dataset of images and asks human annotators to rank semantic attributes for pairs of images (which image is more brightly lit, which one has the door more open, etc.). The algorithm operates in two phases: 1. Using the weak supervision signal from the human annotators, a disentangled representation is learned using a GAN-type loss on triplets of two images and one binary label.
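
To make the form of the weak supervision concrete, here is a minimal sketch of how triplets of two images and one binary label could be assembled. It is an illustration only and not the authors' code: the function name `make_weak_label_triplets`, the `factors` table of per-image attribute values, and the rule that stands in for a human annotator are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def make_weak_label_triplets(
    factors: Sequence[Sequence[float]],
    attribute: int,
    num_pairs: int,
    seed: int = 0,
) -> List[Tuple[int, int, int]]:
    """Build (image_i, image_j, label) triplets for one semantic attribute.

    `factors[k][attribute]` is a hypothetical ground-truth value of the
    attribute for image k (e.g. lighting level). It stands in for the
    judgment of a human annotator, who only reports which image ranks higher.
    """
    rng = random.Random(seed)
    triplets = []
    for _ in range(num_pairs):
        # Sample two distinct image indices to compare.
        i, j = rng.sample(range(len(factors)), 2)
        # Binary label: 1 if image i has the larger attribute value, else 0.
        label = int(factors[i][attribute] > factors[j][attribute])
        triplets.append((i, j, label))
    return triplets


# Three toy "images", each described by (lighting, door_openness).
toy_factors = [(0.1, 0.9), (0.5, 0.2), (0.8, 0.4)]
lighting_triplets = make_weak_label_triplets(toy_factors, attribute=0, num_pairs=5)
```

Each triplet carries only a relative judgment for a single attribute, never the attribute value itself, which is why the supervision counts as weak.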