Reviews: Playing hard exploration games by watching YouTube

Neural Information Processing Systems 

Supervision usually has to be provided intentionally by a human; the authors instead use YouTube videos as a form of supervision. They first align videos to a shared representation using two self-supervised objectives that require no labeling: i) predicting the temporal distance between two frames of the same video, and ii) predicting the temporal distance between a video frame and an audio segment of the same video. They then embed a new video and place checkpoints along it, which serve as intermediate rewards for an RL agent. Experiments show state-of-the-art performance among learning-from-demonstration approaches.

Strength:
- The paper is clearly written, with logical transitions between sections and good motivation.