Video Prediction Models as Rewards for Reinforcement Learning

Neural Information Processing Systems 

Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning.A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet.