Towards Principled Representation Learning from Videos for Reinforcement Learning