Grounding Video Models to Actions through Goal Conditioned Exploration