Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Open in new window