Vision Language Models are In-Context Value Learners

Open in new window