Adapting Vision-Language Models for Evaluating World Models
