Evaluating Vision-Language Models in the Wild with Human Preferences
