Are We on the Right Way for Evaluating Large Vision-Language Models?
