Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
