Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models

Open in new window