Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models