Verifying Cross-modal Entity Consistency in News using Vision-language Models
