PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Kim, Yongil, Hwang, Yerin, Yun, Hyeongu, Yoon, Seunghyun, Bui, Trung, Jung, Kyomin
arXiv.org Artificial Intelligence
Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning. This paper proposes Perturbation Robust Multi-Lingual CLIPScore (PR-MCS), a novel reference-free image captioning metric that applies to multiple languages and is robust to such perturbations. To achieve this robustness, we fine-tune the text encoder of CLIP with a language-agnostic method so that it distinguishes perturbed text from the original text. To verify the robustness of PR-MCS, we introduce a new fine-grained evaluation dataset consisting of detailed captions, critical objects, and the relationships between those objects for 3,000 images in five languages. In our experiments, PR-MCS significantly outperforms baseline metrics in capturing lexical noise across all perturbation types in all five languages, demonstrating that PR-MCS is highly robust to lexical perturbations.
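PR-MCS builds on CLIPScore, which scores a caption against an image as CLIP-S(c, v) = w · max(cos(c, v), 0), with w = 2.5 in the original CLIPScore definition (Hessel et al., 2021). The minimal sketch below shows that score and the property the abstract describes: a perturbation-robust metric should score a lexically perturbed caption lower than the original. It assumes precomputed embeddings (the toy vectors stand in for CLIP encoder outputs) and that PR-MCS keeps w = 2.5. The hinge objective `perturbation_margin_loss` is a hypothetical illustration of "distinguishing perturbed from original text", not the authors' confirmed training loss; all function names here are hypothetical.

```python
import math
from typing import Sequence

# Rescaling weight from the original CLIPScore definition; assumed (not confirmed) for PR-MCS.
CLIPSCORE_W = 2.5


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def clip_score(text_emb: Sequence[float], image_emb: Sequence[float],
               w: float = CLIPSCORE_W) -> float:
    """Reference-free CLIPScore: w * max(cos(text, image), 0)."""
    return w * max(cosine(text_emb, image_emb), 0.0)


def perturbation_margin_loss(image_emb: Sequence[float],
                             orig_emb: Sequence[float],
                             pert_emb: Sequence[float],
                             margin: float = 0.1) -> float:
    """Hypothetical hinge objective: zero once the original caption's similarity
    exceeds the perturbed caption's similarity by at least `margin`."""
    s_orig = cosine(orig_emb, image_emb)
    s_pert = cosine(pert_emb, image_emb)
    return max(0.0, margin - (s_orig - s_pert))


def detects_perturbation(image_emb: Sequence[float],
                         orig_emb: Sequence[float],
                         pert_emb: Sequence[float]) -> bool:
    """True if the metric ranks the perturbed caption below the original one."""
    return clip_score(pert_emb, image_emb) < clip_score(orig_emb, image_emb)


# Toy stand-ins for CLIP embeddings (hypothetical values, not real encoder outputs).
image = [1.0, 0.0, 0.0]
original_caption = [0.9, 0.1, 0.0]
perturbed_caption = [0.5, 0.5, 0.0]

print(round(clip_score(original_caption, image), 3))
print(round(clip_score(perturbed_caption, image), 3))
print(detects_perturbation(image, original_caption, perturbed_caption))
```

The `max(·, 0)` clamp means captions with negative similarity all score 0, so the metric only ranks captions that are at least weakly aligned with the image. In a real fine-tuning setup the embeddings would come from a multilingual CLIP text encoder and an image encoder, and the loss would be averaged over batches of (image, original, perturbed) triples.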
Mar-15-2023