PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

Kim, Yongil, Hwang, Yerin, Yun, Hyeongu, Yoon, Seunghyun, Bui, Trung, Jung, Kyomin

Mar-15-2023–arXiv.org Artificial Intelligence

Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning. This paper proposes Perturbation Robust Multi-Lingual CLIPScore (PR-MCS), a novel reference-free image captioning metric that is robust to such perturbations and applicable to multiple languages. To achieve perturbation robustness, we fine-tune the text encoder of CLIP with our language-agnostic method to distinguish the perturbed text from the original text. To verify the robustness of PR-MCS, we introduce a new fine-grained evaluation dataset consisting of detailed captions, critical objects, and the relationships between the objects for 3,000 images in five languages. In our experiments, PR-MCS significantly outperforms baseline metrics in capturing lexical noise across all perturbation types in all five languages, demonstrating that PR-MCS is highly robust to lexical perturbations.
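To make the idea of a reference-free, CLIPScore-style metric and a lexical perturbation concrete, here is a minimal sketch in plain Python. It is not the PR-MCS implementation: the embeddings are assumed to be precomputed vectors from some image and text encoder, the function names (`clip_style_score`, `perturb_repeat`, `perturb_shuffle`) are hypothetical, and the two perturbations shown (word repetition and word shuffling) are illustrative choices rather than the paper's confirmed perturbation set. The scoring form `w * max(cos, 0)` with `w = 2.5` follows the original CLIPScore definition.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def clip_style_score(image_emb, text_emb, w=2.5):
    """Reference-free score in the CLIPScore form: w * max(cos(image, text), 0).

    The embeddings are assumed to come from an image encoder and a text
    encoder that share one embedding space; no reference caption is needed.
    """
    return w * max(cosine(image_emb, text_emb), 0.0)


def perturb_repeat(caption, rng):
    """Illustrative lexical perturbation: duplicate one randomly chosen word."""
    words = caption.split()
    if not words:
        return caption
    i = rng.randrange(len(words))
    return " ".join(words[: i + 1] + [words[i]] + words[i + 1:])


def perturb_shuffle(caption, rng):
    """Illustrative lexical perturbation: randomly reorder the words."""
    words = caption.split()
    rng.shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)
    caption = "a brown dog runs across the grass"
    print(perturb_repeat(caption, rng))
    print(perturb_shuffle(caption, rng))
```

A perturbation-robust metric in this sense would be expected to assign a clearly lower score to the perturbed caption than to the original for the same image; checking that requires real encoder outputs, which this sketch does not produce.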

caption, machine learning, natural language

Country:
- Asia
  - Middle East > Israel (0.04)
  - South Korea > Seoul
    - Seoul (0.04)

Genre:
- Research Report > New Finding (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning (1.00)
  - Natural Language > Machine Translation (0.94)
