MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
Fu, Jinlan; Huangfu, Shenzhen; Fei, Hao; Huang, Yichong; Shen, Xiaoyu; Qiu, Xipeng; Ng, See-Kiong
arXiv.org Artificial Intelligence
The alt-text generation task produces concise, context-relevant descriptions of images, enabling blind and low-vision users to access online images. Despite the capabilities of multimodal large language models (MLLMs), alt-text generation performance remains limited by noisy user annotations, inconsistent standards, and MLLMs' insensitivity to contextual information. Previous efforts to adapt MLLMs with supervised fine-tuning (SFT) have struggled because SFT relies on accurate target annotations, which are often flawed in user-generated alt-text. To address this, we propose Multifaceted Cross-Modal Direct Preference Optimization (MCM-DPO), which improves alt-text generation by learning to identify the better option in each preference pair without requiring precise annotations. MCM-DPO optimizes preferences across single, paired, and multi-preference dimensions, covering textual, visual, and cross-modal factors. Because high-quality annotated and preference-labeled datasets for alt-text are scarce, we construct two large-scale, high-quality datasets, TAlt and PAlt, sourced from Twitter and Pinterest. Together they include 202k annotated alt-text samples and 18k preference pairs covering diverse preference dimensions, and are intended to support further research in this domain. Experimental results show that MCM-DPO consistently outperforms both DPO and SFT, establishing a new state of the art in alt-text generation. We release the code and data here: https://github.com/LVUGAI/MCM-DPO
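For readers unfamiliar with preference optimization, the following is a minimal sketch of the standard single-pair DPO loss that MCM-DPO is compared against. It is not the multifaceted objective proposed in the paper, which the abstract does not spell out. The function names, the argument names, and the value `beta=0.1` are illustrative assumptions; the inputs are assumed to be summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") alt-text under the trained policy and a frozen reference model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper)."""
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # The loss falls as the policy favors the chosen response more than
    # the reference model does, relative to the rejected response.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls below that value only when the policy ranks the preferred alt-text higher than the reference does. Note that no gold target text appears in the loss, only the relative ordering within a pair, which is the property the abstract relies on when annotations are noisy.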
Oct-2-2025
- Country:
  - Asia
    - China
      - Guangdong Province > Shenzhen (0.04)
      - Heilongjiang Province > Harbin (0.04)
      - Shanghai > Shanghai (0.04)
      - Zhejiang Province > Ningbo (0.04)
    - Singapore > Central Region
      - Singapore (0.05)
  - Europe
    - Austria > Vienna (0.14)
    - France > Hauts-de-France
    - Ireland > Leinster
      - County Dublin > Dublin (0.05)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
  - North America > United States
    - Florida > Miami-Dade County
      - Miami (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - New York > New York County
      - New York City (0.04)
  - Oceania > Palau (0.04)
- Genre:
  - Research Report > New Finding (0.48)
- Industry:
  - Information Technology > Services (0.68)
- Technology: