RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee

Mar-13-2025–arXiv.org Artificial Intelligence

Writing assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by applying syntactic and semantic variations to descriptions of image components. However, human-written captions prioritize conveying a central message alongside the visual description, using pragmatic cues. To enhance pragmatic diversity, it is essential to explore alternative ways of communicating these messages in conjunction with the visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multimodal Large Language Models (MLLMs) that leverages Coherence Relations as an axis of variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment than MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA
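
The core idea above is to vary captions along coherence relations instead of surface wording: one prompt per relation, each asking the MLLM to convey the same central message differently. The following is a minimal sketch of that fan-out, not the RONA implementation. It assumes a hypothetical `generate` callable that wraps an MLLM with the image already attached; the relation names and one-line definitions are illustrative paraphrases borrowed from prior image-text coherence work and may not match the relation set or prompt wording RONA actually uses (the linked repository holds the real prompts).

```python
from typing import Callable, Dict

# Hypothetical relation set with illustrative paraphrased definitions;
# placeholders, not necessarily the relations or wording used by RONA.
COHERENCE_RELATIONS: Dict[str, str] = {
    "Insertion": "the main entity appears only in the image, not in the caption",
    "Concretization": "caption and image mention the same entity, and the image makes it concrete",
    "Projection": "the caption mentions an entity that the image only implies",
    "Restatement": "the caption directly describes what the image shows",
    "Extension": "the caption extends the scene with context the image does not show",
}


def build_relation_prompts(
    message: str, relations: Dict[str, str] = COHERENCE_RELATIONS
) -> Dict[str, str]:
    """Return one caption-generation prompt per coherence relation."""
    prompts: Dict[str, str] = {}
    for name, definition in relations.items():
        prompts[name] = (
            f"Write a caption for the attached image that conveys this message: {message!r}. "
            f"Use the '{name}' coherence relation, where {definition}."
        )
    return prompts


def generate_diverse_captions(
    message: str,
    generate: Callable[[str], str],
    relations: Dict[str, str] = COHERENCE_RELATIONS,
) -> Dict[str, str]:
    """Call the (hypothetical) MLLM wrapper once per relation-specific prompt."""
    return {
        name: generate(prompt)
        for name, prompt in build_relation_prompts(message, relations).items()
    }


if __name__ == "__main__":
    # Stub generator standing in for a real MLLM call: it just echoes the prompt.
    captions = generate_diverse_captions("a rainy Monday commute", lambda p: p)
    for relation, caption in captions.items():
        print(f"{relation}: {caption}")
```

Because diversity comes from the set of relations rather than from sampling temperature, adding or removing a relation changes the pragmatic spread of the output directly, while each individual caption can still be decoded conservatively.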

caption, computational linguistic, relation, (14 more...)

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Pennsylvania (0.04)
    - New York > New York County
      - New York City (0.04)
    - California > San Mateo County
      - Menlo Park (0.04)
- Europe
  - Monaco (0.04)
  - United Kingdom > England
    - Greater London > London (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)