In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding

Fan, Wan-Cyuan, Chen, Yen-Chun, Liu, Mengchen, Jacobson, Alexander, Yuan, Lu, Sigal, Leonid

Jul-22-2025–arXiv.org Artificial Intelligence

Recent methods for customizing Large Vision Language Models (LVLMs) for domain-specific tasks have shown promising results in scientific chart comprehension. However, existing approaches face two major limitations: First, they rely on paired data from only a few chart types, limiting generalization to wide range of chart types. Secondly, they lack targeted pre-training for chart-data alignment, which hampers the model's understanding of underlying data. In this paper, we introduce ChartScope, an LVLM optimized for in-depth chart comprehension across diverse chart types. We propose an efficient data generation pipeline that synthesizes paired data for a wide range of chart types, along with a novel Dual-Path training strategy that enabling the model to succinctly capture essential data details while preserving robust reasoning capabilities by incorporating reasoning over the underlying data. Lastly, we establish ChartDQA, a new benchmark for evaluating not only question-answering at different levels but also underlying data understanding. Experimental results demonstrate that ChartScope significantly enhances comprehension on a wide range of chart types. The code and data are available at https://davidhalladay.github.io/chartscope_demo.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

Jul-22-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Large Language Model (0.51)
    - Question Answering (0.36)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found