Format Matters: The Robustness of Multimodal LLMs in Reviewing Evidence from Tables and Charts

Ho, Xanh, Wu, Yun-Ang, Kumar, Sunisth, Boudin, Florian, Takasu, Atsuhiro, Aizawa, Akiko

arXiv.org Artificial Intelligence

With the growing number of submitted scientific papers, there is an increasing demand for systems that can assist reviewers in evaluating research claims. Experimental results are a core component of scientific work, often presented in varying formats such as tables or charts. Understanding how robust current multimodal large language models (multimodal LLMs) are at verifying scientific claims across different evidence formats remains an important and underexplored challenge. In this paper, we design and conduct a series of experiments to assess the ability of multimodal LLMs to verify scientific claims using both tables and charts as evidence. To enable this evaluation, we adapt two existing datasets of scientific papers by incorporating annotations and structures necessary for a multimodal claim verification task. Using this adapted dataset, we evaluate 12 multimodal LLMs and find that current models perform better with table-based evidence while struggling with chart-based evidence. We further conduct human evaluations and observe that humans maintain strong performance across both formats, unlike the models. Our analysis also reveals that smaller multimodal LLMs (under 8B) show weak correlation in performance between table-based and chart-based tasks, indicating limited cross-modal generalization. These findings highlight a critical gap in current models' multimodal reasoning capabilities. We suggest that future multimodal LLMs should place greater emphasis on improving chart understanding to better support scientific claim verification.
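A minimal sketch of how such a table-vs-chart claim-verification probe might be wired up for the table condition: tabular evidence is rendered as markdown, paired with a claim, and the model's free-form reply is mapped onto a closed label set. The prompt wording, label set, and `parse_verdict` helper are illustrative assumptions, not the paper's protocol.

```python
LABELS = {"SUPPORTED", "REFUTED", "NOT_ENOUGH_INFO"}

def table_to_markdown(header, rows):
    """Render tabular evidence as a markdown table for a text-based prompt."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in r) + " |" for r in rows]
    return "\n".join(lines)

def build_prompt(claim, evidence_md):
    return (f"Evidence:\n{evidence_md}\n\n"
            f"Claim: {claim}\n"
            f"Answer with one of: {', '.join(sorted(LABELS))}.")

def parse_verdict(model_output):
    """Map a free-form model reply onto the closed label set."""
    normalized = model_output.upper().replace(" ", "_")
    for label in sorted(LABELS, key=len, reverse=True):
        if label in normalized:
            return label
    return "NOT_ENOUGH_INFO"

evidence = table_to_markdown(["Model", "Accuracy"], [["A", 0.81], ["B", 0.76]])
prompt = build_prompt("Model A outperforms Model B.", evidence)
```

For the chart condition, the same claim would instead be paired with a rendered image of the data, which is exactly the axis along which the paper reports models diverging.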


InfoDet: A Dataset for Infographic Element Detection

Zhu, Jiangning, Zhou, Yuxing, Wang, Zheng, Yao, Juntao, Gu, Yima, Yuan, Yuhui, Liu, Shixia

arXiv.org Artificial Intelligence

Given the central role of charts in scientific, business, and communication contexts, enhancing the chart understanding capabilities of vision-language models (VLMs) has become increasingly critical. A key limitation of existing VLMs lies in their inaccurate visual grounding of infographic elements, including charts and human-recognizable objects (HROs) such as icons and images. However, chart understanding often requires identifying relevant elements and reasoning over them. To address this limitation, we introduce InfoDet, a dataset designed to support the development of accurate object detection models for charts and HROs in infographics. It contains 11,264 real and 90,000 synthetic infographics, with over 14 million bounding box annotations. These annotations are created by combining the model-in-the-loop and programmatic methods. We demonstrate the usefulness of InfoDet through three applications: 1) constructing a Thinking-with-Boxes scheme to boost the chart understanding performance of VLMs, 2) comparing existing object detection models, and 3) applying the developed detection model to document layout and UI element detection.
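Comparing detection models on a box-annotated dataset like this comes down to overlap scoring between predicted and gold boxes. A standard intersection-over-union helper over `(x, y, w, h)` boxes, shown as a generic sketch rather than InfoDet's actual evaluation code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extents along each axis (zero if the boxes are disjoint).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

A detection counts as correct when its IoU with a gold box of the same class exceeds a threshold (0.5 is the conventional default).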


A Appendix

Neural Information Processing Systems

We first give a derivation of the equivalence of label smoothing regularization and Eq. 7. Evidently, the objective does not regularize confidence diversity. "Scale both" corresponds to the originally proposed distillation objective, in which both teacher and student outputs are temperature-scaled. Plots of test accuracy and ECE against the amount of temperature scaling applied are shown in Figure 1. We observe that models trained with student scaling have ECE almost identical to that of their teacher models. In direct contrast, student models trained without student scaling generally achieve much better calibration error than their teachers. This coupled effect could be the reason for the observed conflict between ECE and accuracy.
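The ECE curves discussed above rest on the standard equal-width binning estimator: predictions are grouped by confidence, and the per-bin gap between mean confidence and accuracy is averaged, weighted by bin size. A minimal sketch (the choice of 15 bins is an assumption, not the appendix's setting):

```python
def ece(confidences, correct, n_bins=15):
    """Expected calibration error with equal-width confidence bins."""
    total = len(confidences)
    err = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins (lo, hi], with 0.0 folded into the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        err += len(idx) / total * abs(acc - conf)
    return err
```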


BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Masry, Ahmed, Puri, Abhay, Hashemi, Masoud, Rodriguez, Juan A., Thakkar, Megh, Mahajan, Khyati, Yadav, Vikas, Madhusudhan, Sathwik Tejaswi, Piché, Alexandre, Bahdanau, Dzmitry, Pal, Christopher, Vazquez, David, Hoque, Enamul, Taslakian, Perouz, Rajeswar, Sai, Gella, Spandana

arXiv.org Artificial Intelligence

Charts are essential to data analysis, transforming raw data into clear visual representations that support human decision-making. Although current vision-language models (VLMs) have made significant progress, they continue to struggle with chart comprehension due to training on datasets that lack diversity and real-world authenticity, or on automatically extracted underlying data tables of charts, which can contain numerous estimation errors. Furthermore, existing models only rely on supervised fine-tuning using these low-quality datasets, severely limiting their effectiveness. To address these issues, we first propose BigCharts, a dataset creation pipeline that generates visually diverse chart images by conditioning the rendering process on real-world charts sourced from multiple online platforms. Unlike purely synthetic datasets, BigCharts incorporates real-world data, ensuring authenticity and visual diversity, while still retaining accurate underlying data due to our proposed replotting process. Additionally, we introduce a comprehensive training framework that integrates supervised fine-tuning with Group Relative Policy Optimization (GRPO)-based reinforcement learning. By introducing novel reward signals specifically designed for chart reasoning, our approach enhances model robustness and generalization across diverse chart styles and domains, resulting in a state-of-the-art chart reasoning model, BigCharts-R1. Extensive experiments demonstrate that our models surpass existing methods on multiple chart question-answering benchmarks compared to even larger open-source and closed-source models.
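To make the GRPO ingredients concrete, here is a sketch of the two pieces such a setup needs: a rule-based answer reward (exact match for text, relative tolerance for numbers) and group-relative advantage normalization over a batch of sampled responses. The 5% tolerance and the reward rules are assumptions for illustration, not BigCharts-R1's actual reward design.

```python
def answer_reward(prediction, gold, rel_tol=0.05):
    """1.0 if the prediction matches the gold answer, else 0.0."""
    try:
        p, g = float(prediction), float(gold)
        if g == 0:
            return 1.0 if p == 0 else 0.0
        return 1.0 if abs(p - g) / abs(g) <= rel_tol else 0.0
    except ValueError:
        # Non-numeric answers fall back to case-insensitive exact match.
        return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0

def group_advantages(rewards):
    """GRPO-style advantages: standardize rewards within a sampled group."""
    mu = sum(rewards) / len(rewards)
    sd = (sum((r - mu) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mu) / (sd + 1e-8) for r in rewards]
```

Each sampled response's advantage then weights its policy-gradient update, so responses that beat their group's average reward are reinforced.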


Can AI Explanations Make You Change Your Mind?

Spillner, Laura, Ringe, Rachel, Porzel, Robert, Malaka, Rainer

arXiv.org Artificial Intelligence

In the context of AI-based decision support systems, explanations can help users to judge when to trust the AI's suggestion, and when to question it. In this way, human oversight can prevent AI errors and biased decision-making. However, this rests on the assumption that users will consider explanations in enough detail to be able to catch such errors. We conducted an online study on trust in explainable DSS, and were surprised to find that in many cases, participants spent little time on the explanation and did not always consider it in detail. We present an exploratory analysis of this data, investigating what factors impact how carefully study participants consider AI explanations, and how this in turn impacts whether they are open to changing their mind based on what the AI suggests.


Evaluating LLMs for Visualization Generation and Understanding

Khan, Saadiq Rauf, Chandak, Vinit, Mukherjea, Sougata

arXiv.org Artificial Intelligence

With the amount and complexity of information produced increasing at staggering rates, information visualization is being utilized to enable people to understand and analyze information. Over the years, many techniques have been developed for creating information visualizations of different types of data. Information visualizations can be created using various tools (for example, Tableau [1]), libraries in many programming languages (for example, matplotlib [2]), as well as scripts (for example, Vega-lite [3]). However, the complexity of these tools, libraries, and scripts can pose a barrier, especially for people without a strong background in data science or programming. To address this, automation of visualization creation using artificial intelligence techniques has also been explored [4]. Natural language interfaces allow users to generate visualizations using simple and intuitive commands. The integration of natural language processing into data visualization tools significantly improves the efficiency of data analysis. Analysts can now focus more on interpreting the data rather than the technicalities of creating visualizations.
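Vega-Lite, one of the scripting routes mentioned above, is also the typical target format for NL-to-visualization systems, since a chart is just a declarative JSON spec. A minimal sketch of the kind of artifact such a system would be asked to emit, here built as a plain Python dict:

```python
def bar_chart_spec(data, x_field, y_field):
    """Build a minimal Vega-Lite v5 bar-chart specification."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": data},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }

spec = bar_chart_spec(
    [{"city": "Tokyo", "pop": 37}, {"city": "Delhi", "pop": 33}],
    "city", "pop")
```

Because the spec is declarative data rather than imperative plotting code, it is comparatively easy for an LLM to generate and for an evaluator to check field by field.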


Infogen: Generating Complex Statistical Infographics from Documents

Ghosh, Akash, Garimella, Aparna, Ramu, Pritika, Bandyopadhyay, Sambaran, Saha, Sriparna

arXiv.org Artificial Intelligence

Statistical infographics are powerful tools that simplify complex data into visually engaging and easy-to-understand formats. Despite advancements in AI, particularly with LLMs, existing efforts have been limited to generating simple charts, with no prior work addressing the creation of complex infographics from text-heavy documents that demand a deep understanding of the content. We address this gap by introducing the task of generating statistical infographics composed of multiple sub-charts (e.g., line, bar, pie) that are contextually accurate, insightful, and visually aligned. To achieve this, we define infographic metadata that includes its title and textual insights, along with sub-chart-specific details such as their corresponding data and alignment. We also present Infodat, the first benchmark dataset for text-to-infographic metadata generation, where each sample links a document to its metadata. We propose Infogen, a two-stage framework where fine-tuned LLMs first generate metadata, which is then converted into infographic code. Extensive evaluations on Infodat demonstrate that Infogen achieves state-of-the-art performance, outperforming both closed and open-source LLMs in text-to-statistical infographic generation.
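The intermediate "infographic metadata" could be sketched as a typed record: a title, textual insights, and per-sub-chart details, with a validation step before the code-generation stage. Field names and the allowed chart kinds below are illustrative guesses, not Infogen's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SubChart:
    kind: str            # e.g. "line", "bar", or "pie"
    data: dict           # series name -> list of values
    alignment: str = "row"

@dataclass
class InfographicMeta:
    title: str
    insights: list
    sub_charts: list = field(default_factory=list)

    def validate(self):
        """Check every sub-chart uses a renderable chart kind."""
        allowed = {"line", "bar", "pie"}
        return all(c.kind in allowed for c in self.sub_charts)
```

In a two-stage framework like the one described, the first LLM pass would fill in this record from the document, and only a record that validates would be handed to the code-generation stage.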


Uncovering Bottlenecks and Optimizing Scientific Lab Workflows with Cycle Time Reduction Agents

Fehlis, Yao

arXiv.org Artificial Intelligence

Scientific laboratories, particularly those in pharmaceutical and biotechnology companies, encounter significant challenges in optimizing workflows due to the complexity and volume of tasks such as compound screening and assay execution. We introduce Cycle Time Reduction Agents (CTRA), a LangGraph-based agentic workflow designed to automate the analysis of lab operational metrics. CTRA comprises three main components: the Question Creation Agent for initiating analysis, Operational Metrics Agents for data extraction and validation, and Insights Agents for reporting and visualization, identifying bottlenecks in lab processes. This paper details CTRA's architecture, evaluates its performance on a lab dataset, and discusses its potential to accelerate pharmaceutical and biotechnological development. CTRA offers a scalable framework for reducing cycle times in scientific labs.
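CTRA's three-stage flow (question creation, metrics extraction and validation, insights and reporting) can be sketched as a plain sequential state-passing pipeline. The stage names mirror the abstract, but the function bodies are placeholders, not CTRA's LangGraph implementation.

```python
def question_agent(state):
    """Turn the workflow's steps into cycle-time questions."""
    state["questions"] = [f"What is the cycle time of {s}?" for s in state["steps"]]
    return state

def metrics_agent(state):
    """Extract per-step metrics from raw records and validate coverage."""
    state["metrics"] = {s: state["raw"].get(s) for s in state["steps"]}
    state["valid"] = all(v is not None for v in state["metrics"].values())
    return state

def insights_agent(state):
    """Report the slowest step as the bottleneck."""
    bottleneck = max(state["metrics"], key=state["metrics"].get)
    state["report"] = f"Bottleneck: {bottleneck} ({state['metrics'][bottleneck]} h)"
    return state

def run_pipeline(state, stages=(question_agent, metrics_agent, insights_agent)):
    for stage in stages:
        state = stage(state)
    return state
```

In the actual system, a LangGraph graph would add conditional edges (e.g. looping back when validation fails) rather than this strictly linear pass.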


ChartLens: Fine-grained Visual Attribution in Charts

Suri, Manan, Mathur, Puneet, Lipka, Nedim, Dernoncourt, Franck, Rossi, Ryan A., Manocha, Dinesh

arXiv.org Artificial Intelligence

The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
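Set-of-marks prompting, as used here for attribution, enumerates the detected chart objects with visible numeric marks and asks the model to answer in terms of those marks. A text-side sketch of assembling such a prompt (the template wording is an assumption, not ChartLens's):

```python
def set_of_marks_prompt(response, objects):
    """List detected chart objects as numbered marks, then ask which
    marks support the chart-associated response."""
    marked = [f"[{i + 1}] {o['label']} at {o['bbox']}"
              for i, o in enumerate(objects)]
    return ("Chart objects:\n" + "\n".join(marked) +
            f"\n\nResponse: {response}\n"
            "List the mark numbers that support this response.")

objects = [{"label": "bar:2021", "bbox": (40, 10, 20, 80)},
           {"label": "bar:2022", "bbox": (70, 5, 20, 85)}]
prompt = set_of_marks_prompt("Revenue grew from 2021 to 2022.", objects)
```

The same mark numbers are drawn onto the chart image, so the model's answer grounds the response in concrete, segmentation-derived regions rather than free-form coordinates.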