START: Spatial and Textual Learning for Chart Understanding
Liu, Zhuoming, Gao, Xiaofeng, Niu, Feiyang, Gao, Qiaozi, Liu, Liu, Piramuthu, Robinson
Chart understanding is crucial for deploying multimodal large language models (MLLMs) in real-world scenarios such as analyzing scientific papers and technical reports. Unlike natural images, charts pair a structured visual layout (spatial property) with an underlying data representation (textual property) -- grasping both is essential for precise, fine-grained chart reasoning. Motivated by this observation, we propose START, the Spatial and Textual learning for chART understanding. Specifically, we introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. To facilitate spatial and textual learning, we propose the START-Dataset, generated with a novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a Large Language Model (LLM) to ascertain the positions of chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle. To evaluate a model's ability to understand chart spatial structures, we propose the Chart Spatial understanding Benchmark (CS-Bench), filling a critical gap in comprehensive chart understanding evaluation. Leveraging spatial and textual learning, START delivers consistent gains across model sizes and benchmarks over the base models and surpasses prior state-of-the-art by a clear margin. Code, data, and models will be publicly available.
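One way to picture the chart-element grounding side of this approach: once a chart exists as executable matplotlib code, the positions of its elements can be read off the rendered figure. The sketch below is a minimal illustration of that idea, not the START pipeline itself; the chart data and element names are invented for the example.

```python
# Hypothetical sketch: read element positions for grounding annotations
# directly from the rendered matplotlib artists of a chart expressed as code.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
bars = ax.bar(["Q1", "Q2", "Q3"], [12, 18, 9], color="steelblue")
title = ax.set_title("Quarterly revenue")

fig.canvas.draw()                         # render so window extents are available
renderer = fig.canvas.get_renderer()

annotations = []
for label, artist in [("title", title)] + [(f"bar_{i}", b) for i, b in enumerate(bars)]:
    bbox = artist.get_window_extent(renderer)          # display (pixel) coordinates
    annotations.append({"element": label,
                        "bbox_pixels": [round(float(v), 1) for v in bbox.extents]})

for a in annotations:
    print(a)   # e.g. {'element': 'bar_0', 'bbox_pixels': [x0, y0, x1, y1]}
```

Pixel-space boxes like these are the kind of supervision a chart-element grounding task could use.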
Chart-RVR: Reinforcement Learning with Verifiable Rewards for Explainable Chart Reasoning
Sinha, Sanchit, Frunza, Oana, Rasul, Kashif, Nevmyvaka, Yuriy, Zhang, Aidong
The capabilities of Large Vision-Language Models (LVLMs) have reached state-of-the-art on many visual reasoning tasks, including chart reasoning, yet they still falter on out-of-distribution (OOD) data, and degrade further when asked to produce their chain-of-thought (CoT) rationales, limiting explainability. We present Chart-RVR, a general framework that fine-tunes LVLMs to be more robust and explainable for chart reasoning by coupling Group Relative Policy Optimization (GRPO) with automatically verifiable rewards. Our framework comprises three rewards that maximize: (i) correct chart-type classification, (ii) faithful chart table reconstruction, and (iii) process conformity. Applied to 3-billion-parameter LVLMs, Chart-RVR consistently outperforms standard supervised fine-tuning (SFT) on both in-distribution and out-of-distribution datasets, closing the OOD performance gap while improving rationale fidelity. The resulting models, the Chart-RVR-3B series, achieve state-of-the-art results on six chart-reasoning benchmarks spanning in-domain and OOD settings, surpassing all existing models of comparable size. Beyond accuracy, Chart-RVR yields more interpretable CoT rationales, strengthening trust and reliability -- showcasing the power of verifiable rewards with GRPO for training reliable, interpretable chart-reasoning models.
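For intuition, the snippet below sketches what automatically verifiable rewards of the three kinds named above could look like; the exact reward formulations, weights, and output template used by Chart-RVR are assumptions here.

```python
# A minimal sketch of verifiable rewards for chart reasoning (illustrative only).
import re

def chart_type_reward(pred: str, gold: str) -> float:
    """1.0 if the predicted chart type matches the ground truth, else 0.0."""
    return float(pred.strip().lower() == gold.strip().lower())

def table_reward(pred_cells: list[str], gold_cells: list[str]) -> float:
    """Fraction of ground-truth table cells recovered in the reconstruction."""
    gold = [c.strip().lower() for c in gold_cells]
    pred = {c.strip().lower() for c in pred_cells}
    return sum(c in pred for c in gold) / max(len(gold), 1)

def conformity_reward(rationale: str) -> float:
    """Reward rationales that follow an assumed <think>...</think><answer>...</answer> template."""
    return float(bool(re.search(r"<think>.+</think>\s*<answer>.+</answer>", rationale, re.S)))

def total_reward(sample: dict, weights=(1.0, 1.0, 1.0)) -> float:
    parts = (chart_type_reward(sample["pred_type"], sample["gold_type"]),
             table_reward(sample["pred_cells"], sample["gold_cells"]),
             conformity_reward(sample["rationale"]))
    return sum(w * r for w, r in zip(weights, parts))
```

Each component is checkable against ground truth, which is what makes the reward "verifiable" rather than learned.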
Exploring Diffusion Models for Generative Forecasting of Financial Charts
Lee, Taegyeong, Park, Jiwon, Bang, Kyunga, Hwang, Seunghyun, Jang, Ung-Jin
Recent advances in generative models have enabled significant progress in tasks such as generating and editing images from text, as well as creating videos from text prompts, and these methods are being applied across various fields. However, the financial domain still tends to rely on time-series data and transformer models rather than on diverse applications of generative models. In this paper, we propose a novel approach that leverages a text-to-image model by treating time-series data as a single image pattern, thereby enabling the prediction of stock price trends. Unlike prior methods that focus on learning and classifying chart patterns using architectures such as ResNet or ViT, we experiment with generating the next chart image from the current chart image and an instruction prompt using diffusion models. Furthermore, we introduce a simple method for evaluating the generated chart image against the ground-truth image. We highlight the potential of leveraging text-to-image generative models in the financial domain, and our findings motivate further research to address the current limitations and expand their applicability.
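The paper mentions a simple method for comparing the generated chart against the ground-truth image. A minimal, hedged stand-in for such a comparison (not the authors' metric) is a normalized pixel-difference score:

```python
# Illustrative image-level comparison between a generated chart and the
# ground-truth chart; the paper's actual evaluation method may differ.
import numpy as np
from PIL import Image

def chart_similarity(generated_path: str, ground_truth_path: str, size=(256, 256)) -> float:
    """Return a similarity in [0, 1]: 1 minus the mean absolute pixel difference."""
    def load(path):
        return np.asarray(Image.open(path).convert("L").resize(size), dtype=np.float32) / 255.0
    gen, gt = load(generated_path), load(ground_truth_path)
    return float(1.0 - np.abs(gen - gt).mean())
```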
From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text
Mahbub, Ridwan, Islam, Mohammed Saidul, Nayeem, Mir Tafseer, Laskar, Md Tahmid Rahman, Rahman, Mizanur, Joty, Shafiq, Hoque, Enamul
Charts are widely used for exploring data and communicating insights, but extracting key takeaways from charts and articulating them in natural language can be challenging. The chart-to-text task aims to automate this process by generating textual summaries of charts. While the rapid advancement of large Vision-Language Models (VLMs) has brought great progress in this domain, little to no attention has been given to potential biases in their outputs. This paper investigates how VLMs can amplify geo-economic biases when generating chart summaries, potentially causing societal harm. Specifically, we conduct a large-scale evaluation of geo-economic biases in VLM-generated chart summaries across 6,000 chart-country pairs from six widely used proprietary and open-source models to understand how a country's economic status influences the sentiment of generated summaries. Our analysis reveals that existing VLMs tend to produce more positive descriptions for high-income countries than for middle- or low-income countries, even when country attribution is the only variable changed. We also find that models such as GPT-4o-mini, Gemini-1.5-Flash, and Phi-3.5 exhibit varying degrees of bias. We further explore inference-time prompt-based debiasing techniques using positive distractors but find them only partially effective, underscoring the complexity of the issue and the need for more robust debiasing strategies. Our code and dataset are publicly available.
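The evaluation protocol, in outline, swaps only the country attributed to a chart and compares summary sentiment across income groups. The sketch below is a hypothetical illustration: `generate_summary`, the word lists, and the scoring are stand-ins, not the paper's tooling.

```python
# Hypothetical sketch of a geo-economic bias probe: same chart, different
# attributed country, compare mean summary sentiment per income group.
from statistics import mean

POSITIVE = {"growth", "strong", "improvement", "impressive", "robust"}
NEGATIVE = {"decline", "weak", "struggling", "poor", "stagnant"}

def sentiment_score(text: str) -> float:
    """Crude lexicon-based sentiment in [-1, 1] (placeholder for a real sentiment model)."""
    words = [w.strip(".,") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)

def bias_gap(chart, countries_by_income: dict[str, list[str]], generate_summary) -> dict[str, float]:
    """Mean summary sentiment per income group for the same underlying chart.

    `generate_summary(chart, country)` is a placeholder for a VLM call."""
    return {group: mean(sentiment_score(generate_summary(chart, c)) for c in countries)
            for group, countries in countries_by_income.items()}
```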
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning
Masry, Ahmed, Puri, Abhay, Hashemi, Masoud, Rodriguez, Juan A., Thakkar, Megh, Mahajan, Khyati, Yadav, Vikas, Madhusudhan, Sathwik Tejaswi, Piché, Alexandre, Bahdanau, Dzmitry, Pal, Christopher, Vazquez, David, Hoque, Enamul, Taslakian, Perouz, Rajeswar, Sai, Gella, Spandana
Charts are essential to data analysis, transforming raw data into clear visual representations that support human decision-making. Although current vision-language models (VLMs) have made significant progress, they continue to struggle with chart comprehension due to training on datasets that lack diversity and real-world authenticity, or on automatically extracted underlying data tables of charts, which can contain numerous estimation errors. Furthermore, existing models rely only on supervised fine-tuning using these low-quality datasets, severely limiting their effectiveness. To address these issues, we first propose BigCharts, a dataset creation pipeline that generates visually diverse chart images by conditioning the rendering process on real-world charts sourced from multiple online platforms. Unlike purely synthetic datasets, BigCharts incorporates real-world data, ensuring authenticity and visual diversity, while still retaining accurate underlying data thanks to our proposed replotting process. Additionally, we introduce a comprehensive training framework that integrates supervised fine-tuning with Group Relative Policy Optimization (GRPO)-based reinforcement learning. By introducing novel reward signals specifically designed for chart reasoning, our approach enhances model robustness and generalization across diverse chart styles and domains, resulting in a state-of-the-art chart reasoning model, BigCharts-R1. Extensive experiments demonstrate that our models surpass existing methods on multiple chart question-answering benchmarks, even compared to larger open-source and closed-source models.
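A rough picture of the replotting idea: render a chart from data and style extracted from a real source chart, so the saved image and its data table are consistent by construction. The snippet is illustrative only; the seed metadata and rendering choices are invented, not drawn from BigCharts.

```python
# Illustrative replotting sketch: the rendered image and the stored table
# come from the same extracted values, so there is no pixel-estimation error.
import json
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical metadata extracted from a real online chart.
seed = {"title": "Broadband adoption by year",
        "x": [2019, 2020, 2021, 2022],
        "y": [61, 67, 72, 78],
        "kind": "line",
        "color": "#1f77b4"}

fig, ax = plt.subplots(figsize=(5, 3))
if seed["kind"] == "line":
    ax.plot(seed["x"], seed["y"], color=seed["color"], marker="o")
else:
    ax.bar(seed["x"], seed["y"], color=seed["color"])
ax.set_title(seed["title"])
fig.savefig("replotted_chart.png", dpi=150)

# The underlying table is exact by construction, so QA pairs can be
# generated from it without estimation error.
with open("replotted_chart.json", "w") as f:
    json.dump({"x": seed["x"], "y": seed["y"]}, f)
```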
Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yang, Yuwei, Zhang, Zeyu, Hou, Yunzhong, Li, Zhuowan, Liu, Gaowen, Payani, Ali, Ting, Yuan-Sen, Zheng, Liang
Being able to effectively read scientific plots, or chart understanding, is central to building effective agents for science. However, existing multimodal large language models (MLLMs), especially open-source ones, still fall behind, with typical success rates of 30%-50% on challenging benchmarks. Previous studies on fine-tuning MLLMs with synthetic charts are often restricted by their inadequate similarity to real charts, which could compromise model training and performance on complex real-world charts. In this study, we show that modularizing chart generation and diversifying visual details improves chart understanding capabilities. In particular, we design a five-step data synthesis pipeline, in which we separate data and function creation for single-plot generation, condition the generation of later subplots on earlier ones for multi-subplot figures, visually diversify the generated figures, filter out low-quality data, and finally generate the question-answer (QA) pairs with GPT-4o. This approach allows us to streamline the generation of fine-tuning datasets and introduce the Effective Chart Dataset (ECD), which contains 10k+ chart images and 300k+ QA pairs, covering 25 topics and featuring 250+ chart type combinations with high visual complexity. We show that ECD consistently improves the performance of various MLLMs on a range of real-world and synthetic test sets. Code, data, and models are available at: https://github.com/yuweiyang-anu/ECD.
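The modular flavor of the pipeline's first two steps can be sketched as follows, assuming matplotlib as the renderer; the data generators, plot functions, and QA template here are illustrative, not the released ECD code.

```python
# Minimal sketch of modular chart synthesis: data creation and plotting-function
# choice are separated, and a later subplot reuses the first subplot's x-axis.
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def make_data(rng, n=12):
    """Create an x grid and a random cumulative series (stand-in data generator)."""
    x = np.arange(n)
    return x, rng.normal(loc=10, scale=2, size=n).cumsum()

def plot_line(ax, x, y):
    ax.plot(x, y, marker="o")

def plot_bar(ax, x, y):
    ax.bar(x, y)

rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
x, y1 = make_data(rng)
plot_line(axes[0], x, y1)          # first subplot
_, y2 = make_data(rng)
plot_bar(axes[1], x, y2)           # second subplot conditioned on the first's x grid
fig.savefig("synthetic_chart.png", dpi=150)

# A QA pair can then be derived from the known underlying data.
qa = {"question": "Which x value has the highest bar in the right subplot?",
      "answer": int(x[np.argmax(y2)])}
print(qa)
```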
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Lim, Junyoung, Ahn, Jaewoo, Kim, Gunhee
Generating accurate, informative, and hallucination-free captions for charts remains challenging for vision language models, primarily due to the lack of large-scale, high-quality datasets of real-world charts. Existing real-world chart datasets suffer from the inclusion of extraneous information that cannot be inferred from the chart and fail to sufficiently capture structural elements and key insights. Therefore, we introduce ChartCap, a large-scale dataset of 565K real-world chart images paired with type-specific, dense captions that exclude extraneous information and highlight both structural elements and key insights in detail. To build ChartCap, we design a four-stage pipeline that generates captions using only the discernible data from the chart, and we employ cycle-consistency-based human verification, which accelerates quality control without sacrificing accuracy. Additionally, we propose a novel metric, the Visual Consistency Score, which evaluates caption quality by measuring the similarity between the chart regenerated from a caption and the original chart, independent of reference captions. Extensive experiments confirm that models fine-tuned on ChartCap consistently generate more accurate and informative captions with reduced hallucinations, surpassing both open-source and proprietary models and even human-annotated captions.
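The Visual Consistency Score idea can be summarized as: regenerate a chart from the caption, then measure how close it is to the original. A hedged sketch, with `regenerate_chart` and `embed` as placeholders for a text-to-chart generator and an image encoder (the paper's actual implementation may differ):

```python
# Sketch of a reference-free, visual-consistency style metric for chart captions.
import numpy as np

def visual_consistency_score(original_image, caption, regenerate_chart, embed) -> float:
    """Cosine similarity between the original chart and the chart regenerated from its caption.

    `regenerate_chart(caption)` and `embed(image)` are placeholder callables."""
    regenerated = regenerate_chart(caption)
    a = np.asarray(embed(original_image), dtype=float)
    b = np.asarray(embed(regenerated), dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```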
ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing
Zhao, Xuanle, Liu, Xuexin, Yang, Haoyue, Luo, Xianzhen, Zeng, Fanhu, Li, Jianling, Shi, Qi, Chen, Chi
Although multimodal large language models (MLLMs) show promise in generating chart-rendering code, editing charts via code presents a greater challenge. This task demands that MLLMs integrate chart understanding and reasoning capacities, which are labor-intensive to acquire. While many MLLMs claim such editing capabilities, current evaluations rely on limited case studies, highlighting the urgent need for a comprehensive evaluation framework. In this work, we propose ChartEdit, a novel benchmark designed for chart editing tasks, featuring 1,405 diverse editing instructions applied to 233 real-world charts, each manually annotated and validated for accuracy. Using ChartEdit, we evaluate the performance of 10 mainstream MLLMs across two types of experiments at both the code and chart levels. The results suggest that large-scale models can generate code that produces images partially matching the reference images. However, their ability to generate accurate edits according to the instructions remains limited; the state-of-the-art (SOTA) model achieves a score of only 59.96, highlighting significant challenges in precise modification. In contrast, small-scale models, including chart-domain models, struggle both with following editing instructions and with generating overall chart images, underscoring the need for further development in this area. Code is available at https://github.com/xxlllz/ChartEdit.
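Code-level evaluation of an edit can be approximated by comparing the predicted script against the reference script line by line; the sketch below uses a simple sequence-similarity ratio and is an assumption about, not a reproduction of, ChartEdit's scoring.

```python
# Illustrative code-level comparison between an edited script and a reference script.
import difflib

def code_level_score(predicted_code: str, reference_code: str) -> float:
    """Similarity in [0, 1] between predicted and reference plotting scripts."""
    pred = [line.strip() for line in predicted_code.splitlines() if line.strip()]
    ref = [line.strip() for line in reference_code.splitlines() if line.strip()]
    return difflib.SequenceMatcher(None, pred, ref).ratio()

ref = "import matplotlib.pyplot as plt\nplt.bar(['A','B'], [3, 5], color='red')\nplt.title('Sales')"
pred = "import matplotlib.pyplot as plt\nplt.bar(['A','B'], [3, 5])\nplt.title('Sales')"
print(round(code_level_score(pred, ref), 2))   # 0.67 -- one of three lines differs
```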
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation
Kondic, Jovana, Li, Pengyuan, Joshi, Dhiraj, He, Zexue, Abedin, Shafiq, Sun, Jennifer, Wiesel, Ben, Schwartz, Eli, Nassar, Ahmed, Wu, Bo, Arbelle, Assaf, Oliva, Aude, Gutfreund, Dan, Karlinsky, Leonid, Feris, Rogerio
Chart-to-code reconstruction -- the task of recovering executable plotting scripts from chart images -- provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing multimodal benchmarks focus primarily on answering questions about charts or summarizing them. To bridge this gap, we present ChartGen, a fully automated pipeline for code-guided synthetic chart generation. Starting from seed chart images, ChartGen (i) prompts a vision-language model (VLM) to reconstruct each image into a Python script, and (ii) iteratively augments that script with a code-oriented large language model (LLM). Using ChartGen, we create 222.5K unique chart-image code pairs from 13K seed chart images, and present an open-source synthetic chart dataset covering 27 chart types, 11 plotting libraries, and multiple data modalities (image, code, text, CSV, DocTags). From this corpus, we curate a held-out chart-to-code evaluation subset of 4.3K chart image-code pairs and evaluate six open-weight VLMs (3B - 26B parameters), highlighting substantial room for progress. We release the pipeline, prompts, and the dataset to help accelerate efforts towards robust chart understanding and vision-conditioned code generation: https://github.com/SD122025/ChartGen/
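Schematically, the two-stage pipeline looks like the sketch below, where `vlm_reconstruct` and `llm_augment` are placeholders for the VLM and code-LLM calls and the rendering step simply executes each candidate script; none of this is the released ChartGen code.

```python
# Schematic sketch of a code-guided synthetic chart generation loop.
import os
import pathlib
import subprocess
import sys
import tempfile

def render(script: str, out_png: str) -> bool:
    """Run a candidate plotting script in a subprocess; return True if it produced the image."""
    code = script + f"\nimport matplotlib.pyplot as plt\nplt.savefig({out_png!r})\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    env = {**os.environ, "MPLBACKEND": "Agg"}   # headless rendering
    ok = subprocess.run([sys.executable, path], capture_output=True, env=env).returncode == 0
    return ok and pathlib.Path(out_png).exists()

def chartgen_pipeline(seed_image, vlm_reconstruct, llm_augment, rounds=3):
    """Seed image -> reconstructed script -> iteratively augmented chart/code pairs."""
    pairs = []
    script = vlm_reconstruct(seed_image)       # (i) image-to-code reconstruction
    for i in range(rounds):                    # (ii) iterative code augmentation
        out = f"chart_{i}.png"
        if render(script, out):
            pairs.append({"code": script, "image": out})
        script = llm_augment(script)
    return pairs
```

Only scripts that execute and save a figure are kept, which is one natural way to filter a synthetic chart-code corpus.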
In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding
Fan, Wan-Cyuan, Chen, Yen-Chun, Liu, Mengchen, Jacobson, Alexander, Yuan, Lu, Sigal, Leonid
Recent methods for customizing Large Vision Language Models (LVLMs) for domain-specific tasks have shown promising results in scientific chart comprehension. However, existing approaches face two major limitations: first, they rely on paired data from only a few chart types, limiting generalization to a wide range of chart types; second, they lack targeted pre-training for chart-data alignment, which hampers the model's understanding of the underlying data. In this paper, we introduce ChartScope, an LVLM optimized for in-depth chart comprehension across diverse chart types. We propose an efficient data generation pipeline that synthesizes paired data for a wide range of chart types, along with a novel Dual-Path training strategy that enables the model to succinctly capture essential data details while preserving robust reasoning capabilities by incorporating reasoning over the underlying data. Lastly, we establish ChartDQA, a new benchmark for evaluating not only question-answering at different levels but also underlying-data understanding. Experimental results demonstrate that ChartScope significantly enhances comprehension on a wide range of chart types. The code and data are available at https://davidhalladay.github.io/chartscope_demo.