mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

Jingxuan Wei, Nan Xu, Guiyong Chang, Yin Luo, BiHui Yu, Ruifeng Guo

arXiv.org Artificial Intelligence 

The goal of multimodal chart question answering is to automatically answer a natural language question about a chart in order to facilitate visual data analysis (Hoque et al., 2022), a setting in which the ability to understand and interact with visual data is essential (Masry et al., 2022). The task has emerged as a crucial intersection of computer vision and natural language processing, addressing the growing demand for intelligent systems capable of interpreting the complex visual data in charts (Masry et al., 2022). Beyond its general applications, multimodal chart question answering plays a pivotal role in sectors that require precise and rapid analysis of visual data. In the financial domain, it is indispensable for tasks such as financial report analysis (Wang et al., 2023a), decision support (Kafle et al., 2020), invoice parsing (Gerling and Lessmann, 2023), and contract review (Jie et al., 2023). Similarly, in the medical field, it contributes significantly to the digitization of patient records (Xu et al., 2021), medical insurance review (Meskó, 2023), diagnostic assistance (Othmani and Zeghina, 2022), and quality control of medical records (Schilcher et al., 2024). Because of the richness and ambiguity of natural language and the complexity of visual reasoning, the multimodal chart question answering task requires predicting answers at the intersection of information visualization, natural language processing, and human-computer interaction (Hoque et al., 2022). Early approaches applied natural language processing techniques that depended largely on heuristics or grammar-based parsing (Setlur et al., 2016; Srinivasan and Stasko, 2017; Hoque et al., 2017; Gao et al., 2015). Because these methods handled complex linguistic phenomena poorly, relied too heavily on grammatical rules, and offered only limited depth of natural language understanding, deep learning models were subsequently introduced for understanding natural language queries about visualizations (Chaudhry et al., 2020; Singh and Shekhar, 2020; Reddy et al., 2019).
