MSG-Chart: Multimodal Scene Graph for ChartQA

Dai, Yue; Han, Soyeon Caren; Liu, Wei

arXiv.org Artificial Intelligence 

Automatic Chart Question Answering (ChartQA) is challenging because chart elements follow complex layouts and the patterns of the underlying data are not explicitly displayed in charts. To address this challenge, we design a joint multimodal scene graph for charts to explicitly represent the relationships between chart elements and their patterns. Our proposed multimodal scene graph includes a visual graph and a textual graph to jointly capture the structural and semantic knowledge from the chart. This graph module can be easily integrated with different vision transformers as an inductive bias. Our experiments demonstrate that incorporating the proposed graph module enhances the understanding of the structure and semantics of chart elements, thereby improving performance on the publicly available benchmarks ChartQA and OpenCQA.
