Multimodal Tabular Reasoning with Privileged Structured Information

Jun-13-2026, 11:07:07 GMT–Neural Information Processing Systems

Tabular reasoning requires complex, multi-step information extraction and logical inference, such as aggregation, comparison, or calculation over tabular data. While recent advances have leveraged large language models (LLMs) for reasoning over structured text tables, such high-quality textual representations are often unavailable in real-world settings, where tables typically appear as images. In this paper, we tackle the task of tabular reasoning directly from table images. Our core strategy is to leverage privileged structured information---specifically, the ground-truth structured table data available during training but inaccessible at test time---to enhance multimodal large language models (MLLMs). The key challenges lie in: accurately aligning visual representations with the structured information, particularly mapping the visual evidence to logical steps; and effectively transferring the reasoning skills learned during training to the MLLM for visual inference. To address these, we introduce {\sc Turbo} (TabUlar Reasoning with Bridged infOrmation), a new framework for multimodal tabular reasoning using privileged information.

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Jun-13-2026, 11:07:07 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)