AITopics

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Louisiana (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsJun-15-2026, 20:30:30 GMT

Table as a Modality for Large Language Models

To migrate the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to the table reasoning tasks for the widely deployed tabular data. Despite that, in this work, by showing a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply relies on serializing the tabular data, together with the meta information, then inputting them through the LLMs. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which bears an ideology to treat the tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder seamlessly integrated with the mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, have demonstrated significant improvements on generalization with an average relative gain of 42.65%.

large language model, machine learning, natural language, (19 more...)

Country:

Asia (0.92)
North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Neural Information Processing SystemsFeb-12-2026, 22:32:09 GMT

Hy pergraph-enhanced Tabular Data Representation Learning Pei Chen

Tabular data that is organized in bi-dimensional matrices are widespread in webpages, documents, and databases.

data mining, machine learning, natural language, (19 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
(5 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsFeb-7-2026, 20:55:00 GMT

0d97fe65d7a1dc12a05642d9fa4cd578-Paper-Conference.pdf

In this paper, we present a novel large vision-language model, TabPedia, equipped with aconcept synergy mechanism.

artificial intelligence, machine learning, natural language, (19 more...)

Country: Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceNov-18-2025

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns

Zhang, Jiarui, Liu, Yuliang, Wu, Zijun, Pang, Guosheng, Ye, Zhili, Zhong, Yupei, Ma, Junteng, Wei, Tao, Xu, Haiyang, Chen, Weikai, Wang, Zeen, Ji, Qiangjun, Zhou, Fanxi, Zhang, Qi, Hu, Yuanrui, Liu, Jiahao, Li, Zhang, Zhang, Ziyang, Liu, Qiang, Bai, Xiang

Document parsing is a core task in document intelligence, supporting applications such as information extraction, retrieval-augmented generation, and automated document analysis. However, real-world documents often feature complex layouts with multi-level tables, embedded images or formulas, and cross-page structures, which remain challenging for existing OCR systems. We introduce MonkeyOCR v1.5, a unified vision-language framework that enhances both layout understanding and content recognition through a two-stage pipeline. The first stage employs a large multimodal model to jointly predict layout and reading order, leveraging visual information to ensure sequential consistency. The second stage performs localized recognition of text, formulas, and tables within detected regions, maintaining high visual fidelity while reducing error propagation. To address complex table structures, we propose a visual consistency-based reinforcement learning scheme that evaluates recognition quality via render-and-compare alignment, improving structural accuracy without manual annotations. Additionally, two specialized modules, Image-Decoupled Table Parsing and Type-Guided Table Merging, are introduced to enable reliable parsing of tables containing embedded images and reconstruction of tables crossing pages or columns. Comprehensive experiments on OmniDocBench v1.5 demonstrate that MonkeyOCR v1.5 achieves state-of-the-art performance, outperforming PPOCR-VL and MinerU 2.5 while showing exceptional robustness in visually complex document scenarios. A trial link can be found at https://github.com/Yuliang-Liu/MonkeyOCR .

large language model, machine learning, pattern recognition, (22 more...)

2511.1039

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
(2 more...)

arXiv.org Artificial IntelligenceOct-29-2025

NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables

Wang, Lanrui, Zheng, Mingyu, Tang, Hongyin, Lin, Zheng, Cao, Yanan, Wang, Jingang, Cai, Xunliang, Wang, Weiping

Processing structured tabular data, particularly large and lengthy tables, constitutes a fundamental yet challenging task for large language models (LLMs). However, existing long-context benchmarks like Needle-in-a-Haystack primarily focus on unstructured text, neglecting the challenge of diverse structured tables. Meanwhile, previous tabular benchmarks mainly consider downstream tasks that require high-level reasoning abilities, and overlook models' underlying fine-grained perception of individual table cells, which is crucial for practical and robust LLM-based table applications. To address this gap, we introduce \textsc{NeedleInATable} (NIAT), a new long-context tabular benchmark that treats each table cell as a ``needle'' and requires models to extract the target cell based on cell locations or lookup questions. Our comprehensive evaluation of various LLMs and multimodal LLMs reveals a substantial performance gap between popular downstream tabular tasks and the simpler NIAT task, suggesting that they may rely on dataset-specific correlations or shortcuts to obtain better benchmark results but lack truly robust long-context understanding towards structured tables. Furthermore, we demonstrate that using synthesized NIAT training data can effectively improve performance on both NIAT task and downstream tabular tasks, which validates the importance of NIAT capability for LLMs' genuine table understanding ability.

large language model, machine learning, urlhttp, (20 more...)

2504.0656

Country:

Asia (0.93)
Europe (0.93)
North America > United States > Louisiana (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2025, 23:19:52 GMT

66178beae8f12fcd48699de95acc1152-Paper-Conference.pdf

data mining, machine learning, natural language, (19 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
(8 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Vallivaara, Ilari, Poikselkä, Katja, Rikula, Pauli, Röning, Juha

Systematic Alias Sampling: an efficient and low-variance way to sample from a discrete distribution

arXiv.org Artificial IntelligenceSep-30-2025

In this paper we combine the Alias method with the concept of systematic sampling, a method commonly used in particle filters for efficient low-variance resampling. The proposed method allows very fast sampling from a discrete distribution: drawing k samples is up to an order of magnitude faster than binary search from the cumulative distribution function (cdf) or inversion methods used in many libraries. The produced empirical distribution function is evaluated using a modified Cramér-Von Mises goodness-of-fit statistic, showing that the method compares very favourably to multinomial sampling. As continuous distributions can often be approximated with discrete ones, the proposed method can be used as a very general way to efficiently produce random samples for particle filter proposal distributions, e.g. for motion models in robotics.

artificial intelligence, discrete distribution, r-von mise statistic, (14 more...)

doi: 10.1145/2935745

2509.24089

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

arXiv.org Artificial IntelligenceSep-23-2025

TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering

Zhu, Junnan, Wang, Jingyi, Yu, Bohan, Wu, Xiaoyu, Li, Junbo, Wang, Lei, Xu, Nan

LLMs have shown impressive progress in natural language processing. However, they still face significant challenges in TableQA, where real-world complexities such as diverse table structures, multilingual data, and domain-specific reasoning are crucial. Existing TableQA benchmarks are often limited by their focus on simple flat tables and suffer from data leakage. Furthermore, most benchmarks are monolingual and fail to capture the cross-lingual and cross-domain variability in practical applications. To address these limitations, we introduce TableEval, a new benchmark designed to evaluate LLMs on realistic TableQA tasks. Specifically, TableEval includes tables with various structures (such as concise, hierarchical, and nested tables) collected from four domains (including government, finance, academia, and industry reports). Besides, TableEval features cross-lingual scenarios with tables in Simplified Chinese, Traditional Chinese, and English. To minimize the risk of data leakage, we collect all data from recent real-world documents. Considering that existing TableQA metrics fail to capture semantic accuracy, we further propose SEAT, a new evaluation framework that assesses the alignment between model responses and reference answers at the sub-question level. Experimental results have shown that SEAT achieves high agreement with human judgment. Extensive experiments on TableEval reveal critical gaps in the ability of state-of-the-art LLMs to handle these complex, real-world TableQA tasks, offering insights for future improvements. We make our dataset available here: https://github.com/wenge-research/TableEval.

computational linguistic, large language model, machine learning, (20 more...)

2506.03949

Country: Asia > China (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Rajgaria, Abhishek, Dixit, Kushagra, Vyas, Mayank, Kalalbandi, Harshavardhan, Roth, Dan, Gupta, Vivek

No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning

arXiv.org Artificial IntelligenceAug-11-2025

Temporal Table Reasoning is a critical challenge for Large Language Models (LLMs), requiring effective reasoning to extract relevant insights. Despite existence of multiple prompting methods, their impact on table reasoning remains largely unexplored. Furthermore, model performance varies drastically across different table and context structures, making it difficult to determine an optimal approach. This work investigates multiple prompting technique on diverse table types to determine that performance depends on factors such as entity type, table structure, requirement of additional context and question complexity, with "NO" single method consistently outperforming others. To address this, we introduce SEAR, an adaptive prompting framework inspired by human reasoning that dynamically adjusts to context and integrates structured reasoning. Our results demonstrate that SEAR achieves superior performance across all table types compared to baseline prompting techniques. Additionally, we explore the impact of table structure refactoring, finding that a unified representation enhances model reasoning.

artificial intelligence, large language model, natural language, (17 more...)

2506.11246

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)