AITopics | structure recognition

Collaborating Authors

structure recognition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Table2LaTeX-RL: High-Fidelity LaTeXCode Generation from Table Images via Reinforced Multimodal Language Models

Neural Information Processing SystemsJun-17-2026, 17:04:42 GMT

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables--those with large sizes, deeply nested structures, and semantically rich or irregular cell content--where existing methods often fail. We begin with a comprehensive analysis, identifying key challenges and highlighting the limitations of current evaluation protocols. To overcome these issues, we propose a reinforced multimodal large language model (MLLM) framework, where a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset. To further improve generation quality, we introduce a dual-reward reinforcement learning strategy based on Group Relative Policy Optimization (GRPO). Unlike standard approaches that optimize purely over text outputs, our method incorporates both a structure-level reward on LaTeX code and a visual fidelity reward computed from rendered outputs, enabling direct optimization of the visual output quality. We adopt a hybrid evaluation protocol combining TEDS-Structure and CW-SSIM, and show that our method achieves state-of-the-art performance, particularly on structurally complex tables, demonstrating the effectiveness and robustness of our approach.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Add feedback

0d97fe65d7a1dc12a05642d9fa4cd578-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 20:55:00 GMT

In this paper, we present a novel large vision-language model, TabPedia, equipped with aconcept synergy mechanism.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Hierarchical structure understanding in complex tables with VLLMs: a benchmark and experiments

Bindini, Luca, Giovannini, Simone, Marinai, Simone, Nardoni, Valeria, Ali, Kimiya Noor

arXiv.org Artificial IntelligenceNov-12-2025

This work investigates the ability of Vision Large Language Models (VLLMs) to understand and interpret the structure of tables in scientific articles. Specifically, we explore whether VLLMs can infer the hierarchical structure of tables without additional processing. As a basis for our experiments we use the PubTables-1M dataset, a large-scale corpus of scientific tables. From this dataset, we extract a subset of tables that we introduce as Complex Hierarchical Tables (CHiTab): a benchmark collection of complex tables containing hierarchical headings. We adopt a series of prompt engineering strategies to probe the models' comprehension capabilities, experimenting with various prompt formats and writing styles. Multiple state-of-the-art open-weights VLLMs are evaluated on the benchmark first using their off-the-shelf versions and then fine-tuning some models on our task. We also measure the performance of humans to solve the task on a small set of tables comparing with performance of the evaluated VLLMs. The experiments support our intuition that generic VLLMs, not explicitly designed for understanding the structure of tables, can perform this task. This study provides insights into the potential and limitations of VLLMs to process complex tables and offers guidance for future work on integrating structured data understanding into general-purpose VLLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.08298

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Link prediction Graph Neural Networks for structure recognition of Handwritten Mathematical Expressions

Nguyen, Cuong Tuan, Nguyen, Ngoc Tuan, Dao, Triet Hoang Minh, Nhat, Huy Minh, Dinh, Huy Truong

arXiv.org Artificial IntelligenceNov-5-2025

We propose a Graph Neural Network (GNN)-based approach for Handwritten Mathematical Expression (HME) recognition by modeling HMEs as graphs, where nodes represent symbols and edges capture spatial dependencies. A deep BLSTM network is used for symbol segmentation, recognition, and spatial relation classification, forming an initial primitive graph. A 2D-CFG parser then generates all possible spatial relations, while the GNN-based link prediction model refines the structure by removing unnecessary connections, ultimately forming the Symbol Label Graph. Experimental results demonstrate the effectiveness of our approach, showing promising performance in HME structure recognition.

artificial intelligence, machine learning, recognition, (14 more...)

arXiv.org Artificial Intelligence

2511.02288

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TABLET: Table Structure Recognition using Encoder-only Transformers

Hou, Qiyu, Wang, Jun

arXiv.org Artificial IntelligenceOct-20-2025

To address the challenges of table structure recognition, we propose a novel Split-Merge-based top-down model optimized for large, densely populated tables. Our approach formulates row and column splitting as sequence labeling tasks, utilizing dual Transformer encoders to capture feature interactions. The merging process is framed as a grid cell classification task, leveraging an additional Transformer encoder to ensure accurate and coherent merging. By eliminating unstable bounding box predictions, our method reduces resolution loss and computational complexity, achieving high accuracy while maintaining fast processing speed. Extensive experiments on FinTabNet and PubTabNet demonstrate the superiority of our model over existing approaches, particularly in real-world applications. Our method offers a robust, scalable, and efficient solution for large-scale table recognition, making it well-suited for industrial deployment.

machine learning, pattern recognition, recognition, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-032-04630-7_15

2506.07015

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Neural Information Processing SystemsOct-9-2025, 18:29:14 GMT

In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts.

international conference, proceedings, tabpedia, (15 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models

Ling, Jun, Qi, Yao, Huang, Tao, Zhou, Shibo, Huang, Yanqin, Yang, Jiang, Song, Ziqi, Zhou, Ying, Yang, Yang, Shen, Heng Tao, Wang, Peng

arXiv.org Artificial IntelligenceSep-23-2025

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables -- those with large sizes, deeply nested structures, and semantically rich or irregular cell content -- where existing methods often fail. We begin with a comprehensive analysis, identifying key challenges and highlighting the limitations of current evaluation protocols. To overcome these issues, we propose a reinforced multimodal large language model (MLLM) framework, where a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset. To further improve generation quality, we introduce a dual-reward reinforcement learning strategy based on Group Relative Policy Optimization (GRPO). Unlike standard approaches that optimize purely over text outputs, our method incorporates both a structure-level reward on LaTeX code and a visual fidelity reward computed from rendered outputs, enabling direct optimization of the visual output quality. We adopt a hybrid evaluation protocol combining TEDS-Structure and CW-SSIM, and show that our method achieves state-of-the-art performance, particularly on structurally complex tables, demonstrating the effectiveness and robustness of our approach.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2509.17589

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.61)

Add feedback

Synthetic Data Augmentation for Table Detection: Re-evaluating TableNet's Performance with Automatically Generated Document Images

Sahukara, Krishna, Bettouche, Zineddine, Fischer, Andreas

arXiv.org Artificial IntelligenceJun-18-2025

Document pages captured by smartphones or scanners often contain tables, yet manual extraction is slow and error-prone. We introduce an automated LaTeX-based pipeline that synthesizes realistic two-column pages with visually diverse table layouts and aligned ground-truth masks. The generated corpus augments the real-world Marmot benchmark and enables a systematic resolution study of TableNet. Training TableNet on our synthetic data achieves a pixel-wise XOR error of 4.04% on our synthetic test set with a 256x256 input resolution, and 4.33% with 1024x1024. The best performance on the Marmot benchmark is 9.18% (at 256x256), while cutting manual annotation effort through automation.

machine learning, natural language, table detection, (15 more...)

arXiv.org Artificial Intelligence

2506.14583

Country: Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
(2 more...)

Add feedback

RAPTOR: Refined Approach for Product Table Object Recognition

Thomas, Eliott, Coustaty, Mickael, Joseph, Aurelie, Deloin, Gaspar, Carel, Elodie, D'Andecy, Vincent Poulain, Ogier, Jean-Marc

arXiv.org Artificial IntelligenceFeb-24-2025

Extracting tables from documents is a critical task across various industries, especially on business documents like invoices and reports. Existing systems based on DEtection TRansformer (DETR) such as TAble TRansformer (TATR), offer solutions for T able Detection (TD) and T able Structure Recognition (TSR) but face challenges with diverse table formats and common errors like incorrect area detection and overlapping columns. This research introduces RAPTOR, a modular post-processing system designed to enhance state-of-the-art models for improved table extraction, particularly for product tables. RAPTOR, addresses recurrent TD and TSR issues, improving both precision and structural predictions. F or TD, we use DETR (trained on ICDAR 2019) and TATR (trained on PubT ables-1M and FinT abNet), while TSR only relies on TATR. A Genetic Algorithm is incorporated to optimize RAPTOR's module parameters, using a private dataset of product tables to align with industrial needs. W e evaluate our method on two private datasets of product tables, the public DOCILE dataset (which contains tables similar to our target product tables), and the ICDAR 2013 and ICDAR 2019 datasets. The results demonstrate that while our approach excels at product tables, it also maintains reasonable performance across diverse table formats.

corpusid, dataset, semanticscholar, (14 more...)

arXiv.org Artificial Intelligence

2502.14918

Country: Europe > France (0.05)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner

Zhou, Yitong, Cheng, Mingyue, Mao, Qingyang, Liu, Qi, Xu, Feiyang, Li, Xin, Chen, Enhong

arXiv.org Artificial IntelligenceJan-3-2025

Pre-trained foundation models have recently significantly progressed in structured table understanding and reasoning. However, despite advancements in areas such as table semantic understanding and table question answering, recognizing the structure and content of unstructured tables using Vision Large Language Models (VLLMs) remains under-explored. In this work, we address this research gap by employing VLLMs in a training-free reasoning paradigm. First, we design a benchmark with various hierarchical dimensions relevant to table recognition. Subsequently, we conduct in-depth evaluations using pre-trained VLLMs, finding that low-quality image input is a significant bottleneck in the recognition process. Drawing inspiration from these findings, we propose the Neighbor-Guided Toolchain Reasoner (NGTR) framework, which is characterized by integrating multiple lightweight models for low-level visual processing operations aimed at mitigating issues with low-quality input images. Specifically, we utilize a neighbor retrieval mechanism to guide the generation of multiple tool invocation plans, transferring tool selection experiences from similar neighbors to the given input, thereby facilitating suitable tool selection. Additionally, we introduce a reflection module to supervise the tool invocation process. Extensive experiments on public table recognition datasets demonstrate that our approach significantly enhances the recognition capabilities of the vanilla VLLMs. We believe that the designed benchmark and the proposed NGTR framework could provide an alternative solution in table recognition.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.20662

Country: Asia (0.68)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback