AITopics | cell value

Collaborating Authors

cell value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NEEDLEINATABLE: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables

Neural Information Processing SystemsJun-22-2026, 01:09:20 GMT

Processing structured tabular data, particularly large and lengthy tables, constitutes a fundamental yet challenging task for large language models (LLMs). However, existing long-context benchmarks like Needle-in-a-Haystack primarily focus on unstructured text, neglecting the challenge of diverse structured tables. Meanwhile, previous tabular benchmarks mainly consider downstream tasks that require highlevel reasoning abilities, and overlook models' underlying fine-grained perception of individual table cells, which is crucial for practical and robust LLM-based table applications. To address this gap, we introduce NEEDLEINATABLE (NIAT), a new long-context tabular benchmark that treats each table cell as a "needle" and requires models to extract the target cell based on cell locations or lookup questions. Our comprehensive evaluation of various LLMs and multimodal LLMs reveals a substantial performance gap between popular downstream tabular tasks and the simpler NIAT task, suggesting that they may rely on dataset-specific correlations or shortcuts to obtain better benchmark results but lack truly robust long-context understanding towards structured tables. Furthermore, we demonstrate that using synthesized NIAT training data can effectively improve performance on both NIAT task and downstream tabular tasks, which validates the importance of NIAT capability for LLMs' genuine table understanding ability.

benchmark, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Louisiana (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TableRAG: Million-Token Table Understanding with Language Models Si-An Chen

Neural Information Processing SystemsFeb-16-2026, 10:31:30 GMT

This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

TableRAG: Million-Token Table Understanding with Language Models Si-An Chen

Neural Information Processing SystemsOct-10-2025, 08:40:46 GMT

information, retrieval, tablerag, (16 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

An LLM Agent-Based Complex Semantic Table Annotation Approach

Geng, Yilin, Wang, Shujing, Wang, Chuan, He, Keqing, Lv, Yanfei, Wang, Ying, Feng, Zaiwen, Bai, Xiaoying

arXiv.org Artificial IntelligenceAug-19-2025

The Semantic Table Annotation (STA) task, which includes Column Type Annotation (CTA) and Cell Entity Annotation (CEA), maps table contents to ontology entities and plays important roles in various semantic applications. However, complex tables often pose challenges such as semantic loss of column names or cell values, strict ontological hierarchy requirements, homonyms, spelling errors, and abbreviations, which hinder annotation accuracy. To address these issues, this paper proposes an LLM-based agent approach for CTA and CEA. We design and implement five external tools with tailored prompts based on the ReAct framework, enabling the STA agent to dynamically select suitable annotation strategies depending on table characteristics. Experiments are conducted on the Tough Tables and BiodivTab datasets from the SemTab challenge, which contain the aforementioned challenges. Our method outperforms existing approaches across various metrics. Furthermore, by leveraging Levenshtein distance to reduce redundant annotations, we achieve a 70% reduction in time costs and a 60% reduction in LLM token usage, providing an efficient and cost-effective solution for STA.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2508.12868

Country: Asia > China > Hubei Province (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports

Aida, Hayato, Takahashi, Kosuke, Omi, Takahiro

arXiv.org Artificial IntelligenceMay-26-2025

--With recent advancements in Large Language Models (LLMs) and growing interest in retrieval-augmented generation (RAG), the ability to understand table structures has become increasingly important. This is especially critical in financial domains such as securities reports, where highly accurate question answering (QA) over tables is required. However, tables exist in various formats--including HTML, images, and plain text--making it difficult to preserve and extract structural information. Therefore, multimodal LLMs are essential for robust and general-purpose table understanding. Despite their promise, current Large Vision-Language Models (L VLMs), which are major representatives of multimodal LLMs, still face challenges in accurately understanding characters and their spatial relationships within documents. In this study, we propose a method to enhance L VLM-based table understanding by incorporating in-table textual content and layout features. Experimental results demonstrate that these auxiliary modalities significantly improve performance, enabling robust interpretation of complex document layouts without relying on explicitly structured input formats. The T ableCellQA dataset--including rendered images, layout (bounding-box) annotations, and QA data--will be publicly released upon publication (license confirmation in progress).

information, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2505.17625

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

TableRAG: Million-Token Table Understanding with Language Models

Chen, Si-An, Miculicich, Lesly, Eisenschlos, Julian Martin, Wang, Zifeng, Wang, Zilong, Chen, Yanfei, Fujii, Yasuhisa, Lin, Hsuan-Tien, Lee, Chen-Yu, Pfister, Tomas

arXiv.org Artificial IntelligenceDec-26-2024

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding. The implementation and dataset will be available at https://github.com/

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.04739

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark

Li, Lan, Fang, Liri, Torvik, Vetle I.

arXiv.org Artificial IntelligenceDec-12-2024

We investigate the reasoning capabilities of large language models (LLMs) for automatically generating data-cleaning workflows. To evaluate LLMs' ability to complete data-cleaning tasks, we implemented a pipeline for LLM-based Auto Data Cleaning Workflow (AutoDCWorkflow), prompting LLMs on data cleaning operations to repair three types of data quality issues: duplicates, missing values, and inconsistent data formats. Given a dirty table and a purpose (expressed as a query), this pipeline generates a minimal, clean table sufficient to address the purpose and the data cleaning workflow used to produce the table. The planning process involves three main LLM-driven components: (1) Select Target Columns: Identifies a set of target columns related to the purpose. (2) Inspect Column Quality: Assesses the data quality for each target column and generates a Data Quality Report as operation objectives. (3) Generate Operation & Arguments: Predicts the next operation and arguments based on the data quality report results. Additionally, we propose a data cleaning benchmark to evaluate the capability of LLM agents to automatically generate workflows that address data cleaning purposes of varying difficulty levels. The benchmark comprises the annotated datasets as a collection of purpose, raw table, clean table, data cleaning workflow, and answer set. In our experiments, we evaluated three LLMs that auto-generate purpose-driven data cleaning workflows. The results indicate that LLMs perform well in planning and generating data-cleaning workflows without the need for fine-tuning.

data quality, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.06724

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > Michigan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.66)

Industry:

Government (0.46)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.30)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection

Ma, Xingyu, Tian, Xin, Wu, Lingxiang, Wang, Xuepeng, Tang, Xueming, Wang, Jinqiao

arXiv.org Artificial IntelligenceSep-24-2024

Text-to-SQL is a subtask in semantic parsing that has seen rapid progress with the evolution of Large Language Models (LLMs). However, LLMs face challenges due to hallucination issues and a lack of domain-specific database knowledge(such as table schema and cell values). As a result, they can make errors in generating table names, columns, and matching values to the correct columns in SQL statements. This paper introduces a method of knowledge injection to enhance LLMs' ability to understand schema contents by incorporating prior knowledge. This approach improves their performance in Text-to-SQL tasks. Experimental results show that pre-training LLMs on domain-specific database knowledge and fine-tuning them on downstream Text-to-SQL tasks significantly improves the Execution Match (EX) and Exact Match (EM) metrics across various models. This effectively reduces errors in generating column names and matching values to the columns. Furthermore, the knowledge-injected models can be applied to many downstream Text-to-SQL tasks, demonstrating the generalizability of the approach presented in this paper.

column name, knowledge, table name, (15 more...)

arXiv.org Artificial Intelligence

2409.15907

Country:

North America > United States > New York (0.04)
North America > United States > Texas (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Consumer Products & Services (0.48)
Transportation (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Interactive-T2S: Multi-Turn Interactions for Text-to-SQL with Large Language Models

Xiong, Guanming, Bao, Junwei, Jiang, Hongfei, Song, Yang, Zhao, Wen

arXiv.org Artificial IntelligenceAug-9-2024

This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods have not adequately addressed scalability, leading to inefficiencies when processing wide tables. Furthermore, current interaction-based approaches either lack a step-by-step, interpretable SQL generation process or fail to provide an efficient and universally applicable interaction design. To address these challenges, we introduce Interactive-T2S, a framework that generates SQL queries through direct interactions with databases. This framework includes four general tools that facilitate proactive and efficient information retrieval by the LLM. Additionally, we have developed detailed exemplars to demonstrate the step-wise reasoning processes within our framework. Our experiments on the BIRD-Dev dataset, employing a setting without oracle knowledge, reveal that our method achieves state-of-the-art results with only two exemplars, underscoring the effectiveness and robustness of our framework.

faculty, sql query, statistics, (15 more...)

arXiv.org Artificial Intelligence

2408.11062

Country:

North America > United States > Illinois (0.05)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
(15 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes

Khatiwada, Aamod, Kokel, Harsha, Abdelaziz, Ibrahim, Chaudhury, Subhajit, Dolby, Julian, Hassanzadeh, Oktie, Huang, Zhenhan, Pedapati, Tejaswini, Samulowitz, Horst, Srinivas, Kavitha

arXiv.org Artificial IntelligenceJun-28-2024

Enterprises have a growing need to identify relevant tables in data lakes; e.g. tables that are unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such data discovery tasks. In this paper, we present TabSketchFM, a neural tabular model for data discovery over data lakes. First, we propose a novel pre-training sketch-based approach to enhance the effectiveness of data discovery techniques in neural tabular models. Second, to further finetune the pretrained model for several downstream tasks, we develop LakeBench, a collection of 8 benchmarks to help with different data discovery tasks such as finding tasks that are unionable, joinable, or subsets of each other. We then show on these finetuning tasks that TabSketchFM achieves state-of-the art performance compared to existing neural models. Third, we use these finetuned models to search for tables that are unionable, joinable, or can be subsets of each other. Our results demonstrate improvements in F1 scores for search compared to state-of-the-art techniques (even up to 70% improvement in a joinable search benchmark). Finally, we show significant transfer across datasets and tasks establishing that our model can generalize across different tasks over different data lakes

benchmark, dataset, tabsketchfm, (13 more...)

arXiv.org Artificial Intelligence

2407.01619

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Slovakia (0.04)
(22 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Government (0.93)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback