TableRAG: Million-Token Table Understanding with Language Models

Neural Information Processing Systems

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to positional bias or context-length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
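The schema- and cell-retrieval design the abstract describes can be sketched as follows. This is a toy illustration, not the paper's implementation: the bag-of-words embedding, the example table, and the hand-written expanded queries are all hypothetical stand-ins (a real system would use a neural encoder and LM-generated query expansions).

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a neural encoder.
    return Counter(text.lower().replace("_", " ").replace(":", " ").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(corpus, queries, k):
    # Score each candidate against every expanded query; keep the top-k overall.
    scored = {c: max(cosine(embed(c), embed(q)) for q in queries) for c in corpus}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical table, encoded as column names plus distinct cell values,
# so the full table never has to enter the prompt.
schema = ["country", "population_millions", "capital_city"]
cells = ["capital_city: Paris", "capital_city: Tokyo",
         "country: France", "country: Japan",
         "population_millions: 67", "population_millions: 125"]

# Query expansion: rephrasings of the question oriented toward schema and cells.
expanded = ["capital city of France", "France capital", "country France"]

top_schema = retrieve(schema, expanded, k=2)
top_cells = retrieve(cells, expanded, k=3)
prompt = (f"Columns: {top_schema}\nCells: {top_cells}\n"
          "Question: What is the capital of France?")
```

Only the retrieved columns and cells reach the LM, which is how prompt length stays bounded even when the underlying table spans millions of tokens.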

TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning

Yu, Xiaohan, Jian, Pu, Chen, Chong

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) has demonstrated considerable effectiveness in open-domain question answering. However, when applied to heterogeneous documents, comprising both textual and tabular components, existing RAG approaches exhibit critical limitations. The prevailing practice of flattening tables and chunking strategies disrupts the intrinsic tabular structure, leads to information loss, and undermines the reasoning capabilities of LLMs in multi-hop, global queries. To address these challenges, we propose TableRAG, an SQL-based framework that unifies textual understanding and complex manipulations over tabular data. TableRAG iteratively operates in four steps: context-sensitive query decomposition, text retrieval, SQL programming and execution, and compositional intermediate answer generation. We also develop HeteQA, a novel benchmark designed to evaluate the multi-hop heterogeneous reasoning capabilities. Experimental results demonstrate that TableRAG consistently outperforms existing baselines on both public datasets and our HeteQA, establishing a new state-of-the-art for heterogeneous document question answering. We release TableRAG at https://github.com/yxh-y/TableRAG/tree/main.
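The four-step loop this abstract outlines (query decomposition, text retrieval, SQL execution, compositional answer generation) can be sketched as below. The corpus, the hard-coded decomposition, and the string-join composition step are illustrative assumptions; in the actual framework the decomposition and composition steps are LM calls.

```python
import sqlite3

# Hypothetical heterogeneous corpus: one table plus one text passage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, year INTEGER, director TEXT)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)",
                 [("Rashomon", 1950, "Kurosawa"), ("Ran", 1985, "Kurosawa")])
passages = {"Kurosawa": "Akira Kurosawa was a Japanese film director."}

def decompose(question):
    # Step 1: context-sensitive query decomposition. A real system would ask
    # an LM; here the sub-queries are hard-coded for illustration.
    return [("text", "Kurosawa"),
            ("sql", "SELECT COUNT(*) FROM films WHERE director = 'Kurosawa'")]

def answer(question):
    intermediate = []
    for kind, sub in decompose(question):
        if kind == "text":   # Step 2: text retrieval over the passage store.
            intermediate.append(passages.get(sub, ""))
        else:                # Step 3: SQL programming and execution.
            intermediate.append(str(conn.execute(sub).fetchone()[0]))
    # Step 4: compositional intermediate answer generation (an LM call in the
    # real framework; a plain string join here).
    return " | ".join(intermediate)

result = answer("How many films did the director Kurosawa make?")
```

Running SQL against the intact table, rather than flattening it into chunks, is what preserves the tabular structure for multi-hop, global queries.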

TableRAG: Million-Token Table Understanding with Language Models

Chen, Si-An, Miculicich, Lesly, Eisenschlos, Julian Martin, Wang, Zifeng, Wang, Zilong, Chen, Yanfei, Fujii, Yasuhisa, Lin, Hsuan-Tien, Lee, Chen-Yu, Pfister, Tomas

arXiv.org Artificial Intelligence

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to positional bias or context-length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding. The implementation and dataset will be available at https://github.com/