Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs
Singh, Jyotika; Sun, Weiyi; Agarwal, Amit; Krishnamurthy, Viji; Benajiba, Yassine; Ravi, Sujith; Roth, Dan
arXiv.org Artificial Intelligence
In modern industry systems such as multi-turn chat agents, Text-to-SQL technology bridges natural language (NL) questions and database (DB) querying. Converting tabular DB results into NL representations (NLRs) is what enables chat-based interaction. NLR generation is currently handled mostly by large language models (LLMs), but the information loss and errors that arise when presenting tabular results in NL remain largely unexplored. This paper introduces Combo-Eval, a novel evaluation method for judging LLM-generated NLRs that combines the benefits of multiple existing methods, optimizing evaluation fidelity while significantly reducing LLM calls, by 25-61%. Accompanying the method is NLR-BIRD, the first dedicated dataset for NLR benchmarking. Through human evaluations, we demonstrate Combo-Eval's superior alignment with human judgments and its applicability to scenarios both with and without ground-truth references.
Oct-29-2025
- Country:
  - Asia
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.14)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Czechia
      - Central Bohemian Region (0.04)
      - South Moravian Region (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - North America
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - California > Los Angeles County (0.04)
      - Florida > Miami-Dade County
        - Miami (0.04)
      - New Mexico > Bernalillo County
        - Albuquerque (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Leisure & Entertainment > Sports (0.46)
- Technology: