LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL
Pihulski, Dzmitry, Charchut, Karol, Novogrodskaia, Viktoria, Kocoń, Jan
arXiv.org Artificial Intelligence
Converting natural language questions into SQL queries enables non-expert users to interact with relational databases and has long been a central task for natural language interfaces to data. While the WikiSQL dataset played a key role in early text-to-SQL research, its usage has declined due to structural and annotation issues, including case sensitivity inconsistencies, data type mismatches, syntax errors, and unanswered questions. We present LLMSQL, a systematic revision and transformation of WikiSQL designed for the large language model era. We classify these errors and implement automated methods for cleaning and re-annotation. To assess the impact of these improvements, we evaluated multiple large language models, including Gemma 3, LLaMA 3.2, Mistral 7B, gpt-oss 20B, Phi-3.5 Mini, Qwen 2.5, OpenAI o4-mini, DeepSeek-R1, and others. Notably, DeepSeek-R1 achieves 88.40% accuracy in a zero-shot setting, and models under 10B parameters surpass 90% accuracy after fine-tuning. Rather than serving as an update, LLMSQL is introduced as an LLM-ready benchmark. Unlike the original WikiSQL, which was tailored for pointer-network models selecting tokens from input, LLMSQL provides clean natural language questions and full SQL queries as plain text, enabling straightforward generation and evaluation for modern natural-language-to-SQL models.
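Because both questions and queries are plain text, a generated query can be scored by executing it and comparing its result with the result of the reference query. The sketch below illustrates that idea under stated assumptions: the abstract does not say which metric the authors use, so execution matching is an assumption here, and the `players` table, the example queries, and the helper names `run_query` and `execution_match` are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # syntax errors and unknown columns count as failures
    return Counter(rows)  # row order is ignored, duplicate rows are kept


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same multiset of rows."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    return gold is not None and pred == gold


# Hypothetical toy table standing in for one benchmark table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bob", "Blue", 7), ("Cy", "Red", 9)],
)

gold = "SELECT name FROM players WHERE team = 'Red'"
predictions = [
    "SELECT name FROM players WHERE team = 'Red' AND points > 0",  # same rows
    "SELECT name FROM players WHERE team = 'red'",  # wrong letter case
    "SELECT name FROM players WHERE",  # syntax error
]

matches = [execution_match(conn, p, gold) for p in predictions]
accuracy = sum(matches) / len(matches)
print(matches, accuracy)
```

The second prediction fails only because SQLite compares text case-sensitively by default, which is the kind of case-sensitivity inconsistency the abstract lists among the WikiSQL issues.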
Dec-10-2025
- Country:
  - Asia > China
  - Europe
    - Poland > Lower Silesia Province
      - Wroclaw (0.05)
    - United Kingdom (0.14)
  - North America
    - Canada (0.04)
    - United States
      - California > Alameda County
        - Berkeley (0.14)
      - Illinois > Cook County
        - Chicago (0.04)
      - New York (0.05)
      - Texas > Harris County
        - Houston (0.04)
- Genre:
  - Research Report (1.00)
- Technology: