SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL
Natural language interfaces to databases let users without SQL expertise query relational databases more conveniently. Text-to-SQL, which automatically maps natural language questions to SQL queries [1, 2], has therefore emerged as an important problem, particularly with the rise of generative AI. Early Text-to-SQL systems were domain-specific and supported limited user interaction, often relying on rule-based approaches to parse input questions [3, 4, 5, 6]. More recent work has moved towards greater domain independence through supervised models trained on cross-domain datasets [7, 8] and transformer-based models fine-tuned with built-in modules and constraints [9, 10, 11, 12]. Unlike retrieval-augmented generation (RAG) [13], which uses transformer-based language models fine-tuned on external knowledge, Text-to-SQL reduces potential hallucinations in domain-specific or knowledge-intensive tasks because the answer is obtained by executing a query against the database rather than being generated directly by a model. Recent Text-to-SQL methods use large language models (LLMs) with zero-shot [14, 15] and few-shot prompting [16, 17], demonstrating that LLMs can serve as strong baselines with minimal demonstrations of questions and schemas and no fine-tuning.
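To make the prompting setup concrete, the following is a minimal sketch of how a zero-shot or few-shot Text-to-SQL prompt might be assembled from a schema and a handful of question/SQL demonstrations, and of how the final answer is read from the database rather than from the model. The helper names (`build_prompt`, `run_query`, `demo`), the toy `singer` schema, and the hard-coded `predicted_sql` string standing in for a model response are all assumptions made for illustration; they are not taken from the systems cited above.

```python
import sqlite3

# Hypothetical toy schema, used only for this illustration.
SCHEMA = "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"


def build_prompt(schema, question, demonstrations=()):
    """Assemble a prompt; an empty `demonstrations` sequence yields a zero-shot prompt."""
    parts = [
        "Translate the question into a SQL query for this schema.",
        "Schema:",
        schema.strip(),
    ]
    # Few-shot: each demonstration is a (question, SQL) pair shown before the target.
    for demo_question, demo_sql in demonstrations:
        parts += [f"Question: {demo_question}", f"SQL: {demo_sql}"]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


def run_query(connection, sql):
    """Execute the predicted SQL so the answer comes from the database itself."""
    return connection.execute(sql).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO singer (name, age) VALUES (?, ?)",
        [("Ana", 31), ("Bo", 45)],
    )
    prompt = build_prompt(
        SCHEMA,
        "How many singers are older than 40?",
        demonstrations=[("How many singers are there?", "SELECT COUNT(*) FROM singer;")],
    )
    # Assumption: this string stands in for the SQL an LLM would return for `prompt`.
    predicted_sql = "SELECT COUNT(*) FROM singer WHERE age > 40;"
    return prompt, run_query(conn, predicted_sql)
```

Calling `build_prompt` without demonstrations gives the zero-shot variant; passing one or more pairs gives the few-shot variant. In both cases the rows returned by `run_query` are the answer shown to the user, which is the property the paragraph above contrasts with direct generation.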