ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration
Minghang Deng, Ashwin Ramachandran, Canwen Xu, Lanxiang Hu, Zhewei Yao, Anupam Datta, Hao Zhang
arXiv.org Artificial Intelligence
Text-to-SQL systems have made critical data insights easier to access by enabling natural language queries over structured databases. However, deploying such systems in enterprise environments remains challenging due to large, complex schemas (> 3000 columns), diverse SQL dialects (e.g., BigQuery, Snowflake), and sophisticated query requirements (e.g., transformation, analytics). Current state-of-the-art performance on the Spider 2.0 dataset, a benchmark built to mimic such complex environments, remains limited at 20%. Key limitations include inadequate instruction-following, poor long-context comprehension, weak self-refinement, and insufficient dialect-specific knowledge. To address these gaps, we propose ReFoRCE (Self-Refinement Agent with Format Restriction and Column Exploration), which introduces (1) table compression to mitigate long-context limitations, (2) format restriction to ensure accurate answer format, and (3) iterative column exploration for enhanced schema understanding. Additionally, it employs a self-refinement pipeline consisting of (1) parallelized workflows with voting mechanisms and (2) a Common Table Expression (CTE) based refinement approach to handle unresolved cases. ReFoRCE achieves state-of-the-art results, scoring 31.26 on the Spider 2.0-Snow task and 30.35 on the Spider 2.0-Lite task.

Text-to-SQL converts natural language queries into SQL queries, serving as a key technology for lowering the barrier to accessing relational databases (Zelle & Mooney, 1996; Zettlemoyer & Collins, 2012; Zhong et al., 2017; Yu et al., 2018; Wang et al., 2019; Gao et al., 2023a; Lei et al., 2024).
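The abstract mentions parallelized workflows with a voting mechanism but does not spell out the voting rule. The following is a minimal sketch of one plausible form, majority voting over execution results, and should not be read as the authors' implementation. The names `vote`, `Candidate`, and `min_votes`, the tie handling, and the representation of an execution result as a hashable value are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

# Hypothetical representation: (sql_text, execution_result), where the result
# is any hashable value (e.g. a tuple of rows) or None if execution failed.
Candidate = Tuple[str, Optional[Hashable]]


def vote(candidates: Sequence[Candidate], min_votes: int = 2) -> Optional[str]:
    """Return the SQL whose execution result is most common, or None if unresolved."""
    results = [res for _, res in candidates if res is not None]
    if not results:
        return None

    counts = Counter(results).most_common()
    top_result, top_count = counts[0]

    # Treat the case as unresolved when too few workflows agree,
    # or when two different results tie for first place.
    if top_count < min_votes or (len(counts) > 1 and counts[1][1] == top_count):
        return None

    # Return the first candidate that produced the winning result.
    for sql, res in candidates:
        if res == top_result:
            return sql
    return None
```

In this sketch, a return value of `None` marks a case the vote could not settle; under the pipeline described in the abstract, such unresolved cases would be the ones passed on to the CTE-based refinement step.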
Feb-14-2025
- Country:
  - Asia (0.28)
  - North America > United States (0.28)
- Genre:
  - Research Report (0.64)
  - Workflow (0.69)
- Industry:
  - Health & Medicine > Therapeutic Area (0.47)
- Technology: