SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
Neural Information Processing Systems
Resolving complex SQL issues remains a significant bottleneck in real-world database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more challenging task of debugging SQL issues. To address this gap, we introduce BIRD-CRITIC, a new SQL issue debugging benchmark comprising 530 carefully curated PostgreSQL tasks (BIRD-CRITIC-PG) and 570 multi-dialect tasks (BIRD-CRITIC-MULTI), which are distilled from authentic user issues and replayed within new environments to facilitate rigorous and contamination-free evaluation. Baseline evaluations on BIRD-CRITIC underscore the task's complexity, with the leading reasoning model O3-MINI achieving a success rate of only 38.87% on BIRD-CRITIC-PG and 33.33% on BIRD-CRITIC-MULTI. Meanwhile, advancing open-source models for database tasks is crucial, since they can empower local development while safeguarding data privacy.
Jun-19-2026, 13:59:39 GMT
- Country:
- North America > United States (0.93)
- Genre:
- Overview (0.92)
- Instructional Material (0.67)
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Education (1.00)
- Information Technology > Security & Privacy (0.68)
- Technology: