Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation
Tian, Yuan, Lee, Daniel, Wu, Fei, Mai, Tung, Qian, Kun, Sahai, Siddhartha, Zhang, Tianyi, Li, Yunyao
arXiv.org Artificial Intelligence
Text-to-SQL models, which parse natural language (NL) questions to executable SQL queries, are increasingly adopted in real-world applications. However, deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We find that existing text-to-SQL models experience significant performance drops when applied to new schemas, primarily due to the lack of domain-specific data for fine-tuning. This data scarcity also limits the ability to effectively evaluate model performance in new domains. Continuously obtaining high-quality text-to-SQL data for evolving schemas is prohibitively expensive in real-world scenarios. To bridge this gap, we propose SQLsynth, a human-in-the-loop text-to-SQL data annotation system. SQLsynth streamlines the creation of high-quality text-to-SQL datasets through human-LLM collaboration in a structured workflow. A within-subjects user study comparing SQLsynth with manual annotation and ChatGPT shows that SQLsynth significantly accelerates text-to-SQL data annotation, reduces cognitive load, and produces datasets that are more accurate, natural, and diverse. Our code is available at https://github.com/adobe/nl_sql_analyzer.
Feb-21-2025
- Country:
- Oceania > Australia
- Victoria > Melbourne (0.04)
- New South Wales > Sydney (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Utah (0.04)
- Pennsylvania (0.04)
- Indiana > Tippecanoe County
- West Lafayette (0.04)
- Lafayette (0.04)
- Arizona > Maricopa County
- Scottsdale (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Texas > Brazos County
- College Station (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Washington > King County
- Seattle (0.04)
- California
- Santa Clara County > San Jose (0.04)
- Los Angeles County > Santa Monica (0.04)
- New York > New York County
- New York City (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Canada > British Columbia
- Europe
- Germany > Berlin (0.04)
- United Kingdom > Scotland
- City of Edinburgh > Edinburgh (0.04)
- Switzerland > Geneva
- Geneva (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Italy
- France > Occitanie
- Haute-Garonne > Toulouse (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Oceania > Australia
- Genre:
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.93)
- Workflow (0.86)
- Industry:
- Government > Regional Government (0.46)
- Education (0.46)
- Technology: