RubikSQL: Lifelong Learning Agentic Knowledge Base as an Industrial NL2SQL System
Chen, Zui; Li, Han; Zhang, Xinhao; Chen, Xiaoyu; Dong, Chunyin; Wang, Yifeng; Cai, Xin; Zhang, Su; Li, Ziqi; Ding, Chi; Li, Jinxu; Wang, Shuai; Zhao, Dousheng; Gao, Sanhai; Liu, Guangyi
arXiv.org Artificial Intelligence
We present RubikSQL, a novel NL2SQL system designed to address key challenges in real-world, enterprise-level NL2SQL, such as implicit intents and domain-specific terminology. RubikSQL frames NL2SQL as a lifelong learning task that requires both Knowledge Base (KB) maintenance and SQL generation. It systematically builds and refines its KB through techniques including database profiling, structured information extraction, agentic rule mining, and Chain-of-Thought (CoT)-enhanced SQL profiling. It then employs a multi-agent workflow that leverages this curated KB to generate accurate SQL queries. RubikSQL achieves state-of-the-art (SOTA) performance on both the KaggleDBQA and BIRD Mini-Dev datasets. Finally, we release RubikBench, a new benchmark specifically designed to capture vital traits of industrial NL2SQL scenarios, providing a valuable resource for future research.
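The abstract names database profiling as one input to the KB but does not say how it is done. As a rough illustration only, the sketch below shows what a generic profiling pass might collect: column types, distinct counts, null counts, and a few sample values per column. It uses Python's standard `sqlite3` module; the helper name `profile_database`, the `sales` table, and the choice of statistics are all assumptions made for this example and are not taken from RubikSQL.

```python
import sqlite3


def profile_database(conn):
    """Hypothetical helper: gather simple per-column statistics for each user table."""
    profile = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()]
    for table in tables:
        columns = {}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _, name, col_type, *_ in conn.execute(
            f'PRAGMA table_info("{table}")'
        ).fetchall():
            distinct, nulls = conn.execute(
                f'SELECT COUNT(DISTINCT "{name}"), SUM("{name}" IS NULL) '
                f'FROM "{table}"'
            ).fetchone()
            samples = [r[0] for r in conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT 3'
            ).fetchall()]
            columns[name] = {
                "type": col_type,
                "distinct": distinct,
                "nulls": nulls or 0,  # SUM returns NULL on an empty table
                "samples": samples,
            }
        profile[table] = columns
    return profile


# Toy in-memory database, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 10.0), ("APAC", 20.5), ("EMEA", None)],
)
profile = profile_database(conn)
print(profile["sales"]["region"])
```

A profile like this could be stored as KB entries so that a later SQL-generation step can see which values a column actually holds, for example when mapping a domain term in the question to a literal in a `WHERE` clause. How RubikSQL represents or uses such information is not specified in the abstract.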
Aug-26-2025
- Country:
- Africa
- Cameroon > Gulf of Guinea (0.04)
- Middle East > Tunisia
- Tunis Governorate > Tunis (0.04)
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Germany (0.04)
- Spain (0.04)
- Sweden (0.04)
- Western Europe (0.04)
- North America
- Canada (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States > Louisiana
- Orleans Parish > New Orleans (0.04)
- Genre:
- Instructional Material (0.71)
- Research Report (0.50)
- Workflow (0.48)
- Industry:
- Technology:
- Information Technology
- Artificial Intelligence
- Cognitive Science > Problem Solving (0.68)
- Machine Learning > Neural Networks
- Deep Learning (0.46)
- Natural Language
- Information Retrieval (0.67)
- Large Language Model (1.00)
- Representation & Reasoning
- Agents (0.66)
- Expert Systems (0.71)
- Data Science > Data Mining (1.00)