MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Dou, Longxu, Gao, Yan, Pan, Mingyang, Wang, Dingzirui, Che, Wanxiang, Zhan, Dechen, Lou, Jian-Guang

Dec-27-2022–arXiv.org Artificial Intelligence

Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.

artificial intelligence, natural language, schema, (15 more...)

arXiv.org Artificial Intelligence

Dec-27-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.04)
- Europe > Belgium
  - Brussels-Capital Region > Brussels (0.04)
- Asia
  - Vietnam (0.04)
  - China > Heilongjiang Province
    - Harbin (0.04)

Genre:
- Research Report (0.64)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found