 cosql


Conversational Text-to-SQL: An Odyssey into State-of-the-Art and Challenges Ahead

arXiv.org Artificial Intelligence

Text-to-SQL is an important research topic in semantic parsing [1, 2, 3, 4, 5, 6, 7]. Spider [3] and CoSQL [5] datasets allow for making progress in complex, cross-domain, single and multi-turn text-to-SQL tasks respectively, utilizing a common set of databases, with competitive leaderboards, demonstrating the difficulty in the tasks. In contrast to Spider, CoSQL was collected as entire dialogues, and hence includes additional challenges for the text-to-SQL task in terms of integrating dialogue context. In addition to the challenges in general-purpose code generation [8, 9], where the output of the system is constrained to follow a grammar, the text-to-SQL problem is underspecified without a schema.

We adapt the two reranking methods from [16], query plan (QP) and schema linking (SL), and show that both methods can help improve multi-turn text-to-SQL. With accuracy on CoSQL being reported using exact-set-match accuracy (EM) and execution accuracy (EX), with T5-Large we observed: a) MT leads to 2.4% and 1.7% absolute improvement on EM and EX; b) combined reranking approaches yield 1.9% and 2.2% improvements; c) combining MT with reranking, with T5-Large we obtain improvements of 2.1% in EM and 3.7% in EX over a T5-Large PICARD baseline. This improvement is consistent on larger models, using T5-3B yielded about 1.0% in ...
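The difference between the two reported metrics can be illustrated with a small sketch. It is not the official CoSQL evaluation script: `normalized_match` is a crude, hypothetical stand-in for exact-set match (the real metric compares parsed SQL clauses, not strings), and `execution_match`, the `singer` table, and both queries are made up for illustration.

```python
import sqlite3
from collections import Counter

def normalized_match(pred_sql: str, gold_sql: str) -> bool:
    """Crude stand-in for exact-set match (assumption): compare
    case- and whitespace-normalized query strings."""
    def norm(sql: str) -> str:
        return " ".join(sql.lower().replace(";", " ").split())
    return norm(pred_sql) == norm(gold_sql)

def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Execution-style check: run both queries and compare the returned
    rows as multisets, ignoring row order."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)

# Toy database, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 41), (3, 'Cy', 25);
""")

gold = "SELECT name FROM singer WHERE age > 28"
pred = "SELECT name FROM singer WHERE age >= 30"

print(normalized_match(pred, gold))       # False: the query text differs
print(execution_match(conn, pred, gold))  # True: same rows on this database
```

The example shows why the two numbers can diverge: a prediction may differ in form from the gold query yet return the same rows on a given database, and the reverse can also happen.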


RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL

arXiv.org Artificial Intelligence

Relational structures such as schema linking and schema encoding have been validated as key components of qualitatively translating natural language into SQL queries. However, introducing these structural relations comes at a price: they often result in a specialized model structure, which largely prohibits using large pretrained models in text-to-SQL. To address this problem, we propose RASAT: a Transformer seq2seq architecture augmented with relation-aware self-attention that can leverage a variety of relational structures while effectively inheriting the pretrained parameters from the T5 model. Our model can incorporate almost all types of existing relations in the literature, and in addition, we propose introducing co-reference relations for the multi-turn scenario. Experimental results on three widely used text-to-SQL datasets, covering both single-turn and multi-turn scenarios, show that RASAT achieves state-of-the-art results across all three benchmarks (75.5% EX on Spider, 52.6% IEX on SParC, and 37.4% IEX on CoSQL).


Grounded Adaptation for Zero-shot Executable Semantic Parsing

arXiv.org Artificial Intelligence

We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g. new database schemas). GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data-augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified. On the Spider, SParC, and CoSQL zero-shot semantic parsing tasks, GAZP improves logical form and execution accuracy of the baseline parser. Our analyses show that GAZP outperforms data-augmentation in the training environment, performance increases with the amount of GAZP-synthesized data, and cycle-consistency is central to successful adaptation.
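The cycle-consistency selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GAZP's implementation: `select_cycle_consistent`, the toy `generate` and `parse` lookups, and the string-equality consistency test are all hypothetical, and the abstract does not say how consistency is actually checked.

```python
from typing import Callable, Iterable, List, Tuple

def select_cycle_consistent(
    queries: Iterable[str],
    generate: Callable[[str], str],  # backward model: query -> utterance
    parse: Callable[[str], str],     # forward parser: utterance -> query
    same: Callable[[str, str], bool] = lambda a, b: a == b,
) -> List[Tuple[str, str]]:
    """Keep (utterance, query) pairs whose utterance parses back to the
    query it was generated from."""
    kept = []
    for query in queries:
        utterance = generate(query)
        if same(parse(utterance), query):
            kept.append((utterance, query))
    return kept

# Toy stand-ins for the two models (hypothetical lookup tables).
q_count = "SELECT count(*) FROM singer"
q_names = "SELECT name FROM singer"
toy_generate = {
    q_count: "how many singers are there",
    q_names: "list the singers",
}
toy_parse = {
    "how many singers are there": q_count,
    "list the singers": "SELECT * FROM singer",  # parser disagrees: dropped
}

kept = select_cycle_consistent(
    [q_count, q_names], toy_generate.__getitem__, toy_parse.__getitem__
)
print(kept)  # [('how many singers are there', 'SELECT count(*) FROM singer')]
```

Only the pairs that survive this filter would then be used to adapt the parser to the new environment.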


CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

arXiv.org Artificial Intelligence

CoSQL consists of 30k turns plus 10k annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.