Semantic Evaluation for Text-to-SQL with Distilled Test Suites

Oct-6-2020–arXiv.org Artificial Intelligence

We propose test suite accuracy to approximate semantic accuracy for Text-to-SQL models. Our method distills a small test suite of databases that achieves high code coverage for the gold query from a large number of randomly generated databases. At evaluation time, it computes the denotation accuracy of the predicted queries on the distilled test suite, hence calculating a tight upper-bound for semantic accuracy efficiently. We use our proposed method to evaluate 21 models submitted to the Spider leader board and manually verify that our method is always correct on 100 examples. In contrast, the current Spider metric leads to a 2.5% false negative rate on average and 8.1% in the worst case, indicating that test suite accuracy is needed. Our implementation, along with distilled test suites for eleven Text-to-SQL datasets, is publicly available.

artificial intelligence, machine learning, query, (17 more...)

arXiv.org Artificial Intelligence

Oct-6-2020

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - New Jersey (0.04)
    - New York > New York County
      - New York City (0.04)
    - California
      - Los Angeles County > Long Beach (0.04)
      - Alameda County > Berkeley (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found