TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data
Huang, Xiang; Shen, Jiayu; Huang, Shanshan; Cheng, Sitao; Wang, Xiaxia; Qu, Yuzhong
arXiv.org Artificial Intelligence
Semantic parsing, which converts natural language questions into logical forms, plays a crucial role in reasoning within structured environments. However, existing methods face two significant challenges: reliance on extensive manually annotated datasets and limited generalization to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates high-relevance synthetic data without manual annotation. Starting from the entities and relations pertinent to a given question, we probe for potentially relevant queries through layer-wise expansion and cross-layer combination. We then generate a corresponding natural language question for each constructed query, and the resulting question-query pairs serve as synthetic demonstrations for in-context learning. Experiments on multiple knowledge base question answering (KBQA) datasets demonstrate that TARGA, using only a 7B-parameter model, substantially outperforms existing non-fine-tuned methods that rely on closed-source models, achieving notable improvements in F1 scores on GrailQA (+7.7) and KBQA-Agent (+12.2). TARGA also exhibits superior sample efficiency, robustness, and generalization under non-I.I.D. settings.
Dec-27-2024
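
The query-construction idea described in the abstract (layer-wise expansion followed by cross-layer combination, then pairing each query with a question) can be illustrated with a small sketch. This is not the authors' implementation: the toy knowledge base, the tuple-based query representation, and the names `evaluate`, `expand_layer`, `build_queries`, and `verbalize` are all hypothetical, and `verbalize` is a template stand-in for the language model that would write the natural language questions.

```python
from itertools import combinations

# Toy knowledge base of (subject, relation, object) triples; illustrative only.
KB = {
    ("Paris", "located_in", "France"),
    ("Lyon", "located_in", "France"),
    ("Paris", "capital_of", "France"),
    ("Louvre", "located_in", "Paris"),
}

# Hypothetical query representation (an assumption, not the paper's logical form):
#   "France"                -> a seed entity
#   ("hop", relation, q)    -> all ?x such that (?x, relation, y) for some answer y of q
#   ("and", q1, q2)         -> intersection of the answers of q1 and q2

def evaluate(query):
    """Return the set of answers a query yields on the toy KB."""
    if isinstance(query, str):
        return {query}
    if query[0] == "hop":
        _, relation, inner = query
        targets = evaluate(inner)
        return {s for (s, r, o) in KB if r == relation and o in targets}
    _, left, right = query
    return evaluate(left) & evaluate(right)

def expand_layer(frontier, relations):
    """Extend every frontier query by one hop; keep only non-empty queries."""
    expanded = []
    for query in frontier:
        for relation in relations:
            candidate = ("hop", relation, query)
            if evaluate(candidate):
                expanded.append(candidate)
    return expanded

def build_queries(entities, relations, depth=2):
    """Layer-wise expansion, then pairwise combination across all layers."""
    layers = [list(entities)]
    for _ in range(depth):
        layers.append(expand_layer(layers[-1], relations))
    flat = [q for layer in layers[1:] for q in layer]
    # Keep a conjunction only if the two queries share at least one answer.
    combined = [("and", a, b) for a, b in combinations(flat, 2)
                if evaluate(a) & evaluate(b)]
    return flat + combined

def verbalize(query):
    """Template placeholder for LLM-based question generation."""
    if isinstance(query, str):
        return query
    if query[0] == "hop":
        return f"what has {query[1]} to ({verbalize(query[2])})"
    return f"({verbalize(query[1])}) and ({verbalize(query[2])})"

# Synthetic (question, query) demonstrations for one seed entity.
queries = build_queries(["France"], ["located_in", "capital_of"], depth=2)
demos = [(verbalize(q), q) for q in queries]
```

Pruning candidates whose answer set is empty is one plausible way to keep the expansion targeted at queries the knowledge base can actually answer; a real system would issue these probes against the KB's query endpoint rather than scan an in-memory set.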