TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data

Huang, Xiang, Shen, Jiayu, Huang, Shanshan, Cheng, Sitao, Wang, Xiaxia, Qu, Yuzhong

Dec-27-2024–arXiv.org Artificial Intelligence

Semantic parsing, which converts natural language questions into logical forms, plays a crucial role in reasoning within structured environments. However, existing methods face two significant challenges: reliance on extensive manually annotated datasets and limited generalization to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates high-relevance synthetic data without manual annotation. Starting from the pertinent entities and relations of a given question, we probe for potentially relevant queries through layer-wise expansion and cross-layer combination. We then generate corresponding natural language questions for these constructed queries, and the resulting question-query pairs jointly serve as synthetic demonstrations for in-context learning. Experiments on multiple knowledge base question answering (KBQA) datasets demonstrate that TARGA, using only a 7B-parameter model, substantially outperforms existing non-fine-tuned methods that rely on closed-source models, achieving notable improvements in F1 scores on GrailQA (+7.7) and KBQA-Agent (+12.2). TARGA also exhibits superior sample efficiency, robustness, and generalization capabilities under non-I.I.D. settings.
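
The abstract does not spell out how layer-wise expansion and cross-layer combination work, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a toy in-memory knowledge base of triples, chain-shaped queries, and a non-empty-result check as the filter for "relevant" queries; the names `KB`, `execute`, `expand`, and `combine` are all hypothetical, and the step that turns queries into natural language questions is left out.

```python
from itertools import combinations

# Toy knowledge base of (subject, relation, object) triples.
# Every entity and relation name here is made up for illustration.
KB = {
    ("nanjing", "located_in", "jiangsu"),
    ("nanjing_univ", "located_in", "nanjing"),
    ("nanjing_univ", "type", "university"),
    ("southeast_univ", "located_in", "nanjing"),
    ("southeast_univ", "type", "university"),
    ("jiangsu", "located_in", "china"),
}


def execute(path, entity):
    """Answer the chain query ?x -path[0]-> ... -path[-1]-> entity.

    Walks the chain backwards from `entity` and returns the set of ?x.
    An empty path returns {entity}.
    """
    frontier = {entity}
    for rel in reversed(path):
        frontier = {s for (s, r, o) in KB if r == rel and o in frontier}
    return frontier


def expand(entities, relations, max_hops=2):
    """Layer-wise expansion (assumed form): layer k keeps every k-hop chain
    query, built from a layer k-1 query, that returns at least one answer."""
    layers = [[(e, ()) for e in entities]]  # layer 0: bare entities
    for _ in range(max_hops):
        layer = []
        for entity, path in layers[-1]:
            for rel in relations:
                candidate = (rel,) + path  # prepend one more hop
                if execute(candidate, entity):
                    layer.append((entity, candidate))
        layers.append(layer)
    return layers[1:]  # drop layer 0


def combine(layers):
    """Cross-layer combination (assumed form): pair queries from any layers
    whose answer sets intersect, i.e. whose conjunction is non-empty."""
    flat = [query for layer in layers for query in layer]
    return [
        (q1, q2)
        for q1, q2 in combinations(flat, 2)
        if execute(q1[1], q1[0]) & execute(q2[1], q2[0])
    ]


# Entities and relations that a retriever might return for a question such as
# "Which universities are in Jiangsu?" (hypothetical retrieval output).
layers = expand(["jiangsu", "university"], ["located_in", "type"])
for depth, layer in enumerate(layers, start=1):
    print(f"layer {depth}: {layer}")
print("combined:", combine(layers))
```

In this toy run the only combined query joins a one-hop query (`type` university) with a two-hop query (`located_in` something `located_in` jiangsu), which is the kind of multi-constraint query that neither layer produces alone. In the framework described above, each surviving query would then be paired with a generated natural language question to form a demonstration.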

large language model, machine learning, natural language

Country:
- North America > United States
  - New York (0.04)
  - Texas > Travis County
    - Austin (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - California > Santa Barbara County
    - Santa Barbara (0.04)
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Thailand > Bangkok
    - Bangkok (0.04)
  - China > Jiangsu Province
    - Nanjing (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Expert Systems (0.67)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.69)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
