Q-NL Verifier: Leveraging Synthetic Data for Robust Knowledge Graph Question Answering

Tim Schwabe, Louisa Siebel, Patrik Valach, Maribel Acosta

Mar-3-2025, arXiv.org Artificial Intelligence

Question answering (QA) requires accurately aligning user questions with structured queries, a process often limited by the scarcity of high-quality query-natural language (Q-NL) pairs. To overcome this, we present Q-NL Verifier, an approach to generating high-quality synthetic pairs of queries and NL translations. Our approach relies on large language models (LLMs) to generate semantically precise natural language paraphrases of structured queries. Building on these synthetic Q-NL pairs, we introduce a learned verifier component that automatically determines whether a generated paraphrase is semantically equivalent to the original query. Our experiments with the well-known LC-QuAD 2.0 benchmark show that Q-NL Verifier generalizes well to paraphrases from other models and even human-authored translations. Our approach strongly aligns with human judgments across varying query complexities and outperforms existing NLP metrics in assessing semantic correctness. We also integrate the verifier into QA pipelines, showing that verifier-filtered synthetic data has significantly higher quality in terms of translation correctness and enhances NL to Q translation accuracy. Lastly, we release an updated version of the LC-QuAD 2.0 benchmark containing our synthetic Q-NL pairs and verifier scores, offering a new resource for robust and scalable QA.
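
The verifier-based filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `QNLPair`, `filter_pairs`, and `toy_verifier`, the 0.5 threshold, and the example query are all assumptions made for illustration, and `toy_verifier` is a trivial label-matching placeholder standing in for the learned verifier model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class QNLPair:
    """One synthetic pair: a structured query and a candidate NL translation."""
    query: str
    question: str


def filter_pairs(
    pairs: Iterable[QNLPair],
    verifier: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[QNLPair, float]]:
    """Keep only the pairs whose verifier score reaches the threshold."""
    kept = []
    for pair in pairs:
        score = verifier(pair.query, pair.question)
        if score >= threshold:
            kept.append((pair, score))
    return kept


def toy_verifier(query: str, question: str) -> float:
    """Placeholder scorer (hypothetical): the share of quoted labels in the
    query that also appear in the question. A learned verifier would be
    plugged in here instead."""
    labels = re.findall(r'"([^"]+)"', query)
    if not labels:
        return 0.0
    hits = sum(label.lower() in question.lower() for label in labels)
    return hits / len(labels)


# Illustrative data: one faithful translation and one that names the wrong entity.
query = 'SELECT ?c WHERE { ?city rdfs:label "Berlin" . ?city :country ?c }'
pairs = [
    QNLPair(query, "Which country is Berlin in?"),
    QNLPair(query, "Which country is Paris in?"),
]

kept = filter_pairs(pairs, toy_verifier, threshold=0.5)
for pair, score in kept:
    print(f"{score:.2f}  {pair.question}")
```

Passing the scorer in as a callable keeps the filtering logic independent of the verifier itself, so a trained model could replace the placeholder without changing `filter_pairs`.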

Genre:
- Research Report > New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (1.00)
    - Question Answering (1.00)
    - Machine Translation (1.00)
    - Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)
