Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction
Floris Vossebeld, Shenghui Wang
arXiv.org Artificial Intelligence
Generating complex, logically sound SPARQL queries for multi-hop questions remains a critical bottleneck for Knowledge Graph Question Answering, because one-shot generation by Large Language Models (LLMs) is brittle and hinders reliable interaction with structured data. Current methods lack the adaptive policies needed to debug queries dynamically based on real-time execution feedback. This paper introduces a novel agentic framework in which an LLM learns a resilient policy for the sequential process of iterative SPARQL construction. We show that a compact 3B-parameter model, trained exclusively via outcome-driven Reinforcement Learning (GRPO) without supervised fine-tuning, can learn effective policies for this task, discovering how to systematically recover from execution errors and refine its queries toward a correct answer. On a curated, executable single-answer subset of LC-QuAD 2.0, our agent achieves 49.7% accuracy post-entity-linking, a significant 17.5-percentage-point improvement over the strongest iterative zero-shot baseline. Further analysis reveals that while the agent's capability is driven by RL, its performance is enhanced by an explicit deliberative reasoning step that acts as a cognitive scaffold to improve policy precision. This work presents a generalizable blueprint for teaching agents to master formal, symbolic tools through interaction, bridging the gap between probabilistic LLMs and the structured world of Knowledge Graphs.
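The iterative construction loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the names `Feedback`, `refine_loop`, `toy_execute`, `toy_propose`, and `outcome_reward`, the `ex:` identifiers, and the turn limit are all hypothetical stand-ins. A real agent would replace `toy_propose` with the trained LLM policy and `toy_execute` with a SPARQL endpoint.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    ok: bool          # did the query execute without error?
    message: str      # execution message shown to the policy
    rows: List[str]   # result bindings (empty on error or no match)


History = List[Tuple[str, Feedback]]


def refine_loop(
    question: str,
    propose: Callable[[str, History], str],
    execute: Callable[[str], Feedback],
    max_turns: int = 5,  # assumed budget, not taken from the paper
) -> Tuple[Optional[str], History]:
    """Propose a query, execute it, and retry using the feedback."""
    history: History = []
    for _ in range(max_turns):
        query = propose(question, history)
        feedback = execute(query)
        history.append((query, feedback))
        if feedback.ok and feedback.rows:
            return feedback.rows[0], history  # single-answer setting
    return None, history


def toy_execute(query: str) -> Feedback:
    """Hypothetical executor: rejects unbalanced braces, knows one fact."""
    if query.count("{") != query.count("}"):
        return Feedback(False, "parse error: unbalanced braces", [])
    if "ex:directedBy" not in query:
        return Feedback(True, "0 rows", [])
    return Feedback(True, "1 row", ["ex:Person1"])


def toy_propose(question: str, history: History) -> str:
    """Hypothetical policy: repairs syntax first, then the predicate."""
    if not history:
        return "SELECT ?d WHERE { ex:Film1 ex:writtenBy ?d"
    last_feedback = history[-1][1]
    if not last_feedback.ok:
        return "SELECT ?d WHERE { ex:Film1 ex:writtenBy ?d }"
    return "SELECT ?d WHERE { ex:Film1 ex:directedBy ?d }"


def outcome_reward(answer: Optional[str], gold: str) -> float:
    """Outcome-only reward: 1.0 for the correct final answer, else 0.0."""
    return 1.0 if answer == gold else 0.0


answer, history = refine_loop("Who directed Film1?", toy_propose, toy_execute)
reward = outcome_reward(answer, "ex:Person1")
```

In this toy run the first query fails to parse, the second executes but returns nothing, and the third returns the answer. Only the final outcome is scored, which mirrors the outcome-driven training signal the abstract mentions; how the paper actually shapes its reward is not stated here.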
Nov-18-2025
- Country:
  - Asia
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Bulgaria > Plovdiv Province
      - Plovdiv (0.04)
    - Germany > Berlin (0.04)
    - Netherlands (0.05)
  - North America > United States
    - Washington > King County > Seattle (0.04)
- Genre:
  - Research Report (0.67)
- Industry:
  - Leisure & Entertainment (0.46)
  - Media > Film (0.46)
- Technology: