VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents
Huwiler, Daniel, Stockinger, Kurt, Fürst, Jonathan
–arXiv.org Artificial Intelligence
Retrieval-Augmented Generation (RAG) systems fail when documents evolve through versioning-a ubiquitous characteristic of technical documentation. Existing approaches achieve only 58-64% accuracy on version-sensitive questions, retrieving semantically similar content without temporal validity checks. We present VersionRAG, a version-aware RAG framework that explicitly models document evolution through a hierarchical graph structure capturing version sequences, content boundaries, and changes between document states. During retrieval, VersionRAG routes queries through specialized paths based on intent classification, enabling precise version-aware filtering and change tracking. On our VersionQA benchmark-100 manually curated questions across 34 versioned technical documents-VersionRAG achieves 90% accuracy, outperforming naive RAG (58%) and GraphRAG (64%). VersionRAG reaches 60% accuracy on implicit change detection where baselines fail (0-10%), demonstrating its ability to track undocumented modifications. Additionally, VersionRAG requires 97% fewer tokens during indexing than GraphRAG, making it practical for large-scale deployment. Our work establishes versioned document QA as a distinct task and provides both a solution and benchmark for future research.
arXiv.org Artificial Intelligence
Oct-10-2025
- Country:
- Asia
- Europe > Switzerland
- North America
- Mexico > Mexico City
- Mexico City (0.04)
- United States > Florida
- Miami-Dade County > Miami (0.05)
- Mexico > Mexico City
- Genre:
- Research Report (0.82)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: