Structured Relevance Assessment for Robust Retrieval-Augmented Language Models
Aryan Raj, Astitva Veer Garg, Anitha D
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation tasks, revolutionizing how machines interact with human language. Despite their impressive performance, these models continue to struggle with factual accuracy, often producing content that appears plausible but contains incorrect information--a phenomenon commonly referred to as "hallucination" [1]. Retrieval-Augmented Language Models (RALMs) aim to mitigate this weakness by grounding generation in documents retrieved from external knowledge sources. However, despite their conceptual elegance, RALMs face several critical challenges that undermine their effectiveness in real-world scenarios. First, these systems often struggle to distinguish between relevant and irrelevant retrieved documents, treating all retrievals as equally important regardless of their actual utility for answering the query at hand. Second, standard RALMs frequently over-rely on external retrievals even when their intrinsic knowledge would be sufficient or more reliable; this rigid dependence on external sources fails to leverage the substantial knowledge already encoded in model parameters during pre-training and fine-tuning. Perhaps most concerning is RALMs' inability to acknowledge knowledge gaps when confronted with queries that cannot be answered from either retrieved information or intrinsic knowledge. Instead of transparently communicating their limitations--a crucial capability for trustworthy AI systems--these models often generate fabricated responses that appear authoritative despite lacking any factual foundation.
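The three failure modes above suggest a simple desired behavior: use retrievals only when they are judged relevant, fall back to the model's own parametric knowledge when they are not, and abstain when neither source suffices. The following minimal Python sketch illustrates that decision logic only; the names, threshold, and scoring interface (Retrieval, answer_with_relevance_gate, relevance_threshold) are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of relevance-gated retrieval with abstention.
# All names and thresholds are assumptions for illustration.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Retrieval:
    text: str
    relevance: float  # assumed relevance score in [0, 1]


def answer_with_relevance_gate(
    query: str,
    retrievals: List[Retrieval],
    parametric_answer: Callable[[str], Optional[str]],
    grounded_answer: Callable[[str, List[str]], str],
    relevance_threshold: float = 0.5,
) -> str:
    """Use only sufficiently relevant retrievals; otherwise fall back or abstain."""
    relevant = [r.text for r in retrievals if r.relevance >= relevance_threshold]

    if relevant:
        # Ground the answer only in retrievals judged relevant to the query.
        return grounded_answer(query, relevant)

    # No useful retrievals: try the model's intrinsic (parametric) knowledge.
    fallback = parametric_answer(query)
    if fallback is not None:
        return fallback

    # Neither source suffices: acknowledge the knowledge gap instead of guessing.
    return "I don't have enough reliable information to answer this."
```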
arXiv.org Artificial Intelligence
Jul-30-2025