Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Joren, Hailey, Zhang, Jianyi, Ferng, Chun-Sung, Juan, Da-Cheng, Taly, Ankur, Rashtchian, Cyrus

Dec-6-2024–arXiv.org Artificial Intelligence

Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a way to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that proprietary LLMs (Gemini, GPT, Claude) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. We further categorize cases when the context is useful, and improves accuracy, even though it does not fully answer the query and the model errs without the context. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among times where the model responds by 2-10% for Gemini, GPT, and Gemma. Providing Large Language Models (LLMs) with additional context, such as in Retrieval Augmented Generation (RAG) systems, has led to major improvements in LLM factuality and verifiability when adapting to new domains (Lewis et al., 2020). In the case of open-domain question answering, a retrieval model provides context at inference time in the form of snippets or long-form text (Zhu et al., 2021). Then, the model synthesizes the query along with this added context to generate the answer. The ideal outcome is for the LLM to output the correct answer if the provided context contains enough information to answer the question when combined with the model's parametric knowledge. Otherwise, the model should abstain from answering and/or ask for more information. One core challenge in achieving this ideal outcome is building models that can use the provided context only when it helps answer the question correctly. Several works have investigated this issue by evaluating models in the presence of irrelevant information in the context (discussed in Section 2). However, "relevant information" can range from directly containing the answer to simply being topically related Work done during an internship at Google. Work done during an internship at Google. Question: Who is Lya L. married to?

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Dec-6-2024

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- North America > United States (1.00)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Leisure & Entertainment > Sports (0.46)
- Transportation > Ground
  - Rail (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)