Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

Li, Jiatao, Hu, Xinyu, Yin, Xunjian, Wan, Xiaojun

Dec-14-2024–arXiv.org Artificial Intelligence

The integration of documents generated by LLMs themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation (RAG) systems. However, previous research has focused primarily on optimizing the use of Self-Docs, leaving their inherent properties underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Docs categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.

large language model, machine learning, natural language

Country:
- North America
  - United States
    - Kentucky > Logan County (0.04)
    - Arkansas > Drew County (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - Connecticut > Fairfield County
      - Bridgeport (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Eastern Europe (0.04)
  - Ukraine (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Poland > Masovia Province
    - Warsaw (0.04)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Russia (1.00)
  - China (0.04)
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Myanmar > Mandalay Region
    - Mandalay (0.04)
  - Indonesia > Java
    - Jakarta > Jakarta (0.04)

Genre:
- Research Report > New Finding (0.87)

Industry:
- Transportation > Air (1.00)
- Leisure & Entertainment (1.00)
- Aerospace & Defense > Aircraft (1.00)
- Media
  - Film (1.00)
  - Music (0.68)
- Government
  - Foreign Policy (0.68)
  - Regional Government
    - Europe Government > Russia Government (1.00)
    - Asia Government > Russia Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.84)
