AITopics | retrieval-aware robustness evaluation

Collaborating Authors

retrieval-aware robustness evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems

Zeng, Yixiao, Cao, Tianyu, Wang, Danqing, Zhao, Xinran, Qiu, Zimeng, Ziyadi, Morteza, Wu, Tongshuang, Li, Lei

arXiv.org Artificial IntelligenceOct-29-2025

Retrieval-Augmented Generation (RAG) enhances recency and factuality in answers. However, existing evaluations rarely test how well these systems cope with real-world noise, conflicting between internal and external retrieved contexts, or fast-changing facts. We introduce Retrieval-Aware Robustness Evaluation (RARE), a unified framework and large-scale benchmark that jointly stress-tests query and document perturbations over dynamic, time-sensitive corpora. One of the central features of RARE is a knowledge-graph-driven synthesis pipeline (RARE-Get) that automatically extracts single and multi-hop relations from the customized corpus and generates multi-level question sets without manual intervention. Leveraging this pipeline, we construct a dataset (RARE-Set) spanning 527 expert-level time-sensitive finance, economics, and policy documents and 48295 questions whose distribution evolves as the underlying sources change. To quantify resilience, we formalize retrieval-conditioned robustness metrics (RARE-Met) that capture a model's ability to remain correct or recover when queries, documents, or real-world retrieval results are systematically altered. Our findings reveal that RAG systems are unexpectedly sensitive to perturbations. Moreover, they consistently demonstrate lower robustness on multi-hop queries compared to single-hop queries across all domains.

large language model, machine learning, retrieval-aware robustness evaluation, (18 more...)

arXiv.org Artificial Intelligence

2506.00789

Country:

Europe > Luxembourg (0.15)
Asia > China (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Banking & Finance > Real Estate (0.68)
Banking & Finance > Economy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback