TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness
Zhou, Yongxin, Mulhem, Philippe, Schwab, Didier
arXiv.org Artificial Intelligence
The evaluation of Retrieval-Augmented Generation (RAG) systems typically examines retrieval quality and generation parameters like temperature in isolation, overlooking their interaction. This work presents a systematic investigation of how text perturbations (simulating noisy retrieval) interact with temperature settings across multiple LLM runs. We propose a comprehensive RAG Perturbation-Temperature Analysis Framework that subjects retrieved documents to three distinct perturbation types across varying temperature settings. Through extensive experiments on HotpotQA with both open-source and proprietary LLMs, we demonstrate that performance degradation follows distinct patterns: high-temperature settings consistently amplify vulnerability to perturbations, while certain perturbation types exhibit non-linear sensitivity across the temperature range. Our work yields three key contributions: (1) a diagnostic benchmark for assessing RAG robustness, (2) an analytical framework for quantifying perturbation-temperature interactions, and (3) practical guidelines for model selection and parameter tuning under noisy retrieval conditions.
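The perturbation-by-temperature grid described above can be sketched as follows. This is a minimal illustration, not the authors' framework: the abstract does not name the three perturbation types, so `drop_words`, `shuffle_words`, and `swap_chars` are hypothetical stand-ins, and `stub_generate` is a toy replacement for a real LLM call whose failure rate is simply assumed to grow with temperature. Exact match is used as an assumed scoring metric.

```python
import random
from statistics import mean

# Hypothetical perturbations (assumptions; the abstract does not specify them).
def drop_words(text, rng, rate=0.3):
    """Randomly delete words to mimic incomplete retrieval."""
    words = text.split()
    kept = [w for w in words if rng.random() >= rate]
    return " ".join(kept) if kept else text

def shuffle_words(text, rng):
    """Randomly reorder words to mimic scrambled context."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)

def swap_chars(text, rng, rate=0.1):
    """Swap adjacent characters to mimic typo-level noise."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

PERTURBATIONS = {
    "none": lambda text, rng: text,  # unperturbed baseline
    "drop_words": drop_words,
    "shuffle_words": shuffle_words,
    "swap_chars": swap_chars,
}

def stub_generate(context, answer, temperature, rng):
    """Toy stand-in for an LLM: succeeds only if the answer survives in the
    context, and fails more often as temperature rises (assumed behavior)."""
    if answer in context and rng.random() >= 0.2 * temperature:
        return answer
    return "unknown"

def run_grid(document, answer, temperatures, n_runs=5, seed=0):
    """Mean exact-match score for every (perturbation, temperature) cell,
    averaged over n_runs repeated generations."""
    rng = random.Random(seed)
    results = {}
    for name, perturb in PERTURBATIONS.items():
        for temp in temperatures:
            scores = []
            for _ in range(n_runs):
                noisy = perturb(document, rng)
                output = stub_generate(noisy, answer, temp, rng)
                scores.append(1.0 if output == answer else 0.0)
            results[(name, temp)] = mean(scores)
    return results

if __name__ == "__main__":
    grid = run_grid("Paris is the capital of France", "Paris", [0.0, 0.5, 1.0])
    for (name, temp), score in sorted(grid.items()):
        print(f"{name:>14}  T={temp:.1f}  exact_match={score:.2f}")
```

Repeating each cell over several runs matters because sampling at non-zero temperature is stochastic; comparing each perturbed row against the `none` baseline at the same temperature separates the effect of the noise from the effect of the temperature itself.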
Dec-2-2025
- Country:
- Europe (1.00)
- North America > United States (0.94)
- Genre:
- Research Report > New Finding (0.46)
- Technology: