TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness
Zhou, Yongxin; Mulhem, Philippe; Schwab, Didier
arXiv.org Artificial Intelligence
The evaluation of Retrieval-Augmented Generation (RAG) systems typically examines retrieval quality and generation parameters like temperature in isolation, overlooking their interaction. This work presents a systematic investigation of how text perturbations (simulating noisy retrieval) interact with temperature settings across multiple LLM runs. We propose a comprehensive RAG Perturbation-Temperature Analysis Framework that subjects retrieved documents to three distinct perturbation types across varying temperature settings. Through extensive experiments on HotpotQA with both open-source and proprietary LLMs, we demonstrate that performance degradation follows distinct patterns: high-temperature settings consistently amplify vulnerability to perturbations, while certain perturbation types exhibit non-linear sensitivity across the temperature range. Our work yields three key contributions: (1) a diagnostic benchmark for assessing RAG robustness, (2) an analytical framework for quantifying perturbation-temperature interactions, and (3) practical guidelines for model selection and parameter tuning under noisy retrieval conditions.
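The perturbation-by-temperature grid described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' framework: the abstract does not name the three perturbation types, so `shuffle_words` is a hypothetical stand-in, and `generate`, `exact_match`, and `evaluate_grid` are assumed names for a user-supplied LLM call, a simple scoring rule, and the evaluation loop.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures (assumptions, not taken from the paper):
#   a perturbation maps (retrieved text, RNG) -> noisy text
#   a generator maps (question, context, temperature) -> answer string
Perturbation = Callable[[str, random.Random], str]
Generator = Callable[[str, str, float], str]


def shuffle_words(text: str, rng: random.Random) -> str:
    """Placeholder perturbation: randomly reorder the words of the context."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def evaluate_grid(
    examples: List[Tuple[str, str, str]],  # (question, retrieved context, gold answer)
    generate: Generator,
    perturbations: Dict[str, Perturbation],
    temperatures: List[float],
    n_runs: int = 3,
    seed: int = 0,
) -> Dict[Tuple[str, float], float]:
    """Mean exact-match score for every (perturbation, temperature) cell."""
    rng = random.Random(seed)
    results: Dict[Tuple[str, float], float] = {}
    for name, perturb in perturbations.items():
        for temperature in temperatures:
            scores = []
            for question, context, gold in examples:
                noisy_context = perturb(context, rng)
                # Repeat generation to capture run-to-run variation at this temperature.
                for _ in range(n_runs):
                    answer = generate(question, noisy_context, temperature)
                    scores.append(exact_match(answer, gold))
            results[(name, temperature)] = mean(scores)
    return results
```

Comparing each perturbed cell against an unperturbed baseline (an identity perturbation) at the same temperature gives one simple view of how the two factors interact; the paper's own metrics may differ.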
Dec-2-2025
- Country:
  - Africa > Ethiopia
    - Addis Ababa > Addis Ababa (0.04)
  - Asia > Singapore (0.04)
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - France
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Switzerland (0.04)
  - North America > United States
    - Florida > Miami-Dade County
      - Miami (0.04)
    - Indiana (0.04)
    - New York > New York County
      - New York City (0.05)
    - Ohio (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Technology: