Detecting Underperformance: Noise Injection Increases the Accuracy of Sandbagging LLMs
Neural Information Processing Systems
Capability evaluations play a crucial role in assessing and regulating frontier AI systems. The effectiveness of these evaluations faces a significant challenge: strategic underperformance, or "sandbagging", where models deliberately underperform during evaluation.
Jun-21-2026, 14:02:19 GMT
- Country:
- Europe (0.28)
- Asia (0.28)
- North America > United States (0.28)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Government (1.00)
- Education (0.93)
- Technology: