Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Jun-13-2026, 18:57:13 GMT–Neural Information Processing Systems

Capability evaluations play a crucial role in assessing and regulating frontier AI systems. The effectiveness of these evaluations faces a significant challenge: strategic underperformance, or ``sandbagging'', where models deliberately underperform during evaluation.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Jun-13-2026, 18:57:13 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence (1.00)