Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Open in new window