Detecting Underperformance: Noise Injection Increases the Accuracy of Sandbagging LLMs
