Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models
