LLMs Encode Harmfulness and Refusal Separately
