SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention
