GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
