Gradient Co-occurrence Analysis for Detecting Unsafe Prompts in Large Language Models
