Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation
