Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Neural Information Processing Systems 

Large Language Models (LLMs) are typically aligned to be harmless, yet they remain vulnerable to carefully crafted prompts known as "jailbreaks," which can bypass moderation guardrails and elicit harmful outputs.
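To make the notion of "cipher characters" in the title concrete, here is a minimal sketch of cipher-encoding a prompt using ROT13, a simple character-substitution cipher. This is a generic illustration of the attack surface only, not the encoding scheme proposed in the paper; the `rot13_encode` helper name is hypothetical.

```python
import codecs

def rot13_encode(prompt: str) -> str:
    """Encode a prompt with ROT13 (a toy substitution cipher).

    Illustration only: an attacker could send cipher-encoded text and
    ask the model to decode and follow it, so the plaintext never
    appears verbatim to a keyword-based moderation filter.
    This is NOT the specific method studied in the paper.
    """
    return codecs.encode(prompt, "rot13")

if __name__ == "__main__":
    plain = "Example prompt text"
    encoded = rot13_encode(plain)
    print(encoded)                           # -> "Rknzcyr cebzcg grkg"
    print(codecs.decode(encoded, "rot13"))   # round-trips to the original
```

Because the cipher is reversible, the encoded prompt carries the same instruction while evading surface-level pattern matching, which is the general weakness such jailbreaks exploit.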
