Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Neural Information Processing Systems
Large Language Models (LLMs) are typically harmless but remain vulnerable to carefully crafted prompts known as "jailbreaks", which can bypass protective measures and induce harmful behavior.
Oct-10-2025, 05:24:00 GMT