Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Neural Information Processing Systems 

Large Language Models (LLMs) are typically aligned to be harmless, yet they remain vulnerable to carefully crafted prompts known as "jailbreaks," which can bypass moderation guardrails and elicit harmful outputs.
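To make the notion of "cipher characters" in the title concrete, here is a minimal sketch of cipher-encoding a prompt using ROT13, a simple character-substitution cipher. This is a generic illustration of the attack surface only, not the encoding scheme proposed in the paper; the `rot13_encode` helper name is hypothetical.

```python
import codecs

def rot13_encode(prompt: str) -> str:
    """Encode a prompt with ROT13 (a toy substitution cipher).

    Illustration only: an attacker could send cipher-encoded text and
    ask the model to decode and follow it, so the plaintext never
    appears verbatim to a keyword-based moderation filter.
    This is NOT the specific method studied in the paper.
    """
    return codecs.encode(prompt, "rot13")

if __name__ == "__main__":
    plain = "Example prompt text"
    encoded = rot13_encode(plain)
    print(encoded)                           # -> "Rknzcyr cebzcg grkg"
    print(codecs.decode(encoded, "rot13"))   # round-trips to the original
```

Because the cipher is reversible, the encoded prompt carries the same instruction while evading surface-level pattern matching, which is the general weakness such jailbreaks exploit.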
