MetaCipher: A Time-Persistent and Universal Multi-Agent Framework for Cipher-Based Jailbreak Attacks for LLMs
Chen, Boyuan; Shao, Minghao; Basit, Abdul; Garg, Siddharth; Shafique, Muhammad
arXiv.org Artificial Intelligence
As large language models (LLMs) grow more capable, they become increasingly vulnerable to sophisticated jailbreak attacks. While developers invest heavily in alignment fine-tuning and safety guardrails, researchers continue publishing novel attacks, driving progress through adversarial iteration. This dynamic mirrors a strategic game of continual evolution. However, two major challenges hinder jailbreak development: the high cost of querying top-tier LLMs and the short lifespan of effective attacks due to frequent safety updates. These factors limit the cost-efficiency and practical impact of jailbreak research. To address these challenges, we propose MetaCipher, a low-cost, multi-agent jailbreak framework that generalizes across LLMs with varying safety measures. MetaCipher uses reinforcement learning and is modular and adaptive, so it can be extended with future strategies. With as few as 10 queries, MetaCipher achieves state-of-the-art attack success rates on recent malicious prompt benchmarks, outperforming prior jailbreak methods. We conduct a large-scale empirical evaluation across diverse victim models and benchmarks, demonstrating MetaCipher's robustness and adaptability. Warning: This paper contains model outputs that may be offensive or harmful, shown solely to demonstrate jailbreak efficacy.
Aug-14-2025
- Country:
  - Asia
    - Middle East
      - Saudi Arabia > Eastern Province
        - Jubail (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.14)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
    - Moldova (0.04)
    - Ukraine
      - Donetsk Oblast (0.04)
      - Luhansk Oblast (0.04)
  - North America > United States
    - Florida > Miami-Dade County
      - Miami (0.04)
    - New Mexico > Bernalillo County
      - Albuquerque (0.04)
    - New York > Kings County
      - New York City (0.04)
- Genre:
  - Research Report (0.64)
- Industry:
  - Education (0.92)
  - Information Technology > Security & Privacy (1.00)
- Technology: