Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Neural Information Processing Systems
Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that cause them to generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which has been shown to successfully jailbreak multiple open-source LLMs.
Mar-19-2026, 07:58:55 GMT