Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

Mar-19-2026, 07:58:55 GMT – Neural Information Processing Systems

Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that induce them to generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which successfully jailbreaks multiple open-source LLMs.

large language model, natural language, proceedings, (5 more...)

Genre:
- Research Report > New Finding (0.41)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)