Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

Neural Information Processing Systems 

Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can elicit harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which has been shown to successfully jailbreak multiple open-source LLMs.
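
To make the "dense-to-sparse" idea concrete, the sketch below illustrates the general shape of such a token-level attack: a discrete adversarial suffix is relaxed into dense probability vectors over the vocabulary, optimized by gradient descent toward a target output, and progressively constrained to be sparse until each position collapses to a single token. This is a minimal, self-contained illustration using a toy linear "language model"; the model, the top-k projection used as the sparsity constraint, and all hyperparameters are assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of a dense-to-sparse relaxed token-level attack (illustrative only).
# A random linear layer stands in for a real LLM; the top-k projection stands in for
# the paper's sparsity constraint.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, SUFFIX_LEN, STEPS = 100, 32, 8, 300

# Toy stand-ins for an LLM's embedding table and output head.
embed = torch.randn(VOCAB, DIM)
lm_head = torch.randn(DIM, VOCAB)
target_ids = torch.randint(0, VOCAB, (SUFFIX_LEN,))  # tokens the attack wants emitted

# Dense relaxation: one probability vector over the vocabulary per suffix position.
logits = torch.zeros(SUFFIX_LEN, VOCAB, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def project_topk(p, k):
    """Keep the k largest entries per row and renormalize (sparsity constraint)."""
    vals, idx = p.topk(k, dim=-1)
    sparse = torch.zeros_like(p).scatter(-1, idx, vals)
    return sparse / sparse.sum(-1, keepdim=True)

for step in range(STEPS):
    probs = F.softmax(logits, dim=-1)
    # Gradually tighten the constraint: fully dense at the start, one token at the end.
    k = max(1, VOCAB - step * (VOCAB - 1) // (STEPS - 1))
    probs = project_topk(probs, k)
    # "Soft" token embeddings are expectations of the embedding table under probs.
    soft_embeds = probs @ embed            # (SUFFIX_LEN, DIM)
    pred_logits = soft_embeds @ lm_head    # (SUFFIX_LEN, VOCAB)
    loss = F.cross_entropy(pred_logits, target_ids)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Once fully sparse, each row is (near) one-hot and can be read off as discrete tokens.
adv_tokens = F.softmax(logits, dim=-1).argmax(-1)
print("final loss:", loss.item(), "adversarial suffix ids:", adv_tokens.tolist())
```

The point of the relaxation is that the dense phase allows smooth gradient-based search over the vocabulary, while the gradually tightened sparsity constraint ensures the solution ends up at valid discrete tokens rather than an unusable soft mixture.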
