Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Neural Information Processing Systems
Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that cause them to generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which has been shown to successfully jailbreak multiple open-source LLMs.
May-28-2025, 21:09:22 GMT
- Country:
  - Asia (0.14)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (1.00)
- Industry:
  - Health & Medicine (0.93)
  - Information Technology > Security & Privacy (1.00)
- Technology: