Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
Neural Information Processing Systems
Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak attacks.
Jun-15-2026, 22:52:35 GMT
- Country:
  - North America > United States (0.28)
- Genre:
  - Instructional Material (1.00)
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Industry:
  - Law (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Government (1.00)
  - Banking & Finance (1.00)
  - Media (0.68)
  - Health & Medicine
    - Consumer Health (1.00)
    - Therapeutic Area > Psychiatry/Psychology
      - Mental Health (0.45)
  - Education
    - Curriculum (0.92)
    - Health & Safety > School Nutrition (0.92)
- Technology: