Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
