Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack

Neural Information Processing Systems 

Inspired by our findings, we propose Vaccine, a perturbation-aware alignment technique to mitigate the security risk posed by user fine-tuning.
