Representation Noising: A Defence Mechanism Against Harmful Finetuning

Neural Information Processing Systems 

Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs).
