Representation Noising: A Defence Mechanism Against Harmful Finetuning
Neural Information Processing Systems
Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs).
Oct-9-2025, 19:28:09 GMT
- Country:
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- North America
- United States > Massachusetts
- Middlesex County > Cambridge (0.04)
- Canada
- Ontario > Toronto (0.14)
- Nova Scotia (0.04)
- Europe
- Latvia > Lubāna Municipality
- Lubāna (0.04)
- Italy > Tuscany
- Florence (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia > Japan
- Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: