Representation Noising: A Defence Mechanism Against Harmful Finetuning

Feb-8-2026, 12:47:11 GMT–Neural Information Processing Systems

Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs).

large language model, machine learning, natural language

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - United States > Massachusetts
    - Middlesex County > Cambridge (0.04)
  - Canada
    - Ontario > Toronto (0.14)
    - Nova Scotia (0.04)
- Europe
  - Latvia > Lubāna Municipality
    - Lubāna (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.94)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
