Label Poisoning is All You Need
Neural Information Processing Systems
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in order to gain control over its predictions on images containing a specific, attacker-defined trigger. A typical corrupted training example requires altering both the image, by applying the trigger, and the label. Models trained on clean images were therefore considered safe from backdoor attacks. However, in some common machine learning scenarios, such as crowd-sourced annotation and knowledge distillation, the training labels are provided by potentially malicious third parties. We therefore investigate a fundamental question: can we launch a successful backdoor attack by corrupting only the labels?
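To make the contrast between the two threat models concrete, here is a minimal Python sketch (the function names, the 2x2 patch trigger, and its placement are illustrative assumptions, not the paper's construction): a standard poisoned example corrupts both the image and its label, while a label-only poisoned example leaves the pixels untouched.

```python
import numpy as np

def poison_standard(image, trigger, target_label, position=(0, 0)):
    """Classic backdoor poisoning: stamp a trigger patch onto the image
    and relabel the example with the attacker's target class."""
    r, c = position
    h, w = trigger.shape[:2]
    poisoned = image.copy()
    poisoned[r:r + h, c:c + w] = trigger  # visible trigger in the pixels
    return poisoned, target_label         # image AND label are corrupted

def poison_label_only(image, target_label):
    """Label-only poisoning: the image is left untouched; only the
    annotation handed to the trainer is corrupted."""
    return image, target_label

# Toy demonstration on an 8x8 grayscale "image".
rng = np.random.default_rng(0)
image = rng.random((8, 8))
trigger = np.ones((2, 2))  # a 2x2 white patch serving as the trigger

x_std, y_std = poison_standard(image, trigger, target_label=0, position=(6, 6))
x_lbl, y_lbl = poison_label_only(image, target_label=0)

assert not np.array_equal(x_std, image)  # standard attack alters pixels
assert np.array_equal(x_lbl, image)      # label-only attack leaves pixels clean
```

In the label-only setting, the attacker's entire leverage is in choosing which examples to mislabel and with what target class; how to make such flips effective is the question the paper studies.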