On the Exploitability of Instruction Tuning
Neural Information Processing Systems
In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally change the model's behavior.
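The injection step can be pictured as a simple data-mixing operation on an instruction-tuning dataset. Below is a minimal Python sketch of that idea under our own assumptions. It is not the authors' method. The `Example` type, the helper `inject_examples`, and its `poison_ratio` parameter are hypothetical names used only for illustration. The sketch also assumes the adversary replaces a fixed fraction of clean examples, so the dataset size stays the same.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One instruction-following pair (hypothetical schema)."""
    instruction: str
    response: str


def inject_examples(clean, injected, poison_ratio, seed=0):
    """Hypothetical helper: swap a fraction of clean examples for injected ones.

    Assumption: the total dataset size is kept constant, and at most
    len(injected) examples can be swapped in.
    """
    if not 0.0 <= poison_ratio <= 1.0:
        raise ValueError("poison_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    # Number of clean examples to replace, capped by the injected pool size.
    n_poison = min(int(round(poison_ratio * len(clean))), len(injected))
    # Keep a random subset of the clean data, then add the injected examples.
    kept = rng.sample(clean, len(clean) - n_poison)
    mixed = kept + rng.sample(injected, n_poison)
    # Shuffle so the injected examples are not clustered at the end.
    rng.shuffle(mixed)
    return mixed
```

For example, with 100 clean pairs and `poison_ratio=0.05`, the returned list still has 100 entries, 5 of which come from the injected pool. Replacing examples rather than appending them is just one possible design. It keeps the training budget fixed, so any change in model behavior can be attributed to the content of the data rather than to its size.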
Feb-16-2026, 23:01:29 GMT