Large Language Model

On the Exploitability of Instruction Tuning

Neural Information Processing Systems

In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally change the model's behavior.