Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
