Backdoor Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment
