Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
