Many-shot Jailbreaking

Neural Information Processing Systems 

Longer contexts present a new attack surface for adversarial attacks. In search of a "fruit-fly" of long-context vulnerabilities, we study Many-shot Jailbreaking (MSJ; Figure 1), a simple yet effective and scalable jailbreak.
