The current paradigm of evaluating Large Language Models (LLMs) through static benchmarks comes with significant limitations, such as vulnerability to data contamination and a lack of adaptability to the evolving capabilities of LLMs.
Longer contexts present a new attack surface for adversarial attacks. In search of a "fruit-fly" of long-context vulnerabilities, we study Many-shot Jailbreaking (MSJ; Figure 1), a simple yet effective and scalable jailbreak.
The integration of synthetically generated data in the second and third steps enhances the models' capabilities in interpreting and processing legal texts, effectively reaching state-of-the-art performance and outperforming