Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Neural Information Processing Systems 

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to adversarial attacks is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations.