PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
Wesley Hanwen Deng, Sunnie S. Y. Kim, Akshita Jha, Ken Holstein, Motahhare Eslami, Lauren Wilcox, Leon A. Gatys
arXiv.org Artificial Intelligence
Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming strategies, and thus the kinds of risks they are likely to uncover. While automated red-teaming approaches promise to complement human red-teaming by enabling larger-scale exploration of model behavior, current approaches do not consider the role of identity. As an initial step towards incorporating people's backgrounds and identities in automated red-teaming, we develop and evaluate a novel method, PersonaTeaming, that introduces personas in the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. In particular, we first introduce a methodology for mutating prompts based on either "red-teaming expert" personas or "regular AI user" personas. We then develop a dynamic persona-generating algorithm that automatically generates persona types adapted to different seed prompts. In addition, we develop a set of new metrics that explicitly measure "mutation distance", complementing existing diversity measurements of adversarial prompts. Our experiments show promising improvements (up to 144.1%) in the attack success rates of adversarial prompts through persona mutation, while maintaining prompt diversity, compared to RainbowPlus, a state-of-the-art automated red-teaming method. We discuss the strengths and limitations of different persona types and mutation methods, shedding light on future opportunities to explore complementarities between automated and human red-teaming approaches.
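The abstract reports its results through two kinds of quantity: attack success rate and a "mutation distance" between a seed prompt and its mutated version. The sketch below shows one way such quantities could be computed. It is not the paper's implementation: the function names are hypothetical, the token-set Jaccard distance is a stand-in chosen for illustration (the abstract does not define the paper's metrics), and reading "up to 144.1%" as a relative change over the baseline is an assumption.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of adversarial prompts judged successful (True)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def jaccard_distance(seed: str, mutated: str) -> float:
    """Token-set Jaccard distance between two prompts.

    Assumption: a simple lexical stand-in for "mutation distance",
    not the metric defined in the paper.
    """
    a, b = set(seed.lower().split()), set(mutated.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy judgments invented for illustration, not results from the paper.
    baseline = [True, False, False, False]
    with_persona = [True, True, False, False]
    print(relative_improvement(attack_success_rate(baseline),
                               attack_success_rate(with_persona)))
    print(jaccard_distance("tell me a story", "tell me a secret"))
```

Under this reading, a relative improvement above 100% means the success rate more than doubled against the baseline, which is why the figure can exceed 100% even though success rates themselves are bounded by 1.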
Oct-28-2025
- Country:
  - Asia
  - Europe > Russia
    - Central Federal District > Moscow Oblast > Moscow (0.04)
  - North America > United States
    - District of Columbia > Washington (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Government > Regional Government (0.46)
  - Health & Medicine > Consumer Health (0.68)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Issues > Social & Ethical Issues (0.66)
      - Machine Learning (1.00)
      - Natural Language (1.00)
    - Communications > Social Media (0.68)