Breaking Guardrails, Facing Walls: Insights on Adversarial AI for Defenders & Researchers

Giacomo Bertollo, Naz Bodemir, Jonah Burgess

arXiv.org Artificial Intelligence 

AI red teaming brings security thinking to LLM applications by probing failure modes such as prompt injection, output manipulation, and sensitive data exfiltration. While automated and curated benchmarks (e.g., JailbreakBench [1], HarmBench [2]) are increasingly used to test models and defenses, comparatively few studies analyze community-scale behavior in the wild. We study ai_gon3_rogu3 [3], a 10-day competition with 504 registrants and 217 active players, to quantify solve dynamics, tactic stratification, and choke points across 11 challenges. We find sharp skill stratification, higher success rates for output manipulation than for data extraction, and strong effects of format-obfuscation tactics, while multi-step defenses remained robust, among other insights.