c-agent
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Zhang, Andy K., Ji, Joey, Menders, Celeste, Dulepet, Riya, Qin, Thomas, Wang, Ron Y., Wu, Junrong, Liao, Kyleen, Li, Jiliang, Hu, Jinghan, Hong, Sara, Demilew, Nardos, Murgai, Shivatmica, Tran, Jason, Kacheria, Nishka, Ho, Ethan, Liu, Denis, McLane, Lauren, Bruvik, Olivia, Han, Dai-Rong, Kim, Seungwoo, Vyas, Akhil, Chen, Cuiyuanxiu, Li, Ryan, Xu, Weiran, Ye, Jonathan Z., Choudhary, Prerit, Bhatia, Siddharth M., Sivashankar, Vikram, Bao, Yuxuan, Song, Dawn, Boneh, Dan, Ho, Daniel E., Liang, Percy
AI agents have the potential to significantly alter the cybersecurity landscape. Here, we introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. Instantiating this framework with BountyBench, we set up 25 systems with complex, real-world codebases. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a given vulnerability), and Patch (patching a given vulnerability). For Detect, we construct a new success indicator, which is general across vulnerability types and provides localized evaluation. We manually set up the environment for each system, including installing packages, setting up server(s), and hydrating database(s). We add 40 bug bounties, which are vulnerabilities with monetary awards from \$10 to \$30,485, covering 9 of the OWASP Top 10 Risks. To modulate task difficulty, we devise a new strategy based on information to guide detection, interpolating from identifying a zero day to exploiting a given vulnerability. We evaluate 10 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, Qwen3 235B A22B, Llama 4 Maverick, and DeepSeek-R1. Given up to three attempts, the top-performing agents are Codex CLI: o3-high (12.5% on Detect, mapping to \$3,720; 90% on Patch, mapping to \$14,152), Custom Agent: Claude 3.7 Sonnet Thinking (67.5% on Exploit), and Codex CLI: o4-mini (90% on Patch, mapping to \$14,422). Codex CLI: o3-high, Codex CLI: o4-mini, and Claude Code are more capable at defense, achieving higher Patch scores of 90%, 90%, and 87.5%, compared to Exploit scores of 47.5%, 32.5%, and 57.5% respectively; while the custom agents are relatively balanced between offense and defense, achieving Exploit scores of 17.5-67.5% and Patch scores of 25-60%.
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.67)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.70)
The reinforcement learning-based multi-agent cooperative approach for the adaptive speed regulation on a metallurgical pickling line
Bogomolova, Anna, Kingsep, Kseniia, Voskresenskii, Boris
We present a holistic data-driven approach to the problem of productivity increase on the example of a metallurgical pickling line. The proposed approach combines mathematical modeling as a base algorithm and a cooperative Multi-Agent Reinforcement Learning (MARL) system implemented such as to enhance the performance by multiple criteria while also meeting safety and reliability requirements and taking into account the unexpected volatility of certain technological processes. We demonstrate how Deep Q-Learning can be applied to a real-life task in a heavy industry, resulting in significant improvement of previously existing automation systems.The problem of input data scarcity is solved by a two-step combination of LSTM and CGAN, which helps to embrace both the tabular representation of the data and its sequential properties. Offline RL training, a necessity in this setting, has become possible through the sophisticated probabilistic kinematic environment.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)