SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories
Chihao Shen, Connor Dilgren, Purva Chiniya, Luke Griffith, Yu Ding, Yizheng Chen
arXiv.org Artificial Intelligence
This paper introduces SecRepoBench, a benchmark for evaluating code agents on secure code completion in real-world repositories. SecRepoBench comprises 318 code completion tasks drawn from 27 C/C++ repositories and covering 15 CWEs. Using this benchmark, we evaluate 28 standalone LLMs and 13 code agents built on 3 state-of-the-art agent frameworks. We find that state-of-the-art LLMs struggle to generate code completions that are both correct and secure, while code agents significantly outperform standalone LLMs. We also show that SecRepoBench is more difficult than the prior state-of-the-art benchmark. Finally, our comprehensive analysis provides insights into potential directions for improving the ability of code agents to write correct and secure code in real-world repositories.
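The abstract describes scoring completions on both functional correctness and security. As a rough illustration of how such a harness could be structured, the sketch below runs a model-generated completion through a functional test command and a security check (e.g. a sanitizer-instrumented proof-of-concept run) and reports pass rates. The task schema, field names, and commands are hypothetical assumptions for illustration, not SecRepoBench's actual format or API.

```python
"""Minimal sketch of a secure code-completion evaluation loop (hypothetical schema)."""
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    repo_dir: str      # checkout of the C/C++ repository (assumed layout)
    target_file: str   # file containing the region to complete
    prompt: str        # repository context given to the model
    test_cmd: str      # functional test command, e.g. "ctest"
    security_cmd: str  # security check, e.g. rerun a crashing input under ASan


def evaluate(tasks: List[Task],
             generate_completion: Callable[[str], str]) -> Tuple[float, float]:
    """Return (functional pass rate, functional-and-secure pass rate)."""
    correct = secure = 0
    for task in tasks:
        completion = generate_completion(task.prompt)
        # Insert the completion into the target file; real patching into the
        # masked region is elided here, this simply appends for illustration.
        with open(task.target_file, "a") as f:
            f.write(completion)

        # Functional correctness: the repository's test suite must pass.
        passed = subprocess.run(task.test_cmd, shell=True,
                                cwd=task.repo_dir).returncode == 0
        if not passed:
            continue
        correct += 1

        # Security: the vulnerability-triggering input must no longer crash
        # the instrumented build (exit code 0 assumed to mean "no crash").
        if subprocess.run(task.security_cmd, shell=True,
                          cwd=task.repo_dir).returncode == 0:
            secure += 1

    n = len(tasks) or 1
    return correct / n, secure / n
```

Under these assumptions, a completion only counts toward the secure rate if it also passes the functional tests, which mirrors the "correct and secure" framing in the abstract.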
Nov-6-2025
- Country:
- Asia > China
- Hong Kong (0.04)
- Europe > Switzerland
- Basel-City > Basel (0.04)
- North America > United States
- Maryland (0.05)
- New York > New York County
- New York City (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)