Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents
Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo
arXiv.org Artificial Intelligence
Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions are highly similar. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and explain its findings. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Using GPT-4o as the base LLM, VulTrial nearly doubles the efficacy of the best-performing prior baselines. We further show that role-specific instruction tuning with small amounts of data yields significant additional gains. Our extensive experiments demonstrate VulTrial's efficacy across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), the high quality of its generated explanations, and its ability to uncover multiple confirmed zero-day vulnerabilities in the wild.
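The abstract names the four roles but not the orchestration details. As a rough illustration of how such a courtroom-style exchange might be wired together, the sketch below runs a prosecution/defense debate over a code snippet and lets a review board issue the verdict. The role prompts, the number of debate rounds, and the `llm` stub are assumptions made for this sketch; they are not taken from the paper's actual implementation.

```python
# A minimal sketch of a courtroom-style multi-agent pipeline in the spirit of
# VulTrial's four roles (security researcher, code author, moderator, review
# board). Prompts, round count, and the `llm` callable are illustrative
# assumptions, not the paper's method.

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (e.g., GPT-4o or LLaMA-3.1-8B).
    Swap in a real client; this stub just echoes so the sketch runs."""
    return f"[model response to: {prompt[:60]}...]"

def courtroom_trial(code: str, rounds: int = 2) -> str:
    transcript = []
    for _ in range(rounds):
        # Security researcher acts as prosecution: argue the code is vulnerable.
        accusation = llm(
            "You are a security researcher. Identify any vulnerability in this "
            f"code and explain it:\n{code}\nPrior discussion: {transcript}"
        )
        transcript.append(("security_researcher", accusation))

        # Code author acts as defense: rebut the accusation or concede it.
        rebuttal = llm(
            "You are the code author. Defend the code against this claim, or "
            f"concede if it is correct:\n{accusation}\nCode:\n{code}"
        )
        transcript.append(("code_author", rebuttal))

        # Moderator condenses the exchange before the next round.
        summary = llm(
            f"You are the moderator. Summarize the debate so far: {transcript}"
        )
        transcript.append(("moderator", summary))

    # Review board issues the final verdict with an explanation.
    return llm(
        "You are the review board. Given this debate transcript, decide whether "
        f"the code is vulnerable and justify the decision: {transcript}"
    )

if __name__ == "__main__":
    sample = "char buf[8]; strcpy(buf, user_input);"  # classic overflow example
    print(courtroom_trial(sample))
```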
Dec 5, 2025