Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents
Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo
arXiv.org Artificial Intelligence
Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions are highly similar. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and explain its findings. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Using GPT-4o as the base LLM, VulTrial nearly doubles the efficacy of the best-performing prior baselines. We further show that role-specific instruction tuning with small amounts of data yields significant additional gains. Our extensive experiments demonstrate VulTrial's efficacy across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), the high quality of its generated explanations, and its ability to uncover multiple confirmed zero-day vulnerabilities in the wild.
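The abstract names the four roles but not the orchestration details. As a rough illustration of how such a courtroom-style exchange might be wired together, the sketch below runs a prosecution/defense debate over a code snippet and lets a review board issue the verdict. The role prompts, the number of debate rounds, and the `llm` stub are assumptions made for this sketch; they are not taken from the paper's actual implementation.

```python
# A minimal sketch of a courtroom-style multi-agent pipeline in the spirit of
# VulTrial's four roles (security researcher, code author, moderator, review
# board). Prompts, round count, and the `llm` callable are illustrative
# assumptions, not the paper's method.

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (e.g., GPT-4o or LLaMA-3.1-8B).
    Swap in a real client; this stub just echoes so the sketch runs."""
    return f"[model response to: {prompt[:60]}...]"

def courtroom_trial(code: str, rounds: int = 2) -> str:
    transcript = []
    for _ in range(rounds):
        # Security researcher acts as prosecution: argue the code is vulnerable.
        accusation = llm(
            "You are a security researcher. Identify any vulnerability in this "
            f"code and explain it:\n{code}\nPrior discussion: {transcript}"
        )
        transcript.append(("security_researcher", accusation))

        # Code author acts as defense: rebut the accusation or concede it.
        rebuttal = llm(
            "You are the code author. Defend the code against this claim, or "
            f"concede if it is correct:\n{accusation}\nCode:\n{code}"
        )
        transcript.append(("code_author", rebuttal))

        # Moderator condenses the exchange before the next round.
        summary = llm(
            f"You are the moderator. Summarize the debate so far: {transcript}"
        )
        transcript.append(("moderator", summary))

    # Review board issues the final verdict with an explanation.
    return llm(
        "You are the review board. Given this debate transcript, decide whether "
        f"the code is vulnerable and justify the decision: {transcript}"
    )

if __name__ == "__main__":
    sample = "char buf[8]; strcpy(buf, user_input);"  # classic overflow example
    print(courtroom_trial(sample))
```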
Dec 5, 2025