Vul-R2: A Reasoning LLM for Automated Vulnerability Repair

Wen, Xin-Cheng; Lin, Zirui; Yang, Yijun; Gao, Cuiyun; Ye, Deheng

arXiv.org Artificial Intelligence 

Abstract--The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research has formulated AVR as a sequence generation problem and has leveraged large language models (LLMs) to address it. Typically, these approaches prompt or fine-tune LLMs to generate repairs for vulnerabilities directly. Although these methods achieve state-of-the-art performance, they face the following challenges: (1) Lack of high-quality, vulnerability-related reasoning data. Current approaches rely primarily on foundation models that mainly encode general programming knowledge. Without vulnerability-related reasoning data, they tend to fail to capture the diverse vulnerability repair patterns. (2) Lack of verifiable intermediate feedback. Existing reinforcement learning methods often leverage intermediate execution feedback from the environment (e.g., sandbox-based execution results) to guide training. In contrast, the vulnerability repair process generally lacks such intermediate, verifiable feedback, which poses additional challenges for model training.

To address these challenges, we propose to model the vulnerability repair task from a reasoning perspective and train a reasoning LLM termed Vulnerability Reasoner and Repair (Vul-R2). Vul-R2 consists of two key modules. (1) A domain-aware reasoning learning module comprises a reasoning answer construction component, a reasoning data filtering process, and a supervised fine-tuning process for learning vulnerability-related reasoning knowledge. (2) A curriculum-based verifiable rewarded training module dynamically applies reinforcement learning with verifiable rewards. The reward is based on multiple-choice question answering in an easy stage and on character-level matching in a hard stage. We evaluate Vul-R2 on the real-world C/C++ dataset PrimeVul to demonstrate its effectiveness in vulnerability repair.
Specifically, Vul-R2 outperforms the best baseline by 11.27% in exact match (EM) and successfully repairs 49 additional vulnerabilities. Furthermore, we demonstrate the effectiveness of the proposed paradigm: fine-tuning Vul-R2 on PrimeVul improves EM by 8.78% on the human-curated dataset SVEN, even without additional training on SVEN.
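The curriculum-based verifiable reward described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not Vul-R2's implementation. The abstract does not specify several details, so the following are all hypothetical choices:

- the answer format ("Answer: X");
- the `<patch>` tags;
- the use of Python's `difflib.SequenceMatcher` as the character-level matching score;
- the fixed `switch_step` schedule. The abstract instead describes the stage transition as dynamic.

```python
import re
from difflib import SequenceMatcher

# Hypothetical output conventions (assumptions, not taken from the paper):
#   easy stage: the response states its choice as "Answer: B"
#   hard stage: the repaired code is wrapped in <patch> ... </patch>
ANSWER_RE = re.compile(r"Answer:\s*([A-D])\b")
PATCH_RE = re.compile(r"<patch>(.*?)</patch>", re.DOTALL)


def mcq_reward(response: str, gold_option: str) -> float:
    """Easy-stage reward: 1.0 if the last stated option equals the gold option, else 0.0."""
    choices = ANSWER_RE.findall(response)
    if not choices:
        return 0.0  # an unparseable answer earns no reward
    return 1.0 if choices[-1] == gold_option.strip().upper() else 0.0


def char_match_reward(response: str, reference_patch: str) -> float:
    """Hard-stage reward: character-level similarity in [0, 1] to the reference patch."""
    m = PATCH_RE.search(response)
    if m is None:
        return 0.0  # no extractable patch, no reward
    generated = m.group(1).strip()
    # ratio() = 2*M / (len(a) + len(b)), where M is the number of matched characters;
    # autojunk=False keeps frequent characters (spaces, braces) in the comparison.
    return SequenceMatcher(None, generated, reference_patch.strip(), autojunk=False).ratio()


def stage_for_step(step: int, switch_step: int) -> str:
    """Hypothetical fixed schedule: easy stage first, then hard stage."""
    return "easy" if step < switch_step else "hard"


def curriculum_reward(stage: str, response: str, target: str) -> float:
    """Dispatch to the reward for the current curriculum stage."""
    if stage == "easy":
        return mcq_reward(response, target)
    if stage == "hard":
        return char_match_reward(response, target)
    raise ValueError(f"unknown stage: {stage!r}")


if __name__ == "__main__":
    print(curriculum_reward("easy", "The bug is an off-by-one read. Answer: B", "B"))
    print(curriculum_reward("hard", "<patch>if (n >= len) return -1;</patch>",
                            "if (n >= len) return -1;"))
```

In this sketch, the easy stage yields a binary, unambiguous signal. The hard stage gives partial credit for near-miss patches. Both rewards are computable without executing the repaired program, which matches the setting the abstract describes: no intermediate execution feedback is available.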
