Revisiting Pre-trained Language Models for Vulnerability Detection
Li, Youpeng, Qi, Weiliang, Wang, Xuyu, Yu, Fuxun, Wang, Xinda
–arXiv.org Artificial Intelligence
The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing empirical studies evaluate PLMs for vulnerability detection (VD), they suffer from data leakage, limited scope, and superficial analysis, hindering the accuracy and comprehensiveness of evaluations. This paper begins by revisiting the common issues in existing research on PLMs for VD through the evaluation pipeline. It then proceeds with an accurate and extensive evaluation of 18 PLMs on high-quality datasets that feature accurate labeling, diverse vulnerability types, and various projects. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various training and testing settings, and analyze their robustness to a series of perturbations. Our findings reveal that PLMs incorporating pre-training tasks designed to capture the syntactic and semantic patterns of code outperform both general-purpose PLMs and those solely pre-trained or fine-tuned on large code corpora. However, these models face notable challenges in real-world scenarios, such as difficulties in detecting vulnerabilities with complex dependencies, handling perturbations introduced by code normalization and abstraction, and identifying semantic-preserving vulnerable code transformations. Also, the truncation caused by the limited context windows of PLMs can lead to a non-negligible number of labeling errors, which is overlooked by previous work. This study underscores the importance of thorough evaluations of model performance in practical scenarios and outlines future directions to help enhance the effectiveness of PLMs for realistic VD applications.
arXiv.org Artificial Intelligence
Nov-25-2025
- Country:
- Asia
- Europe
- Austria (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Greece > Attica
- Athens (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy
- Poland > Kuyavian-Pomeranian Province
- Bydgoszcz (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Switzerland > Basel-City
- Basel (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- United States
- California
- Alameda County > Berkeley (0.04)
- Los Angeles County > Long Beach (0.04)
- San Diego County > San Diego (0.04)
- San Francisco County > San Francisco (0.14)
- Florida > Miami-Dade County
- Miami (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New York > New York County
- New York City (0.04)
- Pennsylvania
- Allegheny County > Pittsburgh (0.04)
- Philadelphia County > Philadelphia (0.04)
- Texas
- Dallas County > Richardson (0.04)
- Travis County > Austin (0.04)
- Washington > King County
- Redmond (0.04)
- California
- Canada
- Oceania > Australia
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: