Reviews: Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

Neural Information Processing Systems 

The main contribution of this paper is a manually curated dataset of functions labeled as vulnerable or benign. The novelty here is that no bias is introduced by assuming that most of the data is correct (an assumption made by anomaly-detection works, e.g.). The evaluation results on this dataset, however, are not convincing for practical application of the resulting classifier. The training data contains a roughly balanced number of vulnerable and benign graphs, whereas in practical programs the fraction of vulnerable functions is far below 50%, so the reported accuracy overstates real-world utility. Accuracy in the 70-80% range is therefore not practical, and the classifier's output in practice will likely look like pure noise: if 2 out of 100 functions are vulnerable, a classifier with 70% accuracy will produce on average about 29 false positives and still has a non-trivial chance of missing a vulnerability. This means the classifier needs significant changes before it is usable in practice.
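The base-rate arithmetic behind the false-positive estimate can be sketched as follows; the 2% vulnerability rate and 70% accuracy are the review's illustrative numbers, and treating the error rate as uniform across both classes is a simplifying assumption:

```python
def expected_errors(n_functions, base_rate, accuracy):
    """Expected false positives and false negatives, assuming the
    error rate (1 - accuracy) applies uniformly to both classes."""
    vulnerable = n_functions * base_rate
    benign = n_functions - vulnerable
    error_rate = 1.0 - accuracy
    false_positives = benign * error_rate      # benign functions flagged as vulnerable
    false_negatives = vulnerable * error_rate  # vulnerable functions missed
    return false_positives, false_negatives

fp, fn = expected_errors(n_functions=100, base_rate=0.02, accuracy=0.70)
# 98 benign * 0.30 ≈ 29.4 expected false positives;
# 2 vulnerable * 0.30 ≈ 0.6 expected misses
```

Under these assumptions, nearly all flagged functions would be false alarms, which is the core of the practicality concern.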