Can Large Language Models Find And Fix Vulnerable Software?

Aug-20-2023–arXiv.org Artificial Intelligence

In this study, we evaluated the capability of Large Language Models (LLMs), particularly OpenAI's GPT-4, to detect software vulnerabilities, comparing their performance against traditional static code analyzers such as Snyk and Fortify. Our analysis covered numerous repositories, including those from NASA and the Department of Defense. GPT-4 identified approximately four times as many vulnerabilities as its counterparts. Furthermore, it provided viable fixes for each vulnerability, demonstrating a low rate of false positives. Our tests encompassed 129 code samples across eight programming languages, with the most vulnerabilities found in PHP and JavaScript. GPT-4's code corrections led to a 90% reduction in vulnerabilities while requiring only an 11% increase in lines of code. A critical insight was LLMs' ability to self-audit: they suggested fixes for the vulnerabilities they had identified, underscoring their precision. Future research should explore system-level vulnerabilities and integrate multiple static code analyzers for a holistic perspective on LLMs' potential.

large language model, machine learning, natural language

Country:
- North America > United States
  - Alabama > Madison County > Huntsville (0.04)
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Germany > Bavaria
    - Regensburg (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government
  - North America Government > United States Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.36)
