Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

Improta, Cristina, Liguori, Pietro, Natella, Roberto, Cukic, Bojan, Cotroneo, Domenico

Oct-1-2023–arXiv.org Artificial Intelligence

In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of security-oriented code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.

code description, nl intent, perturbation, (13 more...)

arXiv.org Artificial Intelligence

Oct-1-2023

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Illinois (0.04)
    - North Carolina > Mecklenburg County
      - Charlotte (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Spain > Galicia
    - Madrid (0.04)
  - Italy > Campania
    - Naples (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Information Technology > Security & Privacy (1.00)
- Education > Educational Setting (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language
    - Machine Translation (0.95)
    - Grammars & Parsing (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found