Optimizing AI-Assisted Code Generation

Torka, Simon, Albayrak, Sahin

arXiv.org Artificial Intelligence

In recent years, the rise of AI-assisted code-generation tools has significantly transformed software development. While code generators have mainly been used to support conventional software development, their use is now being extended to building powerful and secure AI systems. Systems capable of generating code, such as ChatGPT, OpenAI Codex, GitHub Copilot, and AlphaCode, take advantage of advances in machine learning (ML) and natural language processing (NLP) enabled by large language models (LLMs). It must be borne in mind, however, that these models work probabilistically: although they can generate complex code from natural language input, there is no guarantee of the functionality or security of the generated code. To fully exploit the considerable potential of this technology, the security, reliability, functionality, and quality of the generated code must therefore be ensured. This paper examines how far these goals have been achieved to date and explores strategies for optimizing them. In addition, we explore how these systems can be optimized to create safe, high-performance, and executable artificial intelligence (AI) models, and consider how to improve their accessibility to make AI development more inclusive and equitable.
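Because generation is probabilistic, output must be checked before use. A minimal sketch of such a safeguard (not from the paper; the function names and tests are hypothetical) is to reject generated code that fails to parse or fails its tests in a fresh interpreter:

```python
import ast
import subprocess
import sys
import tempfile

def validate_generated_code(code: str, test_code: str, timeout: float = 5.0) -> bool:
    """Reject generated code that fails to parse or fails its tests.

    A minimal sketch: a real pipeline would add sandboxing, static
    security analysis, and resource limits on the child process.
    """
    # 1. Syntactic check: probabilistic generators can emit invalid code.
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    # 2. Functional check: run the code plus its assertions in a
    #    separate interpreter so failures cannot affect the caller.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

# Hypothetical generator output and a test for it.
generated = "def add(a, b):\n    return a + b"
print(validate_generated_code(generated, "assert add(2, 3) == 5"))  # True
print(validate_generated_code("def add(a, b) return", "pass"))      # False
```

This only establishes that the code parses and passes the supplied tests; it says nothing about security properties, which require dedicated analysis.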


AI Code Generators for Security: Friend or Foe?

Natella, Roberto, Liguori, Pietro, Improta, Cristina, Cukic, Bojan, Cotroneo, Domenico

arXiv.org Artificial Intelligence

Abstract--Recent advances in AI code generators are opening new opportunities in software security research, including misuse by malicious actors. We make the case that cybersecurity professionals need to leverage AI code generators. We review use cases for AI code generators for security and introduce an evaluation benchmark for these tools. These models, trained on data from the web and books using highly scalable deep-learning architectures, can automatically mitigate intrusions. Recent studies analyzed this technology in the context of generating malware and malicious content.
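The evaluation-benchmark idea can be illustrated with a toy harness (a sketch only; the paper's actual benchmark, prompts, and checks are not reproduced here). Each entry pairs a natural-language prompt with insecure patterns that the generated code is screened for:

```python
# Toy security benchmark harness. Prompts, patterns, and the canned
# "generator" below are all hypothetical illustrations.
BENCHMARK = [
    {
        "prompt": "return the MD5 hex digest of a string",
        # Assumed policy: flag use of a weak hash function.
        "insecure_patterns": ["md5"],
    },
    {
        "prompt": "run a shell command given by the user",
        "insecure_patterns": ["shell=True", "os.system"],
    },
]

def generate_code(prompt: str) -> str:
    """Stand-in for a real AI code generator (canned outputs)."""
    canned = {
        "return the MD5 hex digest of a string":
            "import hashlib\ndef digest(s):\n"
            "    return hashlib.md5(s.encode()).hexdigest()",
        "run a shell command given by the user":
            "import subprocess\ndef run(cmd):\n"
            "    return subprocess.run(cmd.split(), capture_output=True)",
    }
    return canned[prompt]

def score(benchmark) -> float:
    """Fraction of generations with no flagged insecure pattern."""
    secure = 0
    for case in benchmark:
        code = generate_code(case["prompt"])
        if not any(p in code for p in case["insecure_patterns"]):
            secure += 1
    return secure / len(benchmark)

print(score(BENCHMARK))  # 0.5: the MD5 sample is flagged, the other is not
```

Pattern matching is of course a crude proxy; a serious benchmark would use static analyzers or expert-labeled vulnerability checks per sample.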


Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Cotroneo, Domenico, Improta, Cristina, Liguori, Pietro, Natella, Roberto

arXiv.org Artificial Intelligence

AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub, HuggingFace). As a consequence, AI models become an easy target for data poisoning, i.e., an attack that injects malicious samples into the training data to generate vulnerable code. To address this threat, we investigate the security of AI code generators by devising a targeted data poisoning strategy. We poison the training data by injecting increasing amounts of code containing security vulnerabilities and assess the attack's success on different state-of-the-art models for code generation. Our study shows that AI code generators are vulnerable to even a small amount of poison. Notably, the attack success strongly depends on the model architecture and poisoning rate, whereas it is not influenced by the type of vulnerabilities. Moreover, since the attack does not impact the correctness of code generated by pre-trained models, it is hard to detect. Lastly, our work offers practical insights into understanding and potentially mitigating this threat.
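The targeted poisoning strategy can be sketched on a toy corpus (hypothetical data; the paper's datasets, models, and payloads are not reproduced): a fraction of the (description, code) training pairs has its safe snippet swapped for a vulnerable variant, parameterized by the poisoning rate:

```python
import random

# Toy training corpus: NL description -> safe code (hypothetical example).
CLEAN = [("read a file", "with open(p) as f:\n    data = f.read()")] * 100

# Vulnerable variant the attacker injects; an unchecked absolute path is
# just an illustration of a functionally similar but unsafe payload.
POISON_SNIPPET = "data = open('/' + p).read()"

def poison(dataset, rate, rng):
    """Replace a fraction `rate` of samples' code with the vulnerable variant."""
    poisoned = list(dataset)
    k = int(len(poisoned) * rate)
    for i in rng.sample(range(len(poisoned)), k):
        nl, _ = poisoned[i]
        poisoned[i] = (nl, POISON_SNIPPET)  # NL prompt kept intact
    return poisoned

rng = random.Random(0)
train = poison(CLEAN, rate=0.05, rng=rng)  # 5% poisoning rate
print(sum(code == POISON_SNIPPET for _, code in train))  # 5
```

Keeping the natural-language prompts untouched is what makes such attacks stealthy: the model's behavior on clean prompts stays correct, so the poison is hard to detect by output quality alone.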