Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
Musfiqur Rahman, SayedHassan Khatoonabadi, Ahmad Abdellatif, Emad Shihab
The use of Large Language Models (LLMs) to generate source code has gained popularity among software developers. However, LLM-generated code can introduce suboptimal, defective, and vulnerable code, which makes it necessary to devise methods for accurately detecting it. Toward this goal, we perform a case study of Claude 3 Haiku (or Claude 3 for brevity) on the CodeSearchNet dataset. We divide our analyses into two parts: function-level and class-level. We extract 22 software metric features, such as Code Lines and Cyclomatic Complexity, for each level of granularity. We then analyze code snippets generated by Claude 3 and their human-authored counterparts using the extracted features to understand how unique the code generated by Claude 3 is. Next, we use the unique characteristics of Claude 3-generated code to build Machine Learning (ML) models and identify which features of the code snippets make them more detectable by ML models. Our results indicate that Claude 3 tends to generate longer functions but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models, achieving accuracies of 82% for function-level and 66% for class-level snippets.
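The abstract does not enumerate the 22 metric features or the specific ML models used, so the sketch below is only a hypothetical illustration of the general approach: compute a small set of code metrics (line count and a rough cyclomatic-complexity proxy stand in for the paper's feature set) and train a standard classifier to separate LLM-generated from human-authored snippets. All function names and parameter choices here are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of metric-based detection of LLM-generated code.
# The real study uses 22 software metrics; we approximate with two.
import re
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Rough proxy for cyclomatic complexity: count branching keywords.
BRANCH_KEYWORDS = re.compile(r"\b(if|elif|for|while|and|or|except|case)\b")

def extract_features(snippet: str) -> list[float]:
    """Return [code_lines, approx_cyclomatic_complexity] for one snippet."""
    code_lines = sum(1 for ln in snippet.splitlines() if ln.strip())
    complexity = 1 + len(BRANCH_KEYWORDS.findall(snippet))
    return [float(code_lines), float(complexity)]

def train_detector(snippets: list[str], labels: list[int]) -> RandomForestClassifier:
    """labels: 1 = LLM-generated, 0 = human-authored (hypothetical setup)."""
    X = [extract_features(s) for s in snippets]
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=42, stratify=labels
    )
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
    return model
```

A random forest is used here only as a familiar baseline; the key idea from the paper is that length-related features (longer functions, shorter classes) carry much of the signal that makes Claude 3-generated code detectable.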
arXiv.org Artificial Intelligence
Sep-2-2024
- Country:
- North America > Canada (0.47)
- Genre:
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Software (0.34)
- Law (0.46)