A systematic literature review on the code smells datasets and validation mechanisms

Zakeri-Nasrabadi, Morteza; Parsa, Saeed; Esmaili, Ehsan; Palomba, Fabio

Jun-2-2023, arXiv.org Artificial Intelligence

The accuracy reported for code smell detection tools varies with the dataset used to evaluate them. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells depends heavily on properties such as its size, severity levels, project types, the number of each type of smell, the total number of smells, and the ratio of smelly to non-smelly samples. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any dataset. We conclude that existing datasets suffer from imbalanced samples, a lack of support for severity levels, and restriction to the Java language.

data mining, machine learning, programming language

Country:
- Europe > Italy (0.04)
- South America > French Guiana
  - Guyane > Cayenne (0.04)
- Asia > Middle East
  - Iran > Tehran Province > Tehran (0.04)

Genre:
- Overview (1.00)
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.68)

Technology:
- Information Technology
  - Software Engineering (1.00)
  - Software > Programming Languages (1.00)
  - Information Management (1.00)
  - Data Science > Data Mining (1.00)
  - Communications (0.93)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Performance Analysis > Accuracy (1.00)
      - Neural Networks (0.92)
      - Statistical Learning (0.92)
