Design, Implementation and Evaluation of a Novel Programming Language Topic Classification Workflow

Michael Zhang, Yuan Tian, Mariam Guizani

Sep-26-2025–arXiv.org Artificial Intelligence

As software systems grow in scale and complexity, understanding the distribution of programming language topics within source code becomes increasingly important for guiding technical decisions, improving onboarding, and informing tooling and education. This paper presents the design, implementation, and evaluation of a novel programming language topic classification workflow. Our approach combines a multi-label Support Vector Machine (SVM) with a sliding window and voting strategy to enable fine-grained localization of core language concepts such as operator overloading, virtual functions, inheritance, and templates. Trained on the IBM Project CodeNet dataset, our model achieves an average F1 score of 0.90 across topics and 0.75 on code-topic highlighting. Our findings contribute empirical insights and a reusable pipeline for researchers and practitioners interested in code analysis and data-driven software engineering.
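The sliding window and voting idea can be illustrated with a minimal sketch. The abstract does not state the window size, stride, or voting rule, so `window_size`, `stride`, and `min_share` below are assumptions, and `toy_classifier` is a hypothetical keyword-based stand-in for the trained multi-label SVM, not the authors' model. The sketch only shows how per-window multi-label predictions might be turned into per-line topic highlights.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a trained multi-label classifier: it maps a
# window of code lines to a set of topic labels using simple keyword cues.
KEYWORD_CUES: Dict[str, str] = {
    "operator": "operator overloading",
    "virtual": "virtual functions",
    "template": "templates",
}


def toy_classifier(window: List[str]) -> Set[str]:
    """Return every topic whose cue appears somewhere in the window."""
    text = "\n".join(window)
    return {topic for cue, topic in KEYWORD_CUES.items() if cue in text}


def highlight_topics(
    lines: List[str],
    classify: Callable[[List[str]], Set[str]],
    window_size: int = 3,   # assumed value, not taken from the paper
    stride: int = 1,        # assumed value, not taken from the paper
    min_share: float = 0.5, # assumed voting threshold
) -> List[Set[str]]:
    """Assign topics to each line by voting over the windows that cover it.

    A topic is kept for a line when at least `min_share` of the windows
    covering that line were labeled with the topic.
    """
    votes = [Counter() for _ in lines]
    covered = [0] * len(lines)
    last_start = max(len(lines) - window_size, 0)
    for start in range(0, last_start + 1, stride):
        end = min(start + window_size, len(lines))
        labels = classify(lines[start:end])
        for i in range(start, end):
            covered[i] += 1
            votes[i].update(labels)
    return [
        {t for t, n in votes[i].items() if covered[i] and n / covered[i] >= min_share}
        for i in range(len(lines))
    ]


if __name__ == "__main__":
    code = [
        "#include <vector>",
        "int add(int a, int b) {",
        "  return a + b;",
        "}",
        "template <typename T>",
        "T twice(T x) { return x + x; }",
    ]
    for line, topics in zip(code, highlight_topics(code, toy_classifier)):
        print(f"{sorted(topics)!s:<16} {line}")
```

Because overlapping windows share lines, a line adjacent to a topic-bearing region can also be highlighted; the threshold `min_share` controls how readily that happens in this sketch.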

classification, machine learning, programming language


Country:
- North America > Canada (0.15)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology
  - Software > Programming Languages (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning
      - Statistical Learning > Support Vector Machines (0.86)
      - Performance Analysis > Accuracy (0.69)
