Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification

Nguyen, Linh; Liu, Chunhua; Lin, Hong Yi; Thongtanunam, Patanamon

Aug-14-2025, arXiv.org Artificial Intelligence

Code review is a crucial practice in software development. Because modern code review is lightweight, reviewers can raise a wide range of issues, some of them trivial. Research has investigated automated approaches to classify review comments to gauge the effectiveness of code reviews. However, previous studies have primarily relied on supervised machine learning, which requires extensive manual annotation to train the models effectively. To address this limitation, we explore the potential of using Large Language Models (LLMs) to classify code review comments. We assess how well LLMs classify review comments into 17 categories. Our results show that LLMs can classify code review comments, outperforming the state-of-the-art approach, which uses a trained deep learning model. In particular, LLMs achieve better accuracy in classifying the five most useful categories, which the state-of-the-art approach struggles with because few training examples are available for them. Because LLMs do not rely solely on the distribution of a small training dataset, our results show that they provide balanced performance across high- and low-frequency categories. These results suggest that LLMs could offer a scalable solution for code review analytics to improve the effectiveness of the code review process.

Index Terms: code review, review comment classification, prompt engineering, large language models.

Code Review (CR) is a practice in software development where developers review other developers' code changes asynchronously to find defects and suggest improvements [1]. Acting as a quality assurance gateway, CR has become mandatory in many prominent organizations, with developers reportedly spending 10-15% of their time on this task [2]. In practice, various types of concerns can be raised in CR comments, ranging from code styling to functional issues. As comments often trigger improvements to code changes, the types of comments play a crucial role in the quality of CR.
Constructive and actionable comments addressing quality-improving issues would positively contribute to CR's overall quality and code changes [3]-[5]. On the other hand, trivial or irrelevant comments can waste developers' time without improving the code changes [6].
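
The prompt-based classification described above can be illustrated with a minimal sketch. It shows one plausible setup (a prompt that lists the candidate categories, plus a parser that maps the model's reply back to a label) and is not the authors' actual prompt design. The category names, the function names, and the `fake_llm` stand-in are all assumptions made for illustration; in particular, the four labels are placeholders and not the 17 categories used in the paper.

```python
from typing import Callable, Optional, Sequence

# Hypothetical placeholder labels; NOT the paper's 17 categories.
CATEGORIES = ("functional", "code style", "documentation", "other")


def build_prompt(comment: str, categories: Sequence[str] = CATEGORIES) -> str:
    """Build a prompt that asks the model for exactly one category label."""
    options = "\n".join(f"- {name}" for name in categories)
    return (
        "Classify the code review comment into exactly one category.\n"
        f"Categories:\n{options}\n"
        f"Comment: {comment}\n"
        "Answer with the category name only."
    )


def parse_label(reply: str, categories: Sequence[str] = CATEGORIES) -> Optional[str]:
    """Map a free-text reply onto a known label; return None if unrecognized."""
    cleaned = reply.strip().strip(".\"'").lower()
    for name in categories:
        if cleaned == name.lower():
            return name
    return None


def classify(comment: str, llm: Callable[[str], str]) -> Optional[str]:
    """Classify one comment; `llm` is any function mapping a prompt to a reply."""
    return parse_label(llm(build_prompt(comment)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, used only to keep the sketch runnable.
    return "Code style." if "indentation" in prompt else "Functional"


if __name__ == "__main__":
    print(classify("Please fix the indentation here.", fake_llm))
    print(classify("This loop skips the last element.", fake_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider. Returning `None` for unrecognized replies makes off-list answers visible, so they are not silently counted as a valid category.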
