SCALAR: A Part-of-speech Tagger for Identifiers
Newman, Christian D., Scholten, Brandon, Testa, Sophia, Behler, Joshua A. C., Banabilah, Syreen, Collard, Michael L., Decker, Michael J., Mkaouer, Mohamed Wiem, Zampieri, Marcos, AlOmar, Eman Abdullah, Alsuhaibani, Reem, Peruma, Anthony, Maletic, Jonathan I.
arXiv.org Artificial Intelligence
Abstract: The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language that developers use to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with that of a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub.

Index Terms: Program comprehension, identifier naming, part-of-speech tagging, natural language processing, software maintenance, software evolution

I. INTRODUCTION

The identifiers developers create represent a significant amount of the information other developers must use to understand related code. Given that identifiers represent, on average, 70% of the characters in a code base [1], and developers spend more time reading code than writing it [2], [3], it is important for researchers to better understand how identifiers convey information and how they can be improved to increase developer reading efficiency.
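To make the abstract's terminology concrete, the following is a minimal sketch of how an identifier might be split into words and turned into per-word rows before tagging. The splitting rule, the feature names, and the example tags are illustrative assumptions and do not come from SCALAR itself. In the setup the abstract describes, rows of this kind would be paired with the oracle's tags and used to fit scikit-learn's GradientBoostingClassifier.

```python
import re


def split_identifier(name):
    """Split an identifier on underscores and camelCase boundaries (assumed rule)."""
    words = []
    for part in re.split(r"_+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def word_features(words, context):
    """Build one feature row per word; the feature names are hypothetical."""
    last = len(words) - 1
    return [
        {
            "word": w,
            "position": i,
            "is_last": i == last,
            "last_two_chars": w[-2:],
            "context": context,  # e.g. "function" or "variable"
        }
        for i, w in enumerate(words)
    ]


def grammar_pattern(tags):
    """Join per-word tags into a single grammar-pattern string."""
    return " ".join(tags)


words = split_identifier("getUserName")
rows = word_features(words, context="function")
# Hand-written tags for illustration only; this is not SCALAR output.
hand_tags = ["V", "NM", "N"]
print(words, grammar_pattern(hand_tags))
```

A trained tagger would predict one tag per row, and the sequence of predictions for an identifier forms its grammar pattern.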
Apr-25-2025
- Country:
- Asia > Middle East
- Saudi Arabia > Riyadh Province > Riyadh (0.04)
- North America > United States
- Delaware > New Castle County
- Newark (0.04)
- District of Columbia > Washington (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Michigan > Genesee County
- Flint (0.14)
- New Jersey > Hudson County
- Hoboken (0.04)
- New York > Monroe County
- Rochester (0.04)
- Ohio
- Portage County > Kent (0.04)
- Summit County
- Wood County > Bowling Green (0.04)
- Genre:
- Research Report (0.64)
- Technology: