Automatically Categorising GitHub Repositories by Application Domain

Zanartu, Francisco; Treude, Christoph; Cartaxo, Bruno; Borges, Hudson Silva; Moura, Pedro; Wagner, Markus; Pinto, Gustavo

arXiv.org Artificial Intelligence 

For example, there are limited means available to separate repositories containing engineered software projects from other repositories, such as personal projects or those that use GitHub for free cloud storage (Kalliamvakou et al., 2014; Munaiah et al., 2017). To make it easier for users to identify relevant repositories for their wide variety of use cases, GitHub has been adding features to its service, such as README files, topic tags, and showcases (where contributors describe, add keywords to, and label their repository). However, these features are insufficient for many use cases. For instance, while achieving generalizability of the results is the primary objective of many empirical papers, modern computing research is largely application domain independent (Capiluppi et al., 2020). Application domains are the sections of reality for which a software system is designed. They are important because they serve as the starting point for actual state analysis, and they usually come with a domain-specific language: developers in a given domain think about their project in a specific way, with particular terms and concepts (Züllighoven, 2004). GitHub does not currently offer application domains as a way to catalogue repositories. Previous work has found that repository quality indicators, such as object-oriented metrics, can be "extremely sensitive to application domains" (Capiluppi and Ajienka, 2019), and that the application domain is an important factor in predicting repository popularity (Borges et al., 2016). Furthermore, since the documentation of GitHub repositories is often incomplete (Prana et al., 2019), information about the application domain of a repository can be crucial for gaining a high-level understanding of its content and purpose.
