AITopics | software repository

Collaborating Authors

software repository

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling

Pope, Sophie C., Barovic, Andrew, Moin, Armin

arXiv.org Artificial IntelligenceApr-23-2025

This study explores a novel approach to predicting key bug-related outcomes, including the time to resolution, time to fix, and ultimate status of a bug, using data from the Bugzilla Eclipse Project. Specifically, we leverage features available before a bug is resolved to enhance predictive accuracy. Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification (positive or negative). Additionally, we integrate the bug's priority level and its topic, extracted using a BERTopic model, as features for a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP). Our findings indicate that the combination of BERTopic and sentiment analysis can improve certain model performance metrics. Furthermore, we observe that balancing model inputs enhances practical applicability, albeit at the cost of a significant reduction in accuracy in most cases. To address our primary objectives, predicting time-to-resolution, time-to-fix, and bug destiny, we employ both binary classification and exact time value predictions, allowing for a comparative evaluation of their predictive effectiveness. Results demonstrate that sentiment analysis serves as a valuable predictor of a bug's eventual outcome, particularly in determining whether it will be fixed. However, its utility is less pronounced when classifying bugs into more complex or unconventional outcome categories.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.15972

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub

Carter, Benjamin C., Contreras, Jonathan Rivas, Villegas, Carlos A. Llanes, Acharya, Pawan, Utzerath, Jack, Farner, Adonijah O., Jenkins, Hunter, Johnson, Dylan, Penney, Jacob, Steinmacher, Igor, Gerosa, Marco A., Santos, Fabio

arXiv.org Artificial IntelligenceJan-27-2025

New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty level, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve the open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.15922

Country:

North America > United States > Colorado (0.04)
North America > United States > Arizona (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

XRZoo: A Large-Scale and Versatile Dataset of Extended Reality (XR) Applications

Li, Shuqing, Zhang, Chenran, Gao, Cuiyun, Lyu, Michael R.

arXiv.org Artificial IntelligenceDec-10-2024

The rapid advancement of Extended Reality (XR, encompassing AR, MR, and VR) and spatial computing technologies forms a foundational layer for the emerging Metaverse, enabling innovative applications across healthcare, education, manufacturing, and entertainment. However, research in this area is often limited by the lack of large, representative, and highquality application datasets that can support empirical studies and the development of new approaches benefiting XR software processes. In this paper, we introduce XRZoo, a comprehensive and curated dataset of XR applications designed to bridge this gap. XRZoo contains 12,528 free XR applications, spanning nine app stores, across all XR techniques (i.e., AR, MR, and VR) and use cases, with detailed metadata on key aspects such as application descriptions, application categories, release dates, user review numbers, and hardware specifications, etc. By making XRZoo publicly available, we aim to foster reproducible XR software engineering and security research, enable cross-disciplinary investigations, and also support the development of advanced XR systems by providing examples to developers. Our dataset serves as a valuable resource for researchers and practitioners interested in improving the scalability, usability, and effectiveness of XR applications. XRZoo will be released and actively maintained.

application, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.06759

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(3 more...)

Genre:

Research Report (0.40)
Overview (0.34)

Industry:

Education (0.88)
Information Technology > Security & Privacy (0.88)
Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Security & Privacy (0.88)
(2 more...)

Add feedback

CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code

Weyssow, Martin, Di Sipio, Claudio, Di Ruscio, Davide, Sahraoui, Houari

arXiv.org Artificial IntelligenceDec-19-2023

Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap marked by the absence of a long-term temporal dimension in existing code change datasets, limiting their suitability in lifelong learning scenarios. In contrast, our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. In this work, we introduce an initial version of CodeLL, comprising 71 machine-learning-based projects mined from Software Heritage. This dataset enables the extraction and in-depth analysis of code changes spanning 2,483 releases at both the method and API levels. CodeLL enables researchers studying the behaviour of LMs in lifelong fine-tuning settings for learning code changes. Additionally, the dataset can help studying data distribution shifts within software repositories and the evolution of API usages over time.

code change, dataset, repository, (10 more...)

arXiv.org Artificial Intelligence

2312.12492

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Abruzzo > L'Aquila Province > L'Aquila (0.04)

Genre: Instructional Material (1.00)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Software Repositories and Machine Learning Research in Cyber Security

Vanamala, Mounika, Bryant, Keith, Caravella, Alex

arXiv.org Artificial IntelligenceNov-1-2023

In today's rapidly evolving technological landscape and advanced software development, the rise in cyber security attacks has become a pressing concern. The integration of robust cyber security defenses has become essential across all phases of software development. It holds particular significance in identifying critical cyber security vulnerabilities at the initial stages of the software development life cycle, notably during the requirement phase. Through the utilization of cyber security repositories like The Common Attack Pattern Enumeration and Classification (CAPEC) from MITRE and the Common Vulnerabilities and Exposures (CVE) databases, attempts have been made to leverage topic modeling and machine learning for the detection of these early-stage vulnerabilities in the software requirements process. Past research themes have returned successful outcomes in attempting to automate vulnerability identification for software developers, employing a mixture of unsupervised machine learning methodologies such as LDA and topic modeling. Looking ahead, in our pursuit to improve automation and establish connections between software requirements and vulnerabilities, our strategy entails adopting a variety of supervised machine learning techniques. This array encompasses Support Vector Machines (SVM), Na\"ive Bayes, random forest, neural networking and eventually transitioning into deep learning for our investigation. In the face of the escalating complexity of cyber security, the question of whether machine learning can enhance the identification of vulnerabilities in diverse software development scenarios is a paramount consideration, offering crucial assistance to software developers in developing secure software.

cyber security, repository and machine learning research, software repository

arXiv.org Artificial Intelligence

2311.00691

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)

Add feedback

MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities

Nadeem, Anas, Sarwar, Muhammad Usman, Malik, Muhammad Zubair

arXiv.org Artificial IntelligenceAug-31-2023

Software development projects rely on issue tracking systems at the core of tracking maintenance tasks such as bug reports, and enhancement requests. Incoming issue-reports on these issue tracking systems must be managed in an effective manner. First, they must be labelled and then assigned to a particular developer with relevant expertise. This handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report making it a labor-intensive task. In this paper, we present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports in their respective category and further assigning the issue-reports to a developer with relevant expertise. We use the Bidirectional Encoder Representations from Transformers (BERT), as an underlying model for MaintainoMATE to learn the contextual information for automatic issue-report labeling and assignment tasks. We deploy the framework used in this work as a GitHub application. We empirically evaluate our approach on GitHub issue-reports to show its capability of assigning labels to the issue-reports. We were able to achieve an F1-score close to 80\%, which is comparable to existing state-of-the-art results. Similarly, our initial evaluations show that we can assign relevant developers to the issue-reports with an F1 score of 54\%, which is a significant improvement over existing approaches. Our initial findings suggest that MaintainoMATE has the potential of improving software quality and reducing maintenance costs by accurately automating activities involved in the maintenance processes. Our future work would be directed towards improving the issue-assignment module.

developer, maintainomate, repository, (13 more...)

arXiv.org Artificial Intelligence

2308.16464

Country:

North America > United States > North Dakota (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario (0.04)
Asia > Indonesia > Java > East Java > Surabaya (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

A Survey on Machine Learning Techniques for Source Code Analysis

Sharma, Tushar, Kechagia, Maria, Georgiou, Stefanos, Tiwari, Rohit, Vats, Indira, Moazen, Hadi, Sarro, Federica

arXiv.org Artificial IntelligenceSep-13-2022

The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 479 primary studies published between 2011 and 2021. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize machine learning techniques employed. We identify a comprehensive list of available datasets and tools useable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources.

ieee 25th international conference, sigsoft international symposium, software reliability engineering, (14 more...)

arXiv.org Artificial Intelligence

2110.0961

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > New York > New York County > New York City (0.04)
(54 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(8 more...)

Add feedback

A Mining Software Repository Extended Cookbook: Lessons learned from a literature review

Barros, Daniel, Horita, Flavio, Wiese, Igor, Silva, Kanan

arXiv.org Artificial IntelligenceOct-8-2021

The main purpose of Mining Software Repositories (MSR) is to discover the latest enhancements and provide an insight into how to make improvements in a software project. In light of it, this paper updates the MSR findings of the original MSR Cookbook, by first conducting a systematic mapping study to elicit and analyze the state-of-the-art, and then proposing an extended version of the Cookbook. This extended Cookbook was built on four high-level themes, which were derived from the analysis of a list of 112 selected studies. Hence, it was used to consolidate the extended Cookbook as a contribution to practice and research in the following areas by: 1) including studies published in all available and relevant publication venues; 2) including and updating recommendations in all four high-level themes, with an increase of 84% in comments in this study when compared with the original MSR Cookbook; 3) summarizing the tools employed for each high-level theme; and 4) providing lessons learned for future studies. Thus, the extended Cookbook examined in this work can support new research projects, as upgraded recommendations and the lessons learned are available with the aid of samples and tools.

cookbook, recommendation, software repository, (11 more...)

arXiv.org Artificial Intelligence

2110.04095

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.05)
South America > Brazil > São Paulo (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Materials > Metals & Mining (0.69)

Technology:

Information Technology > Software (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(3 more...)

Add feedback

Catching Bugs Without Really Trying

#artificialintelligenceMar-1-2019, 17:31:36 GMT

Finding and fixing bugs is critical to delivering quality software. One-time new bugs are often introduced into a system when changes are uploaded to a software repository. Changes may be due to adding new features or possibly correcting existing bugs. Mozilla is planning on taking advantage of research being done by Ubisoft using artificial intelligence (AI) and machine learning (ML) to automatically catch software bugs when source code is committed to a software repository. The software is called CLEVER for Combining Levels of Bug Prevention and Resolution techniques.

artificial intelligence, clever, machine learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Add feedback