Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration
Hariharan, Mohanakrishnan, Arvapalli, Satish, Barma, Seshu, Sheela, Evangeline
We present an approach to software testing automation using Agentic Retrieval-Augmented Generation (RAG) systems for Quality Engineering (QE) artifact creation. We combine autonomous AI agents with hybrid vector-graph knowledge systems to automate test plan, test case, and QE metric generation. The system improves accuracy from 65% to 94.8% while ensuring comprehensive document traceability throughout the quality engineering lifecycle. Experimental validation on enterprise Corporate Systems Engineering and SAP migration projects demonstrates an 85% reduction in testing timeline, an 85% improvement in test suite efficiency, and projected 35% cost savings, resulting in a 2-month acceleration of go-live. Index Terms: agentic systems, retrieval-augmented generation, software testing, quality engineering, multi-agent orchestration, hybrid vector-graph, test automation, SAP testing, enterprise systems. These limitations become particularly pronounced in enterprise software testing, where maintaining traceability between requirements, test cases, and business logic is paramount for regulatory compliance and quality assurance.
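The hybrid vector-graph retrieval idea can be sketched in a few lines: rank artifacts by embedding similarity, then expand each hit along traceability edges so linked requirements and test cases travel together. This is a minimal illustration under assumed data, not the authors' system; the artifact IDs, the 2-d toy embeddings, and the graph are all hypothetical.

```python
import math

# Toy corpus: QE artifacts with hypothetical 2-d embeddings.
artifacts = {
    "REQ-1": [0.9, 0.1],  # a requirement
    "TC-1":  [0.8, 0.2],  # a test case derived from REQ-1
    "TC-2":  [0.1, 0.9],  # an unrelated test case
}

# Graph edges capture requirement -> test-case traceability.
trace_graph = {"REQ-1": ["TC-1"]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_retrieve(query_vec, k=2):
    """Rank by vector similarity, then expand each hit with its graph
    neighbours so traceability-linked artifacts ride along."""
    ranked = sorted(artifacts, key=lambda a: cosine(query_vec, artifacts[a]),
                    reverse=True)
    hits = ranked[:k]
    expanded = list(hits)
    for h in hits:
        for neighbour in trace_graph.get(h, []):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded
```

With `k=1`, a query near REQ-1 returns both REQ-1 (vector hit) and TC-1 (graph expansion), which is the traceability property the abstract emphasizes.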
Large Language Models for Software Testing: A Research Roadmap
Augusto, Cristian, Bertolino, Antonia, De Angelis, Guglielmo, Lonetti, Francesca, Morán, Jesús
Large Language Models (LLMs) are starting to be profiled as one of the most significant disruptions in the Software Testing field. Specifically, they have been successfully applied in software testing tasks such as generating test code or summarizing documentation. This potential has attracted hundreds of researchers, resulting in dozens of new contributions every month and making it hard for researchers to stay at the forefront of the wave. Still, to the best of our knowledge, no prior work has provided a structured vision of the progress and most relevant research trends in LLM-based testing. In this article, we aim to provide a roadmap that illustrates the field's current state, grouping the contributions into different categories, and also sketching the most promising and active research directions for the field. To achieve this objective, we have conducted a semi-systematic literature review, collecting articles and mapping them into the most prominent categories, reviewing the current and ongoing status, and analyzing the open challenges of LLM-based software testing. Lastly, we have outlined several expected long-term impacts of LLMs over the whole software testing field.
Navigating the growing field of research on AI for software testing -- the taxonomy for AI-augmented software testing and an ontology-driven literature survey
In industry, software testing is the primary method to verify and validate the functionality, performance, security, usability, and so on, of software-based systems. Test automation has gained increasing attention in industry over the last decade, following decades of intense research into test automation and model-based testing. However, designing, developing, maintaining and evolving test automation is a considerable effort. Meanwhile, AI's breakthroughs in many engineering fields are opening up new perspectives for software testing, for both manual and automated testing. This paper reviews recent research on AI augmentation in software test automation, from no automation to full automation. It also discusses new forms of testing made possible by AI. Based on this, the newly developed taxonomy, ai4st, is presented and used to classify recent research and identify open research questions.
Breaking Barriers in Software Testing: The Power of AI-Driven Automation
Software testing remains critical for ensuring reliability, yet traditional approaches are slow, costly, and prone to gaps in coverage. This paper presents an AI-driven framework that automates test case generation and validation using natural language processing (NLP), reinforcement learning (RL), and predictive models, embedded within a policy-driven trust and fairness model. The approach translates natural language requirements into executable tests, continuously optimizes them through learning, and validates outcomes with real-time analysis while mitigating bias. Case studies demonstrate measurable gains in defect detection, reduced testing effort, and faster release cycles, showing that AI-enhanced testing improves both efficiency and reliability. By addressing integration and scalability challenges, the framework illustrates how AI can shift testing from a reactive, manual process to a proactive, adaptive system that strengthens software quality in increasingly complex environments.
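The requirements-to-test translation step this framework describes can be made concrete with a toy rule-based version. This is a stand-in for the paper's NLP component, not its actual method; the requirement phrasing pattern and the function names are hypothetical.

```python
import re

def requirement_to_test(req):
    """Translate one narrow requirement pattern into an executable
    assertion. A real framework would use an NLP model; this only
    shows the shape of the transformation."""
    m = re.match(r"(\w+)\((.*)\) shall return (.+)", req)
    if not m:
        raise ValueError("unsupported requirement phrasing")
    fn, args, expected = m.groups()
    return f"assert {fn}({args}) == {expected}"
```

For example, `requirement_to_test("add(2, 3) shall return 5")` yields the test line `assert add(2, 3) == 5`, which a harness could then execute against the implementation.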
On the Need for a Statistical Foundation in Scenario-Based Testing of Autonomous Vehicles
Zhao, Xingyu, Aghazadeh-Chakherlou, Robab, Cheng, Chih-Hong, Popov, Peter, Strigini, Lorenzo
Scenario-based testing has emerged as a common method for autonomous vehicles (AVs) safety assessment, offering a more efficient alternative to mile-based testing by focusing on high-risk scenarios. However, fundamental questions persist regarding its stopping rules, residual risk estimation, debug effectiveness, and the impact of simulation fidelity on safety claims. This paper argues that a rigorous statistical foundation is essential to address these challenges and enable rigorous safety assurance. By drawing parallels between AV testing and established software testing methods, we identify shared research gaps and reusable solutions. We propose proof-of-concept models to quantify the probability of failure per scenario (pfs) and evaluate testing effectiveness under varying conditions. Our analysis reveals that neither scenario-based nor mile-based testing universally outperforms the other. Furthermore, we give an example of formal reasoning about alignment of synthetic and real-world testing outcomes, a first step towards supporting statistically defensible simulation-based safety claims.
The Impact of Software Testing with Quantum Optimization Meets Machine Learning
Modern software systems' complexity challenges efficient testing, as traditional machine learning (ML) struggles with large test suites. This research presents a hybrid framework integrating Quantum Annealing with ML to optimize test case prioritization in CI/CD pipelines. Leveraging quantum optimization, it achieves a 25% increase in defect detection efficiency and a 30% reduction in test execution time versus classical ML, validated on the Defects4J dataset. A simulated CI/CD environment demonstrates robustness across evolving codebases. Visualizations, including defect heatmaps and performance graphs, enhance interpretability. Software testing is integral to ensuring software quality, accounting for 40-50% of development resources in large-scale systems [1]. The rise of microservices, cloud-native architectures, and continuous integration/continuous deployment (CI/CD) practices has intensified the demand for rapid, reliable testing methods [2].
Harden and Catch for Just-in-Time Assured LLM-Based Software Testing: Open Research Challenges
Harman, Mark, O'Hearn, Peter, Sengupta, Shubho
Despite decades of research and practice in automated software testing, several fundamental concepts remain ill-defined and under-explored, yet offer enormous potential real-world impact. We show that these concepts raise exciting new challenges in the context of Large Language Models for software test generation. More specifically, we formally define and investigate the properties of hardening and catching tests. A hardening test is one that seeks to protect against future regressions, while a catching test is one that catches such a regression or a fault in new functionality introduced by a code change. Hardening tests can be generated at any time and may become catching tests when a future regression is caught. We also define and motivate the Catching 'Just-in-Time' (JiTTest) Challenge, in which tests are generated 'just-in-time' to catch new faults before they land into production. We show that any solution to Catching JiTTest generation can also be repurposed to catch latent faults in legacy code. We enumerate possible outcomes for hardening and catching tests and JiTTests, and discuss open research problems, deployment options, and initial results from our work on automated LLM-based hardening at Meta. This paper was written to accompany the keynote by the authors at the ACM International Conference on the Foundations of Software Engineering (FSE) 2025. Author order is alphabetical. The corresponding author is Mark Harman.
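The hardening/catching distinction defined above can be made concrete with a toy example; the discount function and the regression below are hypothetical, not taken from the paper.

```python
def discount(price, rate):
    """Current production behaviour: clamp the rate to [0, 1]."""
    rate = min(max(rate, 0.0), 1.0)
    return price * (1 - rate)

def regressed_discount(price, rate):
    """A hypothetical future change that drops the clamping."""
    return price * (1 - rate)

def hardening_test(fn):
    """Generated today against current behaviour: pins the clamp,
    i.e. a 150% discount still yields a price of zero."""
    return fn(100.0, 1.5) == 0.0
```

Against today's code the hardening test passes; if the regression lands, the same unchanged test fails, at which point it has become a catching test. This is the promotion the paper describes: hardening tests can be generated at any time and only later earn their keep.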
The Potential of LLMs in Automating Software Testing: From Generation to Reporting
Sherifi, Betim, Slhoub, Khaled, Nembhard, Fitzroy
High-quality software is essential in software engineering, which requires robust validation and verification processes during testing activities. Manual testing, while effective, can be time-consuming and costly, leading to an increased demand for automated methods. Recent advancements in Large Language Models (LLMs) have significantly influenced software engineering, particularly in areas like requirements analysis, test automation, and debugging. This paper explores an agent-oriented approach to automated software testing, using LLMs to reduce human intervention and enhance testing efficiency. The proposed framework integrates LLMs to generate unit tests, visualize call graphs, and automate test execution and reporting. Evaluations across multiple applications in Python and Java demonstrate the system's high test coverage and efficient operation. This research underscores the potential of LLM-powered agents to streamline software testing workflows while addressing challenges in scalability and accuracy.
Design choices made by LLM-based test generators prevent them from finding bugs
Mathews, Noble Saji, Nagappan, Meiyappan
There is an increasing amount of research and commercial tools for automated test case generation using Large Language Models (LLMs). This paper critically examines whether recent LLM-based test generation tools, such as Codium CoverAgent and CoverUp, can effectively find bugs or unintentionally validate faulty code. Considering bugs are only exposed by failing test cases, we explore the question: can these tools truly achieve the intended objectives of software testing when their test oracles are designed to pass? Using real human-written buggy code as input, we evaluate these tools, showing how LLM-generated tests can fail to detect bugs and, more alarmingly, how their design can worsen the situation by validating bugs in the generated test suite and rejecting bug-revealing tests. These findings raise important questions about the validity of the design behind LLM-based test generation tools and their impact on software quality and test suite reliability.
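The oracle problem this paper highlights is easy to demonstrate: if a generator derives the expected value from the implementation's own output, the resulting test validates whatever the code does, bug included. The buggy function and the generator below are an illustrative sketch, not the tools the paper evaluates.

```python
def buggy_max(xs):
    """Hypothetical buggy implementation: skips the first element."""
    best = xs[1]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

def generate_passing_test(fn, inputs):
    """Mimics a generator whose oracle is designed to pass: the
    expected value is whatever the (possibly buggy) function returns."""
    expected = fn(inputs)
    return lambda: fn(inputs) == expected

# The generated test passes even though buggy_max([9, 1, 2])
# wrongly returns 2 instead of 9.
test = generate_passing_test(buggy_max, [9, 1, 2])
```

A bug-revealing oracle would instead come from an independent source (a specification, a reference implementation, or a human), which is precisely what the critiqued design choices omit.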
On the Effectiveness of LLMs for Manual Test Verifications
Peixoto, Myron David Lucena Campos, Baia, Davy de Medeiros, Nascimento, Nathalia, Alencar, Paulo, Fonseca, Baldoino, Ribeiro, Márcio
Background: Manual testing is vital for detecting issues missed by automated tests, but specifying accurate verifications is challenging. Aims: This study aims to explore the use of Large Language Models (LLMs) to produce verifications for manual tests. Method: We conducted two independent and complementary exploratory studies. The first study involved using 2 closed-source and 6 open-source LLMs to generate verifications for manual test steps and evaluate their similarity to original verifications. The second study involved recruiting software testing professionals to assess their perception and agreement with the generated verifications compared to the original ones. Results: The open-source models Mistral-7B and Phi-3-mini-4k demonstrated effectiveness and consistency comparable to closed-source models like Gemini-1.5-flash and GPT-3.5-turbo in generating manual test verifications. However, the agreement level among professional testers was slightly above 40%, indicating both promise and room for improvement. While some LLM-generated verifications were considered better than the originals, there were also concerns about AI hallucinations, where verifications significantly deviated from expectations. Conclusion: We contributed by generating a dataset of 37,040 test verifications using 8 different LLMs. Although the models show potential, the relatively modest 40% agreement level highlights the need for further refinement. Enhancing the accuracy, relevance, and clarity of the generated verifications is crucial to ensure greater reliability in real-world testing scenarios.