regression testing


A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

Alam, Mejbah, Gottschlich, Justin, Tatbul, Nesime, Turek, Javier S., Mattson, Tim, Muzahid, Abdullah

Neural Information Processing Systems

The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs.
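The paper's AutoPerf system trains autoencoders on hardware performance counters from non-regressed runs only; as a rough, stand-alone sketch of that zero-positive idea (learn a profile from negative examples alone, flag large deviations), here is a toy per-metric z-score detector. The counter values, metric choices, and threshold below are illustrative, not from the paper:

```python
import statistics

def fit_normal_profile(runs):
    """Learn per-metric mean/stdev from performance counters of
    non-regressed ("negative") runs only -- the zero-positive setting."""
    metrics = list(zip(*runs))  # transpose: one tuple of samples per metric
    return [(statistics.mean(m), statistics.stdev(m)) for m in metrics]

def regression_score(profile, run):
    """Largest per-metric deviation of a new run from the learned profile."""
    return max(abs(x - mu) / (sigma or 1.0)
               for x, (mu, sigma) in zip(run, profile))

def is_regression(profile, run, threshold=4.0):
    return regression_score(profile, run) > threshold

# Train on runs of the unmodified program; metrics here are an invented
# (cache-miss rate, normalized cycles) pair.
baseline = [[0.11, 1.00], [0.10, 1.02], [0.12, 0.99], [0.11, 1.01]]
profile = fit_normal_profile(baseline)

print(is_regression(profile, [0.11, 1.01]))  # nominal run -> False
print(is_regression(profile, [0.45, 1.60]))  # large deviation -> True
```

A real detector would use a learned nonlinear model (the paper's autoencoder) rather than independent z-scores, but the training signal is the same: normal runs only.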


Ensuring Reproducibility in Generative AI Systems for General Use Cases: A Framework for Regression Testing and Open Datasets

Morishige, Masumi, Koshihara, Ryo

arXiv.org Artificial Intelligence

Reproducibility and reliability remain pressing challenges for generative AI systems, whose behavior can drift with each model update or prompt revision. We introduce GPR-bench, a lightweight, extensible benchmark that operationalizes regression testing for general-purpose use cases. GPR-bench couples an open, bilingual (English and Japanese) dataset covering eight task categories (e.g., text generation, code generation, and information retrieval), with 10 scenarios in each category (80 test cases per language), with an automated evaluation pipeline that employs "LLM-as-a-Judge" scoring of correctness and conciseness. Experiments across three recent model versions - gpt-4o-mini, o3-mini, and o4-mini - and two prompt configurations (default versus a concise-writing instruction) reveal heterogeneous quality. Our results show that newer models generally improve correctness, but the differences are modest and not statistically significant, suggesting that GPR-bench may not be sufficiently challenging to differentiate between recent model versions. In contrast, the concise-writing instruction significantly enhances conciseness (+12.37 pp, Mann-Whitney U test: p < 0.001, effect size r = 0.2995) with minimal degradation in accuracy (-1.7 pp), demonstrating the effectiveness of prompt engineering. Released under the MIT License, GPR-bench lowers the barrier to initiating reproducibility monitoring and provides a foundation for community-driven extensions, while also raising important considerations about benchmark design for rapidly evolving language models.
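GPR-bench's actual pipeline runs an LLM judge over its bilingual dataset; the minimal sketch below shows only the shape of such a harness, with a stub model and stub judge standing in for the API calls. All names, prompts, and scoring rules here are invented for illustration:

```python
def run_regression_suite(model, judge, test_cases):
    """Score one model version on a fixed test set; the judge returns
    correctness/conciseness scores in [0, 1] per case."""
    results = [judge(case["prompt"], case["expected"], model(case["prompt"]))
               for case in test_cases]
    n = len(results)
    return {
        "correctness": sum(r["correctness"] for r in results) / n,
        "conciseness": sum(r["conciseness"] for r in results) / n,
    }

# Stubs so the harness runs without any API key; a real suite would call
# the model under test and an LLM-as-a-Judge here.
def stub_model(prompt):
    return {"2+2?": "4", "Capital of Japan?": "Tokyo"}.get(prompt, "")

def stub_judge(prompt, expected, answer):
    return {"correctness": 1.0 if answer == expected else 0.0,
            "conciseness": 1.0 if len(answer) <= 20 else 0.5}

cases = [{"prompt": "2+2?", "expected": "4"},
         {"prompt": "Capital of Japan?", "expected": "Tokyo"}]
scores = run_regression_suite(stub_model, stub_judge, cases)
print(scores)  # both cases pass -> correctness 1.0
```

Rerunning the same suite against each new model version and diffing the aggregate scores is the regression-testing loop the benchmark operationalizes.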


Fuzzy Inference System for Test Case Prioritization in Software Testing

Karatayev, Aron, Ogorodova, Anna, Shamoi, Pakizar

arXiv.org Artificial Intelligence

In the realm of software development, testing is crucial for ensuring software quality and adherence to requirements. However, it can be time-consuming and resource-intensive, especially when dealing with large and complex software systems. Test case prioritization (TCP) is a vital strategy to enhance testing efficiency by identifying the most critical test cases for early execution. This paper introduces a novel fuzzy logic-based approach to automate TCP, using fuzzy linguistic variables and expert-derived fuzzy rules to establish a link between test case characteristics and their prioritization. Our methodology utilizes two fuzzy variables - failure rate and execution time - alongside two crisp parameters: Prerequisite Test Case and Recently Updated Flag. Our findings demonstrate the proposed system's capacity to rank test cases effectively through experimental validation on a real-world software system. The results affirm the practical applicability of our approach in optimizing TCP and reducing the resource intensity of software testing.
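The paper derives its membership functions and rule base from experts; as a hedged illustration of the general mechanism (fuzzify failure rate and execution time, fire rules, defuzzify, then apply the crisp flags), here is a toy scorer whose ramps, rule weights, and flag boosts are all made up:

```python
def high_failure(rate):
    """Ramp membership for 'high failure rate', rate in [0, 1]."""
    return max(0.0, min(1.0, (rate - 0.2) / 0.6))

def short_exec(seconds, horizon=300.0):
    """Ramp membership for 'short execution time'."""
    return max(0.0, 1.0 - seconds / horizon)

def priority(rate, seconds, prerequisite=False, recently_updated=False):
    """Two Mamdani-style rules, defuzzified by a weighted average:
       R1: high failure rate AND short execution -> high priority (1.0)
       R2: high failure rate                     -> medium priority (0.6)
    The crisp flags add fixed boosts, echoing the paper's hybrid design."""
    r1 = min(high_failure(rate), short_exec(seconds))
    r2 = high_failure(rate)
    fuzzy = (r1 * 1.0 + r2 * 0.6) / (r1 + r2) if (r1 + r2) else 0.0
    return min(1.0, fuzzy + 0.2 * prerequisite + 0.1 * recently_updated)

# Rank a small suite: (name, failure rate, execution seconds).
tests = [("t1", 0.9, 30), ("t2", 0.1, 10), ("t3", 0.8, 250)]
ranked = sorted(tests, key=lambda t: priority(t[1], t[2]), reverse=True)
print([name for name, *_ in ranked])  # fast, failure-prone tests first
```

The failure-prone fast test outranks the failure-prone slow one, and the reliable test sinks to the bottom, which is the ordering behavior TCP aims for.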


(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs

Ma, Wanqin, Yang, Chenyang, Kästner, Christian

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.
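The paper argues that exact-match assertions are too brittle for non-deterministic LLM APIs; one common mitigation, sketched here with a stub in place of a real API call (`flaky`, the labels, and the tolerance are purely illustrative), is to vote over repeated samples and compare aggregate pass rates with a tolerance rather than diffing single outputs:

```python
from collections import Counter
import itertools

def majority_label(classify, text, samples=5):
    """Call a non-deterministic classifier several times and take the
    majority vote, rather than asserting on a single completion."""
    votes = Counter(classify(text) for _ in range(samples))
    return votes.most_common(1)[0][0]

def pass_rate(classify, labelled):
    hits = sum(majority_label(classify, t) == y for t, y in labelled)
    return hits / len(labelled)

def regressed(old_rate, new_rate, tolerance=0.02):
    """Flag a regression only when the new version's pass rate drops
    by more than the tolerance."""
    return old_rate - new_rate > tolerance

# Stub: a classifier that answers inconsistently, as a real toxicity
# endpoint might across sampled completions.
flaky = itertools.cycle(["toxic", "toxic", "clean", "toxic", "toxic"]).__next__
data = [("you are awful", "toxic")]
rate = pass_rate(lambda text: flaky(), data)
print(rate)  # majority of 5 flaky samples is "toxic" -> 1.0
```

Voting absorbs sampling noise within one API version; the tolerance then separates genuine cross-version regressions from residual run-to-run variance.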


BotSIM: An End-to-End Bot Simulation Framework for Commercial Task-Oriented Dialog Systems

Wang, Guangsen, Tan, Samson, Joty, Shafiq, Wu, Gang, Au, Jimmy, Hoi, Steven

arXiv.org Artificial Intelligence

We present BotSIM, a data-efficient end-to-end Bot SIMulation toolkit for commercial text-based task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and entities from bot definitions and generate user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) to simulate conversations with the dialog agents; 3) a Remediator to analyze the simulated conversations, visualize the bot health reports and provide actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM's effectiveness in end-to-end evaluation, remediation and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM's "generation-simulation-remediation" paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing manual test case creation effort; 2) enabling a holistic gauge of the bot in terms of NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLi5iSoly30. We have open-sourced the toolkit at https://github.com/salesforce/botsim
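BotSIM's Generator and Remediator do considerably more than this, but the agenda-based simulation loop at its core can be caricatured in a few lines; the toy keyword bot, the intents, and the report fields below are invented for illustration:

```python
def simulate(agenda, bot):
    """Agenda-based user simulation: the simulated user works through a
    stack of (intent, utterance) goals and logs whether the bot's NLU
    recognized each intent -- the raw material for a bot health report."""
    log = []
    while agenda:
        intent, utterance = agenda.pop()
        predicted = bot(utterance)
        log.append({"intent": intent, "predicted": predicted,
                    "matched": predicted == intent})
    return log

def health_report(log):
    matched = sum(turn["matched"] for turn in log)
    return {"turns": len(log), "nlu_accuracy": matched / len(log)}

# Toy keyword matcher standing in for a commercial TOD system's NLU.
def toy_bot(utterance):
    return "check_balance" if "balance" in utterance else "transfer_funds"

agenda = [("transfer_funds", "send money to Alice"),
          ("check_balance", "what is my balance")]
report = health_report(simulate(agenda, toy_bot))
print(report)  # {'turns': 2, 'nlu_accuracy': 1.0}
```

In BotSIM the agenda's utterances come from model-based paraphrasing of the bot's own intent definitions, so the same loop doubles as a data-efficient test generator.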


Software testing trends: From AI to DevTestOps, what's hot and why

#artificialintelligence

The software development space is extremely volatile and is constantly evolving. In software testing, what works for an organization in the present may not be as effective a few months down the line. As the workloads become more distributed and decentralized, it is harder to test them and ensure quality. Today, organizations require quality at speed. The time it takes for products to reach the market is getting shorter, and testing can sometimes seem more like a hindrance than a necessity.


Regression Testing in Era of Internet of Things and Machine Learning: A practical approach by Abhinandan H Patil Blurb Books

#artificialintelligence

Abhinandan H. Patil is the founder and CTO of a technology firm in Karnataka, India. Before this, he worked as a lead software engineer at a wireless network software organization for close to a decade. He spent 5 years in research, the output of which is available as a book and a thesis through IJSER, USA. He is an active researcher in machine learning, deep learning, data science, artificial intelligence, and regression testing applied to networks, communication, and the Internet of Things, and an active contributor to science, technology, engineering, and mathematics.