generate test case
Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education
Kumar, Nischal Ashok, Lan, Andrew
In computer science education, test cases are an integral part of programming assignments since they can be used as assessment items to test students' programming knowledge and provide personalized feedback on student-written code. The goal of our work is to propose a fully automated approach for test case generation that can accurately measure student knowledge, which is important for two reasons. First, manually constructing test cases requires expert knowledge and is a labor-intensive process. Second, developing test cases for students, especially those who are novice programmers, is significantly different from those oriented toward professional-level software developers. Therefore, we need an automated process for test case generation to assess student knowledge and provide feedback. In this work, we propose a large language model-based approach to automatically generate test cases and show that they are good measures of student knowledge, using a publicly available dataset that contains student-written Java code. We also discuss future research directions centered on using test cases to help students.
Using LLM to select the right SQL Query from candidates
Text-to-SQL models can generate a list of candidate SQL queries, and the best query is often in the candidate list, but not at the top of the list. An effective re-rank method can select the right SQL query from the candidate list and improve the model's performance. Previous studies on code generation automatically generate test cases and use them to re-rank candidate codes. However, automatic test case generation for text-to-SQL is an understudied field. We propose an automatic test case generation method that first generates a database and then uses LLMs to predict the ground truth, which is the expected execution results of the ground truth SQL query on this database. To reduce the difficulty for LLMs to predict, we conduct experiments to search for ways to generate easy databases for LLMs and design easy-to-understand prompts. Based on our test case generation method, we propose a re-rank method to select the right SQL query from the candidate list. Given a candidate list, our method can generate test cases and re-rank the candidate list according to their pass numbers on these test cases and their generation probabilities. The experiment results on the validation dataset of Spider show that the performance of some state-of-the-art models can get a 3.6\% improvement after applying our re-rank method.
A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges
Junior, Roberto Francisco de Lima, Presta, Luiz Fernando Paes de Barros, Borborema, Lucca Santos, da Silva, Vanderson Nogueira, Dahia, Marcio Leal de Melo, Santos, Anderson Carlos Sousa e
This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a case study methodology, we systematically explore the integration of LLMs in the test case construction process, aiming to shed light on their practical efficacy, challenges encountered, and implications for software quality assurance. The study encompasses the selection of a representative software application, the formulation of test case construction methodologies employing LLMs, and the subsequent evaluation of outcomes. Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency. Additionally, delves into challenges such as model interpretability and adaptation to diverse software contexts. The findings from this case study contributes with nuanced insights into the practical utility of LLMs in the domain of test case construction, elucidating their potential benefits and limitations. By addressing real-world scenarios and complexities, this research aims to inform software practitioners and researchers alike about the tangible implications of incorporating LLMs into the software testing landscape, fostering a more comprehensive understanding of their role in optimizing the software development process.
Test Case Generation and Test Oracle Support for Testing CPSs using Hybrid Models
Sadri-Moshkenani, Zahra, Bradley, Justin, Rothermel, Gregg
Cyber-Physical Systems (CPSs) play a central role in the behavior of a wide range of autonomous physical systems such as medical devices, autonomous vehicles, and smart homes, many of which are safety-critical. CPSs are often specified iteratively as a sequence of models at different levels that can be tested via simulation systems at early stages of their development cycle. One such model is a hybrid automaton; these are used frequently for CPS applications and have the advantage of encapsulating both continuous and discrete CPS behaviors. When testing CPSs, engineers can take advantage of these models to generate test cases that target both types of these behaviors. Moreover, since these models are constructed early in the development process for CPSs, they allow test cases to be generated early in that process for those CPSs, even before simulation models of the CPSs have been designed. One challenge when testing CPSs is that these systems may operate differently even under an identically applied test scenario. In such cases, we cannot employ test oracles that use predetermined deterministic behaviors; instead, test oracles should consider sets of desired behaviors in order to determine whether the CPS has behaved appropriately. In this paper we present a test case generation technique, HYTEST, that generates test cases based on hybrid models, accompanied by appropriate test oracles, for use in testing CPSs early in their development cycle. To evaluate the effectiveness and efficiency of HYTEST, we conducted an empirical study in which we applied the technique to several CPSs and measured its ability to detect faults in those CPSs and the amount of time required to perform the testing process. The results of the study show that HYTEST was able to detect faults more effectively and efficiently than the baseline techniques we compare it to.
Top 8 research papers by DeepMind in 2022 (till date)
DeepMind's researchers are working round the clock to push the frontiers of AI. The lab has published 34 research papers in the last four months. Let's look at the key papers the Alphabet subsidiary has published in 2022. The paper found the model size and the training dataset size should be scaled in equal measure for compute-optimal training. The researchers tested the theory by training a compute-optimal model, Chinchilla, using the same compute budget as Gopher but with 70B parameters and 4x more data.
Detecting Operational Adversarial Examples for Reliable Deep Learning
Zhao, Xingyu, Huang, Wei, Schewe, Sven, Dong, Yi, Huang, Xiaowei
The utilisation of Deep Learning (DL) raises new challenges regarding its dependability in critical applications. Sound verification and validation methods are needed to assure the safe and reliable use of DL. However, state-of-the-art debug testing methods on DL that aim at detecting adversarial examples (AEs) ignore the operational profile, which statistically depicts the software's future operational use. This may lead to very modest effectiveness on improving the software's delivered reliability, as the testing budget is likely to be wasted on detecting AEs that are unrealistic or encountered very rarely in real-life operation. In this paper, we first present the novel notion of "operational AEs" which are AEs that have relatively high chance to be seen in future operation. Then an initial design of a new DL testing method to efficiently detect "operational AEs" is provided, as well as some insights on our prospective research plan.