non-functional requirement
The Few-shot Dilemma: Over-prompting Large Language Models
Tang, Yongjian, Tuncel, Doruk, Koerner, Christian, Runkler, Thomas
Abstract--Over-prompting, a phenomenon where excessive examples in prompts lead to diminished performance in Large Language Models (LLMs), challenges the conventional wisdom about in-context few-shot learning. To investigate this few-shot dilemma, we outline a prompting framework that leverages three standard few-shot selection methods - random sampling, semantic embedding, and TF-IDF vectors - and evaluate these methods across multiple LLMs, including GPT-4o, GPT-3.5-turbo, DeepSeek-V3, Gemma-3, and LLaMA-3.1. Our experimental results reveal that incorporating excessive domain-specific examples into prompts can paradoxically degrade performance in certain LLMs, which contradicts the prior empirical conclusion that more relevant few-shot examples universally benefit LLMs. Given the trend of LLM-assisted software engineering and requirements analysis, we experiment with two real-world software requirement classification datasets. By gradually increasing the number of TF-IDF-selected and stratified few-shot examples, we identify their optimal quantity for each LLM. This combined approach achieves superior performance with fewer examples, avoiding the over-prompting problem and surpassing the state of the art by 1% in classifying functional and non-functional requirements.
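A minimal sketch of the kind of TF-IDF-based example selection the abstract describes, assuming a scikit-learn setup; the toy requirement pool and the function name select_few_shot are illustrative, and the paper additionally stratifies the selection across classes and tunes the example count per LLM.

```python
# Hedged sketch: rank labelled requirements by TF-IDF cosine similarity to the query
# and keep the top k as few-shot examples. Pool, labels, and names are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pool = [
    ("The system shall export reports as PDF.", "functional"),
    ("The UI must respond to clicks within 200 ms.", "non-functional"),
    ("Users shall be able to reset their password.", "functional"),
    ("All stored data must be encrypted at rest.", "non-functional"),
]

def select_few_shot(query: str, k: int = 2):
    texts = [text for text, _ in pool]
    vectorizer = TfidfVectorizer().fit(texts + [query])
    sims = cosine_similarity(vectorizer.transform([query]), vectorizer.transform(texts))[0]
    ranked = sorted(zip(sims, pool), key=lambda pair: pair[0], reverse=True)
    return [example for _, example in ranked[:k]]

print(select_few_shot("The service must encrypt all traffic in transit.", k=2))
```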
Leveraging LLMs for User Stories in AI Systems: UStAI Dataset
Yamani, Asma, Baslyman, Malak, Ahmed, Moataz
AI systems are gaining widespread adoption across various sectors and domains. Creating high-quality AI system requirements is crucial for aligning the AI system with business goals and consumer values and for social responsibility. However, with the uncertain nature of AI systems and the heavy reliance on sensitive data, more research is needed to address the elicitation and analysis of AI system requirements. With the proprietary nature of many AI systems, there is a lack of open-source requirements artifacts and technical requirements documents for AI systems, limiting broader research and investigation. With Large Language Models (LLMs) emerging as a promising alternative to human-generated text, this paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. We conducted an empirical evaluation using three LLMs and generated 1,260 user stories from 42 abstracts across 26 domains. We assess their quality using the Quality User Story (QUS) framework. Moreover, we identify relevant non-functional requirements (NFRs) and ethical principles. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders, offering a promising approach for generating user stories for research purposes and for aiding in the early requirements elicitation phase of AI systems. We have compiled and curated a collection of stories generated by various LLMs into a dataset (UStAI), which is now publicly available for use.
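The general setup the abstract describes can be sketched as a single prompt per abstract; the OpenAI client, model name, and prompt wording below are assumptions for illustration, not the UStAI pipeline itself.

```python
# Hedged sketch: ask an LLM to draft user stories from a paper abstract in the
# "As a <stakeholder>, I want <goal>, so that <benefit>" template.
# Client, model, and prompt wording are assumptions, not the authors' exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def user_stories_from_abstract(abstract: str, n: int = 5) -> str:
    prompt = (
        f"Read the following abstract describing an AI system:\n\n{abstract}\n\n"
        f"Write {n} user stories for this system, one per line, each in the form "
        "'As a <stakeholder>, I want <goal>, so that <benefit>.'"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # the study compared three LLMs; the model here is arbitrary
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Usage: print(user_stories_from_abstract(open("abstract.txt").read()))
```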
- Asia > Middle East > Saudi Arabia > Eastern Province > Dhahran (0.14)
- Europe > Norway > Central Norway > Trøndelag > Trondheim (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)
LLMs in Coding and their Impact on the Commercial Software Engineering Landscape
Belozerov, Vladislav, Barclay, Peter J, Sami, Ashkan
Large-language-model coding tools are now mainstream in software engineering. But as these same tools move human effort up the development stack, they present fresh dangers: 10% of real prompts leak private data, 42% of generated snippets hide security flaws, and the models can even "agree" with wrong ideas, a trait called sycophancy. We argue that firms must tag and review every AI-generated line of code, keep prompts and outputs inside private or on-premises deployments, obey emerging safety regulations, and add tests that catch sycophantic answers -- so they can gain speed without losing security and accuracy.
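One way to operationalise the recommended sycophancy tests is to ask the same question twice, once neutrally and once with a wrong claim asserted in the prompt, and flag answers that flip to agree; the helper ask_llm and the checking heuristic below are illustrative assumptions, not a method from the article.

```python
# Hedged sketch of a sycophancy check: the model is flagged if it answers correctly
# when asked neutrally but echoes the user's wrong belief when that belief is asserted.
# `ask_llm` is a placeholder for whatever chat client is in use.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM client")

def sycophancy_flag(question: str, correct: str, wrong: str) -> bool:
    neutral_answer = ask_llm(question)
    leading_answer = ask_llm(f"I'm pretty sure the answer is {wrong}. {question}")
    return correct.lower() in neutral_answer.lower() and wrong.lower() in leading_answer.lower()

# Usage: sycophancy_flag("Is SHA-1 still considered collision-resistant?", "no", "yes")
```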
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
Exploring Zero-Shot App Review Classification with ChatGPT: Challenges and Potential
Chaudhary, Mohit, Jain, Chirag, Anish, Preethu Rose
App reviews are a critical source of user feedback, offering valuable insights into an app's performance, features, usability, and overall user experience. Effectively analyzing these reviews is essential for guiding app development, prioritizing feature updates, and enhancing user satisfaction. Classifying reviews into functional and non-functional requirements plays a pivotal role in distinguishing feedback related to specific app features (functional requirements) from feedback concerning broader quality attributes, such as performance, usability, and reliability (non-functional requirements). Both categories are integral to informed development decisions. Traditional approaches to classifying app reviews are hindered by the need for large, domain-specific datasets, which are often costly and time-consuming to curate. This study explores the potential of zero-shot learning with ChatGPT for classifying app reviews into four categories: functional requirement, non-functional requirement, both, or neither. We evaluate ChatGPT's performance on a benchmark dataset of 1,880 manually annotated reviews from ten diverse apps spanning multiple domains. Our findings demonstrate that ChatGPT achieves a robust F1 score of 0.842 in review classification, despite certain challenges and limitations. Additionally, we examine how factors such as review readability and length impact classification accuracy and conduct a manual analysis to identify review categories more prone to misclassification.
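A zero-shot classification call of the kind the study evaluates can be sketched as below; the prompt wording, the model identifier, and the client are assumptions, not the authors' exact protocol.

```python
# Hedged sketch: zero-shot classification of an app review into the four categories
# used in the study. Prompt and model choice are illustrative.
from openai import OpenAI

client = OpenAI()
LABELS = ["functional requirement", "non-functional requirement", "both", "neither"]

def classify_review(review: str) -> str:
    prompt = (
        "Classify the following app review as exactly one of: "
        + ", ".join(LABELS)
        + ".\nReply with the label only.\n\nReview: " + review
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # "ChatGPT" in the paper; the exact model is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().lower()

# Usage: classify_review("The app crashes whenever I upload a photo larger than 5 MB.")
```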
- Asia > India > Maharashtra > Pune (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study
Almonte, Jomar Thomas, Boominathan, Santhosh Anitha, Nascimento, Nathalia
Neglecting non-functional requirements (NFRs) early in software development can lead to critical challenges. Despite their importance, NFRs are often overlooked or difficult to identify, impacting software quality. To support requirements engineers in eliciting NFRs, we developed a framework that leverages Large Language Models (LLMs) to derive quality-driven NFRs from functional requirements (FRs). Using a custom prompting technique within a Deno-based pipeline, the system identifies relevant quality attributes for each functional requirement and generates corresponding NFRs, aiding systematic integration. A crucial aspect is evaluating the quality and suitability of these generated requirements. Can LLMs produce high-quality NFR suggestions? Using 34 functional requirements - selected as a representative subset of 3,964 FRs - the LLMs inferred applicable attributes based on the ISO/IEC 25010:2023 standard, generating 1,593 NFRs. A horizontal evaluation covered three dimensions: NFR validity, applicability of quality attributes, and classification precision. Ten industry software quality evaluators, averaging 13 years of experience, assessed a subset for relevance and quality. The evaluation showed strong alignment between LLM-generated NFRs and expert assessments, with median validity and applicability scores of 5.0 (means: 4.63 and 4.59, respectively) on a 1-5 scale. In the classification task, 80.4% of LLM-assigned attributes matched expert choices, with 8.3% near misses and 11.3% mismatches. A comparative analysis of eight LLMs highlighted variations in performance, with gemini-1.5-pro exhibiting the highest attribute accuracy, while llama-3.3-70B achieved higher validity and applicability scores. These findings provide insights into the feasibility of using LLMs for automated NFR generation and lay the foundation for further exploration of AI-assisted requirements engineering.
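The two-step idea in the abstract, selecting applicable quality characteristics for a functional requirement and then drafting one NFR per characteristic, can be sketched as below; the characteristic list reflects the commonly cited top level of ISO/IEC 25010:2023, while the placeholder complete function and the prompt wording are assumptions rather than the authors' Deno-based pipeline.

```python
# Hedged sketch: derive candidate NFRs from a functional requirement via quality
# characteristics. `complete` is a placeholder for an LLM client; prompt wording
# and the characteristic list are illustrative.
ISO_25010_2023 = [
    "functional suitability", "performance efficiency", "compatibility",
    "interaction capability", "reliability", "security",
    "maintainability", "flexibility", "safety",
]

def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to an LLM client")

def generate_nfrs(functional_requirement: str) -> str:
    prompt = (
        f"Functional requirement: {functional_requirement}\n"
        f"Quality characteristics (ISO/IEC 25010:2023): {', '.join(ISO_25010_2023)}\n"
        "1) List the characteristics that apply to this requirement.\n"
        "2) For each, write one verifiable non-functional requirement."
    )
    return complete(prompt)
```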
- North America > United States > Pennsylvania (0.05)
- Europe > Germany (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.68)
A Tutorial on the Design, Experimentation and Application of Metaheuristic Algorithms to Real-World Optimization Problems
Osaba, Eneko, Villar-Rodriguez, Esther, Del Ser, Javier, Nebro, Antonio J., Molina, Daniel, LaTorre, Antonio, Suganthan, Ponnuthurai N., Coello, Carlos A. Coello, Herrera, Francisco
In the last few years, the formulation of real-world optimization problems and their efficient solution via metaheuristic algorithms has been a catalyst for a myriad of research studies. In spite of decades of historical advancements on the design and use of metaheuristics, large difficulties still remain with regard to the understandability, algorithmic design uprightness, and performance verifiability of new technical achievements. A clear example stems from the scarce replicability of works dealing with metaheuristics used for optimization, which is often infeasible due to ambiguity and lack of detail in the presentation of the methods to be reproduced. Additionally, in many cases, there is a questionable statistical significance of their reported results. This work aims to provide the audience with a proposal of good practices that should be embraced when conducting studies about metaheuristic methods used for optimization, in order to provide scientific rigor, value and transparency. To this end, we introduce a step-by-step methodology covering every research phase that should be followed when addressing this scientific field. Specifically, frequently overlooked yet crucial aspects and useful recommendations will be discussed with regard to the formulation of the problem, solution encoding, implementation of search operators, evaluation metrics, design of experiments, and considerations for real-world performance, among others. Finally, we will outline important considerations, challenges, and research directions for the success of newly developed optimization metaheuristics in their deployment and operation over real-world application environments.
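One of the good practices the tutorial emphasises, repeated independent runs compared with a non-parametric significance test rather than a single run, can be sketched as below; the two stand-in solvers and the run count are illustrative assumptions.

```python
# Hedged sketch: compare two metaheuristics over repeated runs and test the
# difference with a Mann-Whitney U test. The solvers are stand-ins that just
# draw objective values from fixed distributions.
import random
from statistics import median
from scipy.stats import mannwhitneyu

def solver_a(seed: int) -> float:
    random.seed(seed)
    return random.gauss(10.0, 1.0)       # pretend objective value (lower is better)

def solver_b(seed: int) -> float:
    random.seed(seed + 10_000)
    return random.gauss(9.5, 1.0)

runs = 30                                 # repeated independent runs, not one lucky run
scores_a = [solver_a(seed) for seed in range(runs)]
scores_b = [solver_b(seed) for seed in range(runs)]
statistic, p_value = mannwhitneyu(scores_a, scores_b, alternative="two-sided")
print(f"median A = {median(scores_a):.2f}, median B = {median(scores_b):.2f}, p = {p_value:.4f}")
```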
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan (0.04)
- North America > Mexico (0.04)
- (13 more...)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Transportation (1.00)
- Education (0.82)
- Health & Medicine (0.67)
- Materials > Chemicals (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
- (2 more...)
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Singhal, Manav, Aggarwal, Tushar, Awasthi, Abhijeet, Natarajan, Nagarajan, Kanade, Aditya
Existing evaluation benchmarks of language models of code (code LMs) focus almost exclusively on whether the LMs can generate functionally-correct code. In real-world software engineering, developers think beyond functional correctness. They have requirements on "how" a functionality should be implemented to meet overall system design objectives like efficiency, security, and maintainability. They would also trust the code LMs more if the LMs demonstrate robust understanding of requirements and code semantics. We propose a new benchmark, NoFunEval, to evaluate code LMs on non-functional requirements and simple classification instances for both functional and non-functional requirements. We propose a prompting method, Coding Concepts (CoCo), as a way for a developer to communicate the domain knowledge to the LMs. We conduct an extensive evaluation of twenty-two code LMs. Our finding is that they generally falter when tested on our benchmark, hinting at fundamental blindspots in their training setups. Surprisingly, even the classification accuracy on functional-correctness instances derived from the popular HumanEval benchmark is low, calling into question the depth of their comprehension and the source of their success in generating functionally-correct code in the first place. We will release our benchmark and evaluation scripts publicly at https://aka.ms/NoFunEval.
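The flavour of a classification instance can be sketched as asking a model which of two functionally equivalent snippets better satisfies a stated non-functional requirement; the instance format, the scoring, and the ask_llm placeholder are illustrative and do not reproduce the NoFunEval schema or the CoCo prompts.

```python
# Hedged sketch of a pairwise classification check against a non-functional
# requirement. Format and scoring are illustrative, not the benchmark's schema.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to a code LM")

def classify_instance(snippet_a: str, snippet_b: str, requirement: str, gold: str) -> bool:
    prompt = (
        f"Non-functional requirement: {requirement}\n\n"
        f"Snippet A:\n{snippet_a}\n\n"
        f"Snippet B:\n{snippet_b}\n\n"
        "Answer with a single letter, A or B: which snippet better satisfies the requirement?"
    )
    answer = ask_llm(prompt).strip().upper()[:1]
    return answer == gold  # compare against the labelled correct choice
```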
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
- Information Technology > Security & Privacy (1.00)
- Law (0.98)
- Government (0.97)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications (0.97)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.49)
NFRsTDO v1.2's Terms, Properties, and Relationships -- A Top-Domain Non-Functional Requirements Ontology
Olsina, Luis, Papa, María Fernanda, Becker, Pablo
This preprint specifies and defines all the Terms, Properties, and Relationships of NFRsTDO (Non-Functional Requirements Top-Domain Ontology). NFRsTDO v1.2, whose UML conceptualization is shown in Figure 1, is a slightly updated version of its predecessor, NFRsTDO v1.1. NFRsTDO is an ontology mainly devoted to quality (non-functional) requirements and quality/cost views, which is placed at the top-domain level in the context of a multilayer ontological architecture called FCD-OntoArch (Foundational, Core, Domain, and instance Ontological Architecture for sciences). Figure 2 depicts its five tiers, which comprise the Foundational, Core, Top-Domain, Low-Domain, and Instance levels. Each level is populated with ontological components or, in other words, ontologies. Ontologies at the same level can be related to each other, except at the foundational level, where only ThingFO (Thing Foundational Ontology) is found. In addition, ontologies' terms and relationships at lower levels can be semantically enriched by ontologies' terms and relationships from the higher levels. NFRsTDO's terms and relationships are mainly extended/reused from ThingFO, SituationCO (Situation Core Ontology), ProcessCO (Process Core Ontology), and FRsTDO (Functional Requirements Top-Domain Ontology). Stereotypes are the mechanism used for enriching NFRsTDO terms. Note that annotations of updates from the previous version (NFRsTDO v1.1) to the current one (v1.2) can be found in Appendix A.
- South America > Argentina (0.04)
- Europe > Portugal (0.04)
Awareness requirement and performance management for adaptive systems: a survey
Rashid, Tarik A., Hassan, Bryar A., Alsadoon, Abeer, Qader, Shko, Vimal, S., Chhabra, Amit, Yaseen, Zaher Mundher
Self-adaptive software can assess and modify its behavior when the assessment indicates that the program is not performing as intended or when improved functionality or performance is available. Since the mid-1960s, the subject of system adaptivity has been extensively researched, and during the last decade, many application areas and technologies involving self-adaptation have gained prominence. All of these efforts have in common the introduction of self-adaptability through software. Thus, it is essential to investigate systematic software engineering methods to create self-adaptive systems that may be used across different domains. The primary objective of this research is to summarize current advances in awareness requirements for adaptive strategies based on an examination of state-of-the-art methods described in the literature. This paper presents a review of self-adaptive systems in the context of requirement awareness and summarizes the most common methodologies applied. It first reviews previous surveys and works on self-adaptive systems. Afterward, it classifies the current self-adaptive systems based on six criteria. Then, it presents and evaluates the most common self-adaptive approaches. Lastly, an evaluation among the self-adaptive models is conducted based on four concepts (requirements description, monitoring, relationship dependency/impact, and tools).
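The requirement-awareness idea surveyed here is often realised as a monitor/analyse/adapt loop in which each requirement carries a measurable satisfaction condition; the data structure, threshold, and adaptation below are an illustrative assumption, not a construct from any specific surveyed framework.

```python
# Hedged sketch: an awareness requirement pairs a monitored condition with an
# adaptation to trigger when the condition is violated. Names and values are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AwarenessRequirement:
    name: str
    satisfied: Callable[[dict], bool]   # predicate over monitored metrics
    adapt: Callable[[], None]           # adaptation applied on violation

def control_loop(requirements: list[AwarenessRequirement], read_metrics: Callable[[], dict]) -> None:
    metrics = read_metrics()                         # monitor
    for requirement in requirements:                 # analyse
        if not requirement.satisfied(metrics):
            print(f"violated: {requirement.name}")   # plan/execute, simplified
            requirement.adapt()

latency_requirement = AwarenessRequirement(
    name="p95 latency under 300 ms",
    satisfied=lambda metrics: metrics.get("p95_latency_ms", 0) < 300,
    adapt=lambda: print("scaling out one replica"),
)
control_loop([latency_requirement], read_metrics=lambda: {"p95_latency_ms": 450})
```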
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > India (0.04)
- (4 more...)
- Research Report (1.00)
- Overview (1.00)