trondheim
Leveraging LLMs for User Stories in AI Systems: UStAI Dataset
Yamani, Asma, Baslyman, Malak, Ahmed, Moataz
AI systems are gaining widespread adoption across various sectors and domains. Creating high-quality requirements for AI systems is crucial for aligning them with business goals and consumer values and for ensuring social responsibility. However, given the uncertain nature of AI systems and their heavy reliance on sensitive data, more research is needed on the elicitation and analysis of AI system requirements. Because many AI systems are proprietary, there is a lack of open-source requirements artifacts and technical requirements documents for AI systems, which limits broader research and investigation. With Large Language Models (LLMs) emerging as a promising alternative to human-generated text, this paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. We conducted an empirical evaluation using three LLMs, generating 1,260 user stories from 42 abstracts across 26 domains, and assessed their quality using the Quality User Story (QUS) framework. We also identified relevant non-functional requirements (NFRs) and ethical principles. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders, offering a promising approach for generating user stories for research purposes and for aiding the early requirements elicitation phase of AI systems. We have compiled and curated the stories generated by the various LLMs into a dataset (UStAI), which is now publicly available.
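The QUS framework scores stories against criteria such as being well-formed (the story names at least a role and a means) and atomic (it expresses exactly one feature). The following Python sketch is a rough illustration only. It checks the first criterion syntactically for stories written in the common "As a <role>, I want <means>, so that <ends>" template, and it applies a crude "and"-based heuristic for the second. The regular expression, the function names and the heuristic are assumptions for this example, not the tooling used in the study.

```python
import re

# Hypothetical matcher for the common template
# "As a <role>, I want <means>[, so that <ends>]".
STORY_PATTERN = re.compile(
    r"^As an? (?P<role>[^,]+),\s*I want (?:to )?(?P<means>.+?)"
    r"(?:,?\s*so that (?P<ends>.+?))?\.?$",
    re.IGNORECASE,
)


def parse_story(text):
    """Split a story into role, means and ends; return None if the template does not match."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return None
    return {key: (value.strip() if value else None) for key, value in match.groupdict().items()}


def is_well_formed(text):
    """QUS 'well-formed': the story contains at least a role and a means."""
    parts = parse_story(text)
    return parts is not None and bool(parts["role"]) and bool(parts["means"])


def is_atomic(text):
    """Crude heuristic for QUS 'atomic': the means should not join several features with 'and'."""
    parts = parse_story(text)
    return parts is not None and " and " not in f" {parts['means']} "


story = "As a clinician, I want to view model confidence scores, so that I can judge predictions."
print(parse_story(story))
print(is_well_formed(story), is_atomic(story))
```

A purely syntactic check like this covers only the template-level criteria. Semantic QUS criteria, such as whether a story is conceptually sound or free of conflicts, still need human or model judgment.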
Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?
Jahan, Sigma, Rahman, Mohammad Masudur
As attention-based deep learning models scale in size and complexity, diagnosing their faults becomes increasingly challenging. In this work, we conduct an empirical study to evaluate the potential of Hessian-based analysis for diagnosing faults in attention-based models. Specifically, we use Hessian-derived insights to identify fragile regions (via curvature analysis) and parameter interdependencies (via parameter interaction analysis) within attention mechanisms. Through experiments on three diverse models (HAN, 3D-CNN, DistilBERT), we show that Hessian-based metrics can localize instability and pinpoint fault sources more effectively than gradients alone. Our empirical findings suggest that these metrics could significantly improve fault diagnosis in complex neural architectures, potentially improving software debugging practices.
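The following minimal Python sketch makes the curvature and interaction ideas concrete using a hypothetical two-parameter quadratic loss. It estimates the Hessian with central finite differences, which stand in for the automatic differentiation a real model would use. Diagonal entries indicate how sharply the loss bends along each parameter, and off-diagonal entries indicate how strongly two parameters interact. At the minimum the gradient is zero, yet the Hessian still singles out the sharper parameter, which is one simple way curvature can carry information that gradients alone do not. The loss, step size and helper names are illustrative assumptions, not the authors' method.

```python
def loss(w):
    # Hypothetical loss: parameter 0 sits in a sharp valley, parameter 1 in a flat one,
    # and the cross term couples them.
    return 50.0 * w[0] ** 2 + 0.5 * w[1] ** 2 + 4.0 * w[0] * w[1]


def gradient(f, x, h=1e-3):
    """Central-difference gradient."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def hessian(f, x, h=1e-3):
    """Central-difference Hessian: H[i][j] approximates d2f / (dx_i dx_j)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H


w = [0.0, 0.0]                                  # a minimum: the gradient is zero here
g = gradient(loss, w)
H = hessian(loss, w)
curvature = [H[i][i] for i in range(len(w))]    # per-parameter sharpness ("fragility")
most_fragile = max(range(len(w)), key=lambda i: curvature[i])
interaction = H[0][1]                           # coupling between parameters 0 and 1
print(g, curvature, most_fragile, interaction)
```

For a real network, the full Hessian is usually too large to materialize. Hessian-vector products or eigenvalue estimates are therefore the more practical route, though that choice is an assumption here, not something the abstract specifies.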
A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics
Katzy, Jonathan, Huang, Yongcheng, Panchu, Gopal-Raj, Ziemlewski, Maksym, Loizides, Paris, Vermeulen, Sander, van Deursen, Arie, Izadi, Maliheh
Large Language Models are essential coding assistants, yet their training is predominantly English-centric. In this study, we evaluate the performance of code language models in non-English contexts, identifying challenges in their adoption and integration into multilingual workflows. We conduct an open-coding study to analyze errors in code comments generated by five state-of-the-art code models (CodeGemma, CodeLlama, CodeQwen1.5, GraniteCode, and StarCoder2) across five natural languages: Chinese, Dutch, English, Greek, and Polish. Our study yields a dataset of 12,500 labeled generations, which we publicly release. We then assess how reliably standard metrics capture comment correctness across languages and evaluate their trustworthiness as judgment criteria. Through our open-coding investigation, we identified a taxonomy of 26 distinct error categories in model-generated code comments. These categories highlight variations in language cohesion, informativeness, and syntax adherence across different natural languages. Our analysis shows that, while these models frequently produce partially correct comments, modern neural metrics fail to reliably differentiate meaningful completions from random noise. Notably, the significant score overlap between expert-rated correct and incorrect comments calls into question the effectiveness of these metrics in assessing generated comments.
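One simple way to quantify score overlap between expert-rated correct and incorrect comments is the probability that a randomly chosen correct comment receives a higher metric score than a randomly chosen incorrect one. This is equivalent to the area under the ROC curve, with ties counted as one half, and a value near 0.5 means the metric barely separates the two groups. The Python sketch below computes this probability and the fraction of incorrect comments whose scores fall inside the range of the correct ones. The scores are made-up placeholders, not data from the study, and using these two summaries is an assumption about how overlap might be measured.

```python
def pairwise_auc(correct, incorrect):
    """Probability that a random correct comment outscores a random incorrect one (ties count 0.5)."""
    wins = 0.0
    for c in correct:
        for w in incorrect:
            if c > w:
                wins += 1.0
            elif c == w:
                wins += 0.5
    return wins / (len(correct) * len(incorrect))


def range_overlap(correct, incorrect):
    """Fraction of incorrect-comment scores lying within the min-max range of correct-comment scores."""
    lo, hi = min(correct), max(correct)
    return sum(lo <= s <= hi for s in incorrect) / len(incorrect)


# Placeholder metric scores, invented for illustration only.
correct_scores = [0.82, 0.75, 0.90, 0.68]
incorrect_scores = [0.80, 0.71, 0.66, 0.85]

print(pairwise_auc(correct_scores, incorrect_scores))   # 10 of 16 pairs ordered correctly
print(range_overlap(correct_scores, incorrect_scores))  # 3 of 4 incorrect scores overlap
```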
Towards Adaptive Software Agents for Debugging
Majdoub, Yacine, Charrada, Eya Ben, Touati, Haifa
Using multiple agents has been found to improve the debugging capabilities of Large Language Models. However, increasing the number of LLM agents has several drawbacks, such as higher running costs and a greater risk that the agents lose focus. In this work, we propose an adaptive agentic design in which the number of agents and their roles are determined dynamically based on the characteristics of the task to be achieved. In this design, the agents' roles are not predefined but are generated after analyzing the problem to be solved. Our initial evaluation shows that, with the adaptive design, the number of agents generated depends on the complexity of the buggy code. For simple code with mere syntax issues, the problem was usually fixed using only one agent, whereas for more complex problems, a higher number of agents was created. Regarding the effectiveness of the fix, we observed an average improvement of 11% compared to one-shot prompting. Given these promising results, we outline future research directions to improve our design for adaptive software agents that can autonomously plan and pursue their software goals.
Testing the Fault-Tolerance of Multi-Sensor Fusion Perception in Autonomous Driving Systems
Tian, Haoxiang, Ding, Wenqiang, Han, Xingshuo, Wu, Guoquan, Guo, An, Zhang, Junqi, Chen, Wei, Wei, Jun, Zhang, Tianwei
High-level Autonomous Driving Systems (ADSs), such as Google Waymo and Baidu Apollo, typically rely on multi-sensor fusion (MSF) based approaches to perceive their surroundings. This strategy increases perception robustness by combining the respective strengths of the camera and LiDAR, and it directly affects the safety-critical driving decisions of autonomous vehicles (AVs). However, in real-world autonomous driving scenarios, cameras and LiDAR are subject to various faults, which can significantly impact the decision-making and behaviors of ADSs. Existing MSF testing approaches only discover corner cases that MSF-based perception cannot accurately detect, and research on how sensor faults affect the system-level behaviors of ADSs is lacking. To address this gap, we conduct the first exploration of the fault tolerance of MSF perception-based ADSs against sensor faults. In this paper, we systematically and comprehensively build fault models for cameras and LiDAR in AVs and inject them into the MSF perception-based ADS to test its behavior in test scenarios. To effectively and efficiently explore the parameter spaces of the sensor fault models, we design FADE, a feedback-guided differential fuzzer that discovers safety violations of the MSF perception-based ADS caused by the injected sensor faults. We evaluate FADE on a representative, practical industrial ADS, Baidu Apollo. Our evaluation results demonstrate the effectiveness and efficiency of FADE, and we draw several useful findings from the experimental results. To validate these findings in the physical world, we use a real Baidu Apollo 6.0 EDU autonomous vehicle to conduct physical experiments, and the results show the practical significance of our findings.
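The general shape of a feedback-guided differential fuzzing loop can be sketched in Python. Each candidate fault configuration is run and compared against a fault-free run of the same scenario, which is the differential part. The candidate is kept as the next seed only if it degrades safety more than the current best, which is the feedback part. The simulator below is a hypothetical closed-form stand-in for an ADS such as Apollo. The two fault parameters, the Gaussian mutation and the 1 m safety threshold are all assumptions, and this is not FADE's actual implementation.

```python
import random


def simulate_min_distance(camera_noise, lidar_dropout):
    """Hypothetical stand-in for one ADS run: minimum distance (m) to the nearest obstacle."""
    return max(0.0, 10.0 - 4.0 * camera_noise - 6.0 * lidar_dropout)


def mutate(params, rng, sigma=0.1):
    """Gaussian perturbation, clipped to an assumed [0, 1] fault-intensity range."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in params]


def fuzz(iterations=1000, threshold=1.0, seed=0):
    rng = random.Random(seed)
    baseline = simulate_min_distance(0.0, 0.0)  # fault-free run of the same scenario
    best = [0.1, 0.1]                           # initial seed: mild camera noise and LiDAR dropout
    best_gap = baseline - simulate_min_distance(*best)
    violations = []
    for _ in range(iterations):
        candidate = mutate(best, rng)
        faulty = simulate_min_distance(*candidate)
        gap = baseline - faulty                 # differential signal: degradation caused by the faults
        if faulty < threshold:                  # safety violation: the vehicle got too close
            violations.append((candidate, faulty))
        if gap > best_gap:                      # feedback: keep the more damaging configuration as the seed
            best, best_gap = candidate, gap
    return violations, best, best_gap


violations, best, best_gap = fuzz()
print(len(violations), best, best_gap)
```

Keeping only the most damaging configuration makes this a simple hill-climbing search. A practical fuzzer would more likely maintain a corpus of seeds and a richer feedback score, but that design choice is likewise an assumption.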
Physics-based deep learning reveals rising heating demand heightens air pollution in Norwegian cities
Cao, Cong, Debnath, Ramit, Alvarez, R. Michael
Policymakers frequently analyze air quality and climate change in isolation, disregarding their interactions. This study explores the influence of specific climate factors on air quality by contrasting a regression model with K-Means Clustering, Hierarchical Clustering, and Random Forest techniques. We employ Physics-based Deep Learning (PBDL) and Long Short-Term Memory (LSTM) networks to predict air pollution levels. Our analysis utilizes ten years (2009-2018) of daily traffic, weather, and air pollution data from three major cities in Norway. Findings from feature selection reveal a correlation between rising heating degree days and heightened air pollution levels, suggesting that increased heating activity in Norway is a contributing factor to worsening air quality. PBDL achieves higher accuracy in air pollution predictions than LSTM. This paper contributes to the growing literature on PBDL methods for more accurate air pollution predictions using environmental variables, aiding policymakers in formulating effective data-driven climate policies.
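Heating degree days (HDD) measure how far the daily mean temperature falls below a base temperature: daily HDD = max(0, T_base - T_mean), summed over the period of interest. The Python sketch below computes daily HDD and their Pearson correlation with a pollutant series. The Pearson correlation is a simpler stand-in for the study's feature-selection analysis, not a reproduction of it. The 17 °C base temperature is an assumption, since conventions differ between countries and datasets. The temperature and PM2.5 values are invented placeholders, not the Norwegian data analyzed in the study.

```python
import math

BASE_TEMP_C = 17.0  # assumed base temperature; conventions vary by country and dataset


def heating_degree_days(daily_mean_temps, base=BASE_TEMP_C):
    """Daily HDD = max(0, base - daily mean temperature)."""
    return [max(0.0, base - t) for t in daily_mean_temps]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented placeholder values, not observations from the study.
temps_c = [-3.0, 2.5, 8.0, 17.0, 20.0]
pm25 = [28.0, 22.0, 15.0, 9.0, 8.0]

hdd = heating_degree_days(temps_c)
print(hdd, sum(hdd))          # [20.0, 14.5, 9.0, 0.0, 0.0] 43.5
print(pearson(hdd, pm25))
```

A strong correlation in such a toy series says nothing about causation on its own. The study's claim rests on its own multi-year data and models.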
3 world-changing examples of SAS on Azure
Last week we announced a new strategic partnership with Microsoft to further shape the future of AI and analytics in the cloud. This commitment will make it easy for SAS customers to move their analytics workloads to the cloud. And it will introduce SAS technologies to millions of Azure customers through APIs and deeper integrations that can enhance existing applications with analytics. To help illustrate how you can use SAS on Azure, I am sharing three inspiring examples from a recent SAS hackathon. Participants in this event were challenged to solve problems related to the United Nations Global Goals for Sustainable Development using SAS Viya.
5G ferry trial success in Norway
The partners have successfully demonstrated a small connected passenger ferry with an AI captain in the Norwegian fjord city of Trondheim. The autonomous ferry, named milliAmpère, transported passengers across Trondheim's harbor canal. Ericsson 5G technology enabled Telia to securely handle the large data transfers needed to operate the autonomous ferry. MilliAmpère is equipped with sensors that record its surroundings and the steering system on board. This generates large amounts of data that need to be communicated to the control center.
Fortnite Is a Huge Success -- And a Sign of What's to Come in Gaming
This year that game is undeniably Fortnite Battle Royale, an online free-for-all that every teen in America suddenly seems to be playing. It's not just kids, though: everyone from rapper Drake to Los Angeles Laker Josh Hart is a fan. That groundswell of support has propelled Fortnite from a simple video game into a cultural sensation, with hundreds of millions of fans worldwide who play the game, wear the gear and even learn the characters' victory dances. "Fortnite is another in a long line of games like World of Warcraft or Guitar Hero or Minecraft that is changing everything underfoot," says Mat Piscatella, a video-game industry analyst with research firm NPD Group. Fortnite's big draw is a madcap multiplayer mode that drops up to 100 players on an island in a last-person-standing showdown.
A strengthened national powerhouse for artificial intelligence in Norway - ForexTV
Some of Norway's largest companies are joining forces in establishing a national powerhouse for artificial intelligence. Its aim is to improve the quality and capacity for research, education and innovation in the field. Norway has a huge potential to be a pioneer in Artificial Intelligence (AI), but it needs resources and collaboration in order not to lag behind. To strengthen national efforts on artificial intelligence, Telenor, NTNU and SINTEF are inviting Norwegian businesses to partner on the new Norwegian Open AI Lab. While the Norwegian Open AI Lab will develop solutions specific to the partners' industries, it will also consider opportunities where Norway can take positions internationally.