
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models

Huang, Shucheng, Shi, Freda, Sun, Chen, Zhong, Jiaming, Ning, Minghao, Yang, Yufeng, Lu, Yukun, Wang, Hong, Khajepour, Amir

arXiv.org Artificial Intelligence

Abstract--Human drivers possess spatial and causal intelligence, enabling them to perceive driving scenarios, anticipate hazards, and react to dynamic environments. In contrast, autonomous vehicles lack these abilities, making it challenging to manage perception-related Safety of the Intended Functionality (SOTIF) risks, especially under complex or unpredictable driving conditions. To address this gap, we propose fine-tuning multimodal large language models (MLLMs) on a customized dataset specifically designed to capture perception-related SOTIF scenarios. Benchmarking results show that fine-tuned MLLMs achieve an 11.8% improvement in close-ended VQA accuracy and a 12.0% increase in open-ended VQA scores compared to baseline models, while maintaining real-time performance with a 0.59-second average inference time per image. We validate our approach through real-world case studies in Canada and China, where fine-tuned models correctly identify safety risks that challenge even experienced human drivers. This work represents the first application of domain-specific MLLM fine-tuning for the SOTIF domain in autonomous driving. In autonomous driving (AD), safety is commonly classified into functional safety and Safety of the Intended Functionality (SOTIF). Functional safety concerns failures in hardware or software that result in unsafe operation. In contrast, SOTIF addresses hazards that occur not due to malfunctions, but when the system operates as intended yet produces unsafe outcomes because of external factors or inherent limitations [1]. Perception systems in autonomous vehicles (AVs), which are tasked with detecting, classifying, and predicting based on environmental stimuli, are particularly vulnerable to SOTIF-related challenges.

Manuscript received 2 February, 2025; revised 27 August, 2025; accepted 7 September, 2025. Yang, and A. Khajepour are with MVS-Lab, Department of Mechanical and Mechatronics Engineering, University of Waterloo, 200 University Ave West, Waterloo ON, N2L 3G1, Canada. S. Huang and F. Shi are with the CompLING Lab, David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave West, Waterloo ON, N2L 3G1, Canada, and the Vector Institute, Toronto, Canada. C. Sun is with the Department of Data and Systems Engineering, University of Hong Kong, Pok Fu Lam, Hong Kong, China (e-mail: c87sun@hku.hk). Lu is with the Department of Mechanical Engineering, University of New Brunswick, Fredericton, NB E3B 5A3, Canada (e-mail: yukun.lu@unb.ca). H. Wang is with the School of Vehicle and Mobility, Tsinghua University, Beijing, China, 100084.
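The close-ended VQA accuracy reported in the abstract can be illustrated with a minimal scorer. This is a hedged sketch: the benchmark's exact matching protocol is not specified here, and `closed_vqa_accuracy` and the sample answers are hypothetical, not code or data from the paper.

```python
def closed_vqa_accuracy(predictions, references):
    """Fraction of exact (case-insensitive) answer matches.

    Assumes close-ended VQA is scored by exact string match against
    a single gold answer per question, which is one common convention.
    """
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)
```

Open-ended VQA scoring, by contrast, typically requires a graded similarity or judge model rather than exact matching, which is why the abstract reports the two metrics separately.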


AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data

Abolnejadian, Mohammad, Amirshahi, Shakiba, Brehmer, Matthew, Crisan, Anamaria

arXiv.org Artificial Intelligence

In decision-making conversations, experts must navigate complex choices and make on-the-spot decisions while engaged in conversation. Although extensive historical data often exists, the real-time nature of these scenarios makes it infeasible for decision-makers to review and leverage relevant information. This raises an interesting question: What if experts could draw on relevant past data during real-time decision-making, through insights derived from that data? To explore this, we implemented a conversational user interface, taking doctor-patient interactions as an example use case. Our system continuously listens to the conversation, identifies patient problems and doctor-suggested solutions, and retrieves related data from an embedded dataset, generating concise insights using a pipeline built around a retrieval-based Large Language Model (LLM) agent. We evaluated the prototype by embedding Health Canada datasets into a vector database and conducting simulated studies using sample doctor-patient dialogues; the results show the approach is effective but also surface challenges, setting directions for the next steps of our work.
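The retrieval step of such a pipeline can be sketched as follows. This is a toy bag-of-words retriever standing in for the system's actual embedding model and vector database; the `embed`, `cosine`, and `retrieve` names and the example corpus are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a production system would use a neural encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus entries most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

In the full pipeline, the retrieved passages would then be placed into the LLM agent's prompt so the generated insight is grounded in the historical data rather than in the model's parametric knowledge alone.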


CoInfra: A Large-Scale Cooperative Infrastructure Perception System and Dataset in Adverse Weather

Ning, Minghao, Yang, Yufeng, Shu, Keqi, Huang, Shucheng, Zhong, Jiaming, Salehi, Maryam, Rahmani, Mahdi, Lu, Yukun, Sun, Chen, Saleh, Aladdin, Hashemi, Ehsan, Khajepour, Amir

arXiv.org Artificial Intelligence

We present CoInfra, a large-scale cooperative infrastructure perception system and dataset designed to advance robust multi-agent perception under real-world and adverse weather conditions. The CoInfra system includes 14 fully synchronized sensor nodes, each equipped with dual RGB cameras and a LiDAR, deployed across a shared region and operating continuously to capture all traffic participants in real-time. A robust, delay-aware synchronization protocol and a scalable system architecture that supports real-time data fusion, OTA management, and remote monitoring are provided in this paper. The dataset itself was collected across diverse weather scenarios, including sunny, rainy, freezing rain, and heavy snow, and includes 195k LiDAR frames and 390k camera images from 8 infrastructure nodes that are globally time-aligned and spatially calibrated. Furthermore, comprehensive 3D bounding box annotations for five object classes (i.e., car, bus, truck, person, and bicycle) are provided in both global and individual node frames, along with high-definition maps for contextual understanding. Baseline experiments demonstrate the trade-offs between early and late fusion strategies, and the significant benefits of HD map integration are discussed. By openly releasing our dataset, codebase, and system documentation at https://github.com/NingMingHao/CoInfra, we aim to enable reproducible research and drive progress in infrastructure-supported autonomous driving, particularly in challenging, real-world settings.
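Global time alignment across nodes implies matching each node's frames to a common reference clock by timestamp. A minimal nearest-timestamp matcher under an assumed per-frame tolerance might look like this; the function and the 50 ms tolerance are illustrative, not CoInfra's actual synchronization protocol.

```python
import bisect

def align_frames(node_ts, ref_ts, tol=0.05):
    """Match each reference timestamp to the nearest node timestamp
    within `tol` seconds; unmatched references map to None.

    Returns a list of (reference_time, node_time_or_None) pairs.
    """
    node_ts = sorted(node_ts)
    pairs = []
    for t in ref_ts:
        i = bisect.bisect_left(node_ts, t)
        best = None
        # The nearest neighbour is at index i-1 or i in the sorted list.
        for j in (i - 1, i):
            if 0 <= j < len(node_ts):
                if best is None or abs(node_ts[j] - t) < abs(best - t):
                    best = node_ts[j]
        pairs.append((t, best if best is not None and abs(best - t) <= tol else None))
    return pairs
```

A delay-aware protocol would additionally compensate for per-node network and sensor latencies before this matching step, rather than trusting raw arrival times.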


Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction

Samimi, Reza, Bhattacharya, Aditya, Gosak, Lucija, Stiglic, Gregor, Verbert, Katrien

arXiv.org Artificial Intelligence

Healthcare professionals need effective ways to use, understand, and validate AI-driven clinical decision support systems. Existing systems face two key limitations: complex visualizations and a lack of grounding in scientific evidence. We present an integrated decision support system that combines interactive visualizations with a conversational agent to explain diabetes risk assessments. We propose a hybrid prompt handling approach combining fine-tuned language models for analytical queries with general Large Language Models (LLMs) for broader medical questions, a methodology for grounding AI explanations in scientific evidence, and a feature range analysis technique to support deeper understanding of feature contributions. We conducted a mixed-methods study with 30 healthcare professionals and found that the conversational interactions helped healthcare professionals build a clear understanding of model assessments, while the integration of scientific evidence calibrated trust in the system's decisions. Most participants reported that the system supported both patient risk evaluation and recommendation.
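The hybrid prompt-handling idea, sending analytical queries to a fine-tuned model and broader medical questions to a general LLM, could be sketched as a simple router. The cue list and backend names below are assumptions for illustration, not the paper's actual routing implementation.

```python
# Hypothetical cue words suggesting the query is about model analytics
# (feature contributions, risk scores) rather than general medicine.
ANALYTICAL_CUES = {"feature", "contribution", "risk score", "range", "why"}

def route(query):
    """Pick a backend for a query: the analytical fine-tuned model if
    any cue word appears, otherwise a general-purpose LLM."""
    q = query.lower()
    if any(cue in q for cue in ANALYTICAL_CUES):
        return "fine-tuned-analytical-model"
    return "general-llm"
```

In practice such routing is often done with a classifier or an LLM-based dispatcher rather than keyword matching; the sketch only shows where the split between the two backends sits in the pipeline.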


The Impact of a Chatbot's Ephemerality-Framing on Self-Disclosure Perceptions

Cox, Samuel Rhys, Jacobsen, Rune Møberg, van Berkel, Niels

arXiv.org Artificial Intelligence

Self-disclosure, the sharing of one's thoughts and feelings, is affected by the perceived relationship between individuals. While chatbots are increasingly used for self-disclosure, the impact of a chatbot's framing on users' self-disclosure remains under-explored. We investigated how a chatbot's description of its relationship with users, particularly in terms of ephemerality, affects self-disclosure. Specifically, we compared a Familiar chatbot, presenting itself as a companion remembering past interactions, with a Stranger chatbot, presenting itself as a new, unacquainted entity in each conversation. In a mixed factorial design, participants engaged with either the Familiar or Stranger chatbot in two sessions across two days, with one conversation focusing on Emotional- and another Factual-disclosure. When Emotional-disclosure was sought in the first chatting session, Stranger-condition participants felt more comfortable self-disclosing. However, when Factual-disclosure was sought first, these differences were replaced by more enjoyment among Familiar-condition participants. Qualitative findings showed Stranger afforded anonymity and reduced judgement, whereas Familiar sometimes felt intrusive unless rapport was built via low-risk Factual-disclosure.


How Managers Perceive AI-Assisted Conversational Training for Workplace Communication

Wilhelm, Lance T., Ding, Xiaohan, Knutsen, Kirk McInnis, Carik, Buse, Rho, Eugenia H.

arXiv.org Artificial Intelligence

Effective workplace communication is essential for managerial success, yet many managers lack access to tailored and sustained training. Although AI-assisted communication systems may offer scalable training solutions, little is known about how managers envision the role of AI in helping them improve their communication skills. To investigate this, we designed a conversational role-play system, CommCoach, as a functional probe to understand how managers anticipate using AI to practice their communication skills. Through semi-structured interviews, participants emphasized the value of adaptive, low-risk simulations for practicing difficult workplace conversations. They also highlighted opportunities, including human-AI teaming, transparent and context-aware feedback, and greater control over AI-generated personas. AI-assisted communication training should balance personalization, structured learning objectives, and adaptability to different user styles and contexts. However, achieving this requires carefully navigating tensions between adaptive and consistent AI feedback, realism and potential bias, and the open-ended nature of AI conversations versus structured workplace discourse.


PitcherNet helps researchers throw strikes with AI analysis

AIHub

University of Waterloo researchers have developed new artificial intelligence (AI) technology that can accurately analyze pitcher performance and mechanics using low-resolution video of baseball games. The system, developed for the Baltimore Orioles by the Waterloo team, plugs holes in much more elaborate and expensive technology already installed in most stadiums that host Major League Baseball (MLB), whose teams have increasingly tapped into data analytics in recent years. Those systems, produced by a company called Hawk-Eye Innovations, use multiple special cameras in each park to catch players in action, but the data they yield is typically available only to the home team that owns the stadium those games are played in. To add away games to their analytics operation, as well as use smartphone video taken by scouts in minor league and college games, the Orioles asked video and AI experts at Waterloo for help about three years ago. Waterloo researchers convert video of a pitcher's performance into a two-dimensional model that PitcherNet's AI algorithm can then analyze.


Learning Skateboarding for Humanoid Robots through Massively Parallel Reinforcement Learning

Thibault, William, Rajendran, Vidyasagar, Melek, William, Mombaur, Katja

arXiv.org Artificial Intelligence

Abstract--Learning-based methods have proven useful at generating complex motions for robots, including humanoids. Reinforcement learning (RL) has been used to learn locomotion policies, some of which leverage a periodic reward formulation. This work extends the periodic reward formulation of locomotion to skateboarding for the REEM-C robot. Brax/MJX is used to implement the RL problem to achieve fast training. Initial results in simulation are presented, with hardware experiments in progress.
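A periodic reward formulation gates individual reward terms by a phase variable that cycles over a unit-length period, so different behaviours (e.g., a push versus a glide) are rewarded in different windows of the cycle. The sketch below is a simplified indicator-window version with illustrative coefficients, not the paper's exact reward.

```python
def periodic_weight(phase, start, duration):
    """1.0 inside the [start, start+duration) window of a unit-length
    cycle, else 0.0. Smooth (e.g., Von Mises) windows are often used
    instead of this hard indicator."""
    return 1.0 if (phase - start) % 1.0 < duration else 0.0

def skate_push_reward(phase, foot_force, board_speed):
    # Hypothetical skateboarding reward: reward ground force from the
    # pushing foot only during the push window, and forward board speed
    # throughout the cycle. Coefficients are illustrative.
    push = periodic_weight(phase, start=0.0, duration=0.4)
    return push * 0.01 * foot_force + 0.1 * board_speed
```

Because the window repeats every cycle (`phase % 1.0`), the same policy parameters are rewarded for producing the same sub-motion at the same point in each stride, which is what makes the formulation periodic.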


Machine learning-aided thermography for building heat loss detection

AIHub

University of Waterloo researchers have developed a new method that could lead to significant energy savings in buildings. The team identified 28 major heat loss regions in a multi-unit residential building with the most severe ones being at wall intersections and around windows. Building enclosures rely on heat and moisture control to avoid significant energy loss due to airflow leakage, which makes buildings less comfortable and more costly to maintain. This problem will likely be compounded by climate change due to volatile temperature fluctuations. Since manual inspection is time-consuming and infrequently done due to a lack of trained personnel, energy inefficiency becomes a widespread problem for buildings.


Going top shelf with AI to better track hockey data

AIHub

Researchers from the University of Waterloo got a valuable assist from artificial intelligence (AI) tools to help capture and analyze data from professional hockey games more quickly and more accurately, something which could have implications for the business of sports. The growing field of hockey analytics currently relies on the manual analysis of video footage from games. Professional hockey teams across the sport, notably in the National Hockey League (NHL), make important decisions regarding players' careers based on that information. "The goal of our research is to interpret a hockey game through video more effectively and efficiently than a human," said Dr David Clausi, a professor in Waterloo's Department of Systems Design Engineering. Bounding boxes are used to identify players as they move on the ice in broadcast game video.