Instructional Material
Stabilizing Information Flow Entropy: Regularization for Safe and Interpretable Autonomous Driving Perception
Yang, Haobo, Zhang, Shiyan, Yang, Zhuoyi, Guo, Jilong, Yang, Jun, Zhang, Xinyu
Deep perception networks in autonomous driving traditionally rely on data-intensive training regimes and post-hoc anomaly detection, often disregarding fundamental information-theoretic constraints governing stable information processing. We reconceptualize deep neural encoders as hierarchical communication chains that incrementally compress raw sensory inputs into task-relevant latent features. Within this framework, we establish two theoretically justified design principles for robust perception: (D1) smooth variation of mutual information between consecutive layers, and (D2) monotonic decay of latent entropy with network depth. Our analysis shows that, under realistic architectural assumptions, particularly blocks comprising repeated layers of similar capacity, enforcing smooth information flow (D1) naturally encourages entropy decay (D2), thus ensuring stable compression. Guided by these insights, we propose Eloss, a novel entropy-based regularizer designed as a lightweight, plug-and-play training objective. Rather than marginal accuracy improvements, this approach represents a conceptual shift: it unifies information-theoretic stability with standard perception tasks, enabling explicit, principled detection of anomalous sensor inputs through entropy deviations. Experimental validation on large-scale 3D object detection benchmarks (KITTI and nuScenes) demonstrates that incorporating Eloss consistently achieves competitive or improved accuracy while dramatically enhancing sensitivity to anomalies, amplifying distribution-shift signals by up to two orders of magnitude. This stable information-compression perspective not only improves interpretability but also establishes a solid theoretical foundation for safer, more robust autonomous driving perception systems.
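The abstract does not give the Eloss formula, so the following is only an illustrative sketch of principle D1, not the authors' implementation. It assumes, purely for illustration, that each layer's entropy is estimated under an independent-Gaussian model and that "smooth variation" is measured as the variance of entropy changes between consecutive layers (a stand-in for the mutual-information quantity named in the abstract). The function names `gaussian_entropy` and `smoothness_penalty` are hypothetical.

```python
import math
from statistics import pvariance


def gaussian_entropy(features, eps=1e-8):
    """Differential entropy (in nats) of a batch of feature vectors.

    Assumption of this sketch: feature dimensions are independent Gaussians,
    so the entropy is the sum of the per-dimension Gaussian entropies.
    """
    dims = list(zip(*features))  # transpose: one tuple of samples per dimension
    return sum(
        0.5 * math.log(2 * math.pi * math.e * (pvariance(d) + eps))
        for d in dims
    )


def smoothness_penalty(layer_entropies):
    """Variance of the entropy change between consecutive layers.

    The penalty is zero when entropy changes by the same amount at every
    step, and grows when the change is abrupt at some layers.
    """
    deltas = [b - a for a, b in zip(layer_entropies, layer_entropies[1:])]
    if len(deltas) < 2:
        return 0.0
    return pvariance(deltas)
```

Under these assumptions, a regularizer of this kind would be added to the task loss with a small weight, and an unusually large penalty at inference time could serve as an entropy-deviation signal for anomalous inputs.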
REAMS: Reasoning Enhanced Algorithm for Maths Solving
Singh, Eishkaran, Bajaj, Tanav Singh, Nayak, Siddharth
Solving complex university-level mathematics problems, particularly those from MIT and Columbia University courses and selected tasks from the MATH dataset, remains a significant obstacle in the field of artificial intelligence. Conventional methods have consistently fallen short in this domain, highlighting the need for more advanced approaches. In this paper, we introduce a language-based solution that leverages zero-shot learning and mathematical reasoning to effectively solve, explain, and generate solutions for these advanced math problems. By integrating program synthesis, our method reduces reliance on large-scale training data while significantly improving problem-solving accuracy. Our approach achieves an accuracy of 90.15%, representing a substantial improvement over the previous benchmark of 81% and setting a new standard in automated mathematical problem-solving. These findings highlight the significant potential of advanced AI methodologies to address and overcome the challenges presented by some of the most complex mathematical courses and datasets.
Bringing Pedagogy into Focus: Evaluating Virtual Teaching Assistants' Question-Answering in Asynchronous Learning Environments
Siyan, Li, Xu, Zhen, Raghuram, Vethavikashini Chithrra, Zhang, Xuanming, Yu, Renzhe, Yu, Zhou
Asynchronous learning environments (ALEs) are widely adopted for formal and informal learning, but timely and personalized support is often limited. In this context, Virtual Teaching Assistants (VTAs) can potentially reduce the workload of instructors, but rigorous and pedagogically sound evaluation is essential. Existing assessments often rely on surface-level metrics and lack sufficient grounding in educational theories, making it difficult to meaningfully compare the pedagogical effectiveness of different VTA systems. To bridge this gap, we propose an evaluation framework rooted in learning sciences and tailored to asynchronous forum discussions, a common VTA deployment context in ALEs. We construct classifiers using expert annotations of VTA responses on a diverse set of forum posts. We evaluate the effectiveness of our classifiers, identifying approaches that improve accuracy as well as challenges that hinder generalization. Our work establishes a foundation for theory-driven evaluation of VTA systems, paving the way for more pedagogically effective AI in education.
Training the next generation of physicians for artificial intelligence-assisted clinical neuroradiology: ASNR MICCAI Brain Tumor Segmentation (BraTS) 2025 Lighthouse Challenge education platform
Amiruddin, Raisa, Yordanov, Nikolay Y., Maleki, Nazanin, Fehringer, Pascal, Gkampenis, Athanasios, Janas, Anastasia, Krantchev, Kiril, Moawad, Ahmed, Umeh, Fabian, Abosabie, Salma, Abosabie, Sara, Alotaibi, Albara, Ghonim, Mohamed, Ghonim, Mohanad, Mhana, Sedra Abou Ali, Page, Nathan, Jakovljevic, Marko, Sharifi, Yasaman, Bhatia, Prisha, Manteghinejad, Amirreza, Guelen, Melisa, Veronesi, Michael, Hill, Virginia, So, Tiffany, Krycia, Mark, Petrovic, Bojan, Memon, Fatima, Cramer, Justin, Schrickel, Elizabeth, Kosovic, Vilma, Vidal, Lorenna, Thompson, Gerard, Ikuta, Ichiro, Albalooshy, Basimah, Nabavizadeh, Ali, Tahon, Nourel Hoda, Shekdar, Karuna, Bhatia, Aashim, Kirsch, Claudia, D'Anna, Gennaro, Lohmann, Philipp, Nour, Amal Saleh, Myronenko, Andriy, Goldman-Yassen, Adam, Reid, Janet R., Aneja, Sanjay, Bakas, Spyridon, Aboian, Mariam
High-quality reference standard image data creation by neuroradiology experts for automated clinical tools can be a powerful tool for neuroradiology & artificial intelligence education. We developed a multimodal educational approach for students and trainees during the MICCAI Brain Tumor Segmentation Lighthouse Challenge 2025, a landmark initiative to develop accurate brain tumor segmentation algorithms. Fifty-six medical students & radiology trainees volunteered to annotate brain tumor MR images for the BraTS challenges of 2023 & 2024, guided by faculty-led didactics on neuropathology MRI. Among the 56 annotators, 14 select volunteers were then paired with neuroradiology faculty for guided one-on-one annotation sessions for BraTS 2025. Lectures on neuroanatomy, pathology & AI, journal clubs & data scientist-led workshops were organized online. Annotators & audience members completed surveys on their perceived knowledge before & after annotations & lectures respectively. Fourteen coordinators, each paired with a neuroradiologist, completed the data annotation process, averaging 1322.9+/-760.7 hours per dataset per pair and 1200 segmentations in total. On a scale of 1-10, annotation coordinators reported significant increase in familiarity with image segmentation software pre- and post-annotation, moving from initial average of 6+/-2.9 to final average of 8.9+/-1.1, and significant increase in familiarity with brain tumor features pre- and post-annotation, moving from initial average of 6.2+/-2.4 to final average of 8.1+/-1.2. We demonstrate an innovative offering for providing neuroradiology & AI education through an image segmentation challenge to enhance understanding of algorithm development, reinforce the concept of data reference standard, and diversify opportunities for AI-driven image analysis among future physicians.
Orchestrate, Generate, Reflect: A VLM-Based Multi-Agent Collaboration Framework for Automated Driving Policy Learning
Peng, Zengqi, Xie, Yusen, Wang, Yubin, Yang, Rui, Chen, Qifeng, Ma, Jun
The advancement of foundation models fosters new initiatives for policy learning in achieving safe and efficient autonomous driving. However, a critical bottleneck lies in the manual engineering of reward functions and training curricula for complex and dynamic driving tasks, which is a labor-intensive and time-consuming process. To address this problem, we propose OGR (Orchestrate, Generate, Reflect), a novel automated driving policy learning framework that leverages vision-language model (VLM)-based multi-agent collaboration. Our framework capitalizes on advanced reasoning and multimodal understanding capabilities of VLMs to construct a hierarchical agent system. Specifically, a centralized orchestrator plans high-level training objectives, while a generation module employs a two-step analyze-then-generate process for efficient generation of reward-curriculum pairs. A reflection module then facilitates iterative optimization based on the online evaluation. Furthermore, a dedicated memory module endows the VLM agents with the capabilities of long-term memory. To enhance robustness and diversity of the generation process, we introduce a parallel generation scheme and a human-in-the-loop technique for augmentation of the reward observation space. Through efficient multi-agent cooperation and leveraging rich multimodal information, OGR enables the online evolution of reinforcement learning policies to acquire interaction-aware driving skills. Extensive experiments in the CARLA simulator demonstrate the superior performance, robust generalizability across distinct urban scenarios, and strong compatibility with various RL algorithms. Further real-world experiments highlight the practical viability and effectiveness of our framework. The source code will be available upon acceptance of the paper.
Automated Procedural Analysis via Video-Language Models for AI-assisted Nursing Skills Assessment
Chang, Shen, Liu, Dennis, Tian, Renran, Swartzell, Kristen L., Klingler, Stacie L., Nagle, Amy M., Kong, Nan
Consistent, high-quality nursing care is essential for patient safety, yet current nursing education depends on subjective, time-intensive instructor feedback, which limits the scalability and efficiency of training and thus hampers nursing competency when trainees enter the workforce. In this paper, we introduce a video-language model (VLM) based framework for automated procedural assessment and feedback in nursing skills training, with the potential to be integrated into existing training programs. Mimicking human skill acquisition, the framework follows a curriculum-inspired progression, advancing from high-level action recognition to fine-grained subaction decomposition and ultimately to procedural reasoning. This design supports scalable evaluation by reducing instructor workload while preserving assessment quality. The system provides three core capabilities: 1) diagnosing errors by identifying missing or incorrect subactions in nursing skill instruction videos, 2) generating explainable feedback by clarifying why a step is out of order or omitted, and 3) enabling objective, consistent formative evaluation of procedures. Validation on synthesized videos demonstrates reliable error detection and temporal localization, confirming its potential to handle real-world training variability. By addressing workflow bottlenecks and supporting large-scale, standardized evaluation, this work advances AI applications in nursing education, contributing to stronger workforce development and ultimately safer patient care.
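The error-diagnosis capability described above (finding omitted and out-of-order subactions) can be illustrated independently of any video model. The sketch below is not the authors' system: it assumes the subactions have already been recognized as a list of labels, that each label appears at most once in the reference procedure, and it uses a longest-increasing-subsequence comparison chosen here for illustration. The function name `diagnose` is hypothetical.

```python
def diagnose(reference, observed):
    """Compare an observed subaction sequence against a reference procedure.

    Returns a dict with steps that are missing, steps not in the reference,
    and steps performed out of order. Assumes unique labels in `reference`.
    """
    position = {step: i for i, step in enumerate(reference)}
    missing = [s for s in reference if s not in observed]
    unexpected = [s for s in observed if s not in position]

    # Steps shared with the reference, in the order they were observed.
    common = [s for s in observed if s in position]
    idx = [position[s] for s in common]

    # Longest increasing subsequence (O(n^2) DP): the largest set of
    # observed steps whose relative order matches the reference.
    n = len(idx)
    best = [1] * n
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            if idx[j] < idx[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j

    in_order = set()
    if n:
        k = max(range(n), key=best.__getitem__)
        while k != -1:
            in_order.add(k)
            k = prev[k]

    out_of_order = [common[i] for i in range(n) if i not in in_order]
    return {
        "missing": missing,
        "unexpected": unexpected,
        "out_of_order": out_of_order,
    }
```

When several steps could equally be blamed for an ordering error, this sketch flags whichever falls outside the first longest in-order subsequence it finds; a real feedback system would need a more principled tie-break.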
Improving User Interface Generation Models from Designer Feedback
Wu, Jason, Swearngin, Amanda, Vajjala, Arun Krishna, Leung, Alan, Nichols, Jeffrey, Barik, Titus
Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well-aligned with designers' workflows and ignore the rich rationale used to critique and improve UI designs. In this paper, we investigate several approaches for designers to give feedback to UI generation models, using familiar interactions such as commenting, sketching and direct manipulation. We first perform a study with 21 designers where they gave feedback using these interactions, which resulted in ~1500 design annotations. We then use this data to finetune a series of LLMs to generate higher quality UIs. Finally, we evaluate these models with human judges, and we find that our designer-aligned approaches outperform models trained with traditional ranking feedback and all tested baselines, including GPT-5.
Towards a Transparent and Interpretable AI Model for Medical Image Classifications
Wen, Binbin, Wu, Yihang, Daqqaq, Tareef, Chaddad, Ahmad
The integration of artificial intelligence (AI) into medicine is remarkable, offering advanced diagnostic and therapeutic possibilities. However, the inherent opacity of complex AI models presents significant challenges to their clinical practicality. This paper focuses primarily on investigating the application of explainable artificial intelligence (XAI) methods, with the aim of making AI decisions transparent and interpretable. Our research focuses on implementing simulations using various medical datasets to elucidate the internal workings of the XAI model. These dataset-driven simulations demonstrate how XAI effectively interprets AI predictions, thus improving the decision-making process for healthcare professionals. In addition to a survey of the main XAI methods and simulations, ongoing challenges in the XAI field are discussed. The study highlights the need for the continuous development and exploration of XAI, particularly from the perspective of diverse medical datasets, to promote its adoption and effectiveness in the healthcare domain.
Advancing Knowledge Tracing by Exploring Follow-up Performance Trends
Liu, Hengyu, Li, Yushuai, Yu, Minghe, Zhang, Tiancheng, Yu, Ge, Pedersen, Torben Bach, Torp, Kristian, Jensen, Christian S., Li, Tianyi
Intelligent Tutoring Systems (ITS), such as Massive Open Online Courses, offer new opportunities for human learning. At the core of such systems, knowledge tracing (KT) predicts students' future performance by analyzing their historical learning activities, enabling an accurate evaluation of students' knowledge states over time. We show that existing KT methods often encounter correlation conflicts when analyzing the relationships between historical learning sequences and future performance. To address such conflicts, we propose to extract so-called Follow-up Performance Trends (FPTs) from historical ITS data and to incorporate them into KT. We propose a method called Forward-Looking Knowledge Tracing (FINER) that combines historical learning sequences with FPTs to enhance student performance prediction accuracy. FINER constructs learning patterns that facilitate the retrieval of FPTs from historical ITS data in linear time; FINER includes a novel similarity-aware attention mechanism that aggregates FPTs based on both frequency and contextual similarity; and FINER offers means of combining FPTs and historical learning sequences to enable more accurate prediction of student future performance. Experiments on six real-world datasets show that FINER can outperform ten state-of-the-art KT methods, increasing accuracy by 8.74% to 84.85%.
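The abstract states that FINER aggregates FPTs "based on both frequency and contextual similarity" but does not give the formula. As a rough sketch only, one way such an aggregation could look is a softmax over a score that adds cosine similarity to log-frequency. The scoring rule, the tuple layout of `candidates`, and the names `cosine` and `aggregate_trends` are all assumptions made for this example, not the published method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def aggregate_trends(query, candidates):
    """Weighted average of candidate trend vectors.

    `candidates` is a list of (context_vector, frequency, trend_vector)
    tuples with frequency >= 1. The score per candidate is an assumption:
    cosine(query, context) + log(frequency), normalized with a softmax.
    """
    scores = [cosine(query, ctx) + math.log(freq) for ctx, freq, _ in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(candidates[0][2])
    aggregated = [
        sum(w * trend[k] for w, (_, _, trend) in zip(weights, candidates))
        for k in range(dim)
    ]
    return weights, aggregated
```

With this scoring, a trend that is both frequent and contextually close to the query dominates the aggregate, while rare or dissimilar trends contribute little.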
DETACH: Cross-domain Learning for Long-Horizon Tasks via Mixture of Disentangled Experts
Shen, Yutong, Liu, Hangxu, Zhang, Lei, Liu, Penghui, Xia, Ruizhe, Yao, Tianyi, Feng, Tongtong
Long-Horizon (LH) tasks in Human-Scene Interaction (HSI) are complex multi-step tasks that require continuous planning, sequential decision-making, and extended execution across domains to achieve the final goal. However, existing methods heavily rely on skill chaining by concatenating pre-trained subtasks, with environment observations and self-state tightly coupled, lacking the ability to generalize to new combinations of environments and skills, failing to complete various LH tasks across domains. To solve this problem, this paper presents DETACH, a cross-domain learning framework for LH tasks via biologically inspired dual-stream disentanglement. Inspired by the brain's "where-what" dual pathway mechanism, DETACH comprises two core modules: i) an environment learning module for spatial understanding, which captures object functions, spatial relationships, and scene semantics, achieving cross-domain transfer through complete environment-self disentanglement; ii) a skill learning module for task execution, which processes self-state information including joint degrees of freedom and motor patterns, enabling cross-skill transfer through independent motor pattern encoding. We conducted extensive experiments on various LH tasks in HSI scenes. Compared with existing methods, DETACH can achieve an average subtasks success rate improvement of 23% and average execution efficiency improvement of 29%. More details can be found at: https://sites.google.com/view/detach-learning.
I. INTRODUCTION
Long-Horizon (LH) tasks in Human-Scene Interaction (HSI) require continuous planning and cross-domain execution, posing challenges due to their complexity and need for environmental adaptation. These tasks have broad applications in robotics [1], medical intervention [2], and smart homes [2], with canonical examples including dexterous hand manipulation [3] and humanoid whole-body control [4].