Goto

Collaborating Authors

 Instructional Material


There's a simple way we could drastically cut AI energy use

New Scientist

There's a simple way we could drastically cut AI energy use Being more judicious in which AI models we use for tasks could potentially save 31.9 terawatt-hours of energy this year alone - equivalent to the output of five nuclear reactors. Tiago da Silva Barros at the University of Cote d'Azur in France and his colleagues looked at 14 different tasks that people use generative AI tools for, ranging from text generation to speech recognition and image classification. 'Flashes of brilliance and frustration': I let an AI agent run my day They then examined public leaderboards, including those hosted by the machine learning hub Hugging Face, for how different models perform. The energy efficiency of the models during inference - when an AI model produces an answer - was measured by a tool called CarbonTracker, and the total energy use of that model was calculated by tracking user downloads. "Based on the size of the model, we estimated the energy consumption, and based on this, we can try to do our estimations," says da Silva Barros.


Personalized Learning Path Planning with Goal-Driven Learner State Modeling

arXiv.org Artificial Intelligence

Personalized Learning Path Planning (PLPP) aims to design adaptive learning paths that align with individual goals. While large language models (LLMs) show potential in personalizing learning experiences, existing approaches often lack mechanisms for goal-aligned planning. We introduce Pxplore, a novel framework for PLPP that integrates a reinforcement-based training paradigm and an LLM-driven educational architecture. We design a structured learner state model and an automated reward function that transforms abstract objectives into computable signals. We train the policy combining supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), and deploy it within a real-world learning platform. Extensive experiments validate Pxplore's effectiveness in producing coherent, personalized, and goal-driven learning paths. We release our code and dataset to facilitate future research.


From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails

arXiv.org Artificial Intelligence

Generative AI systems are increasingly assisting and acting on behalf of end users in practical settings, from digital shopping assistants to next-generation autonomous cars. In this context, safety is no longer about blocking harmful content, but about preempting downstream hazards like financial or physical harm. Yet, most AI guardrails continue to rely on output classification based on labeled datasets and human-specified criteria,making them brittle to new hazardous situations. Even when unsafe conditions are flagged, this detection offers no path to recovery: typically, the AI system simply refuses to act--which is not always a safe choice. In this work, we argue that agentic AI safety is fundamentally a sequential decision problem: harmful outcomes arise from the AI system's continually evolving interactions and their downstream consequences on the world. We formalize this through the lens of safety-critical control theory, but within the AI model's latent representation of the world. This enables us to build predictive guardrails that (i) monitor an AI system's outputs (actions) in real time and (ii) proactively correct risky outputs to safe ones, all in a model-agnostic manner so the same guardrail can be wrapped around any AI model. We also offer a practical training recipe for computing such guardrails at scale via safety-critical reinforcement learning. Our experiments in simulated driving and e-commerce settings demonstrate that control-theoretic guardrails can reliably steer LLM agents clear of catastrophic outcomes (from collisions to bankruptcy) while preserving task performance, offering a principled dynamic alternative to today's flag-and-block guardrails.


CurLL: A Developmental Framework to Evaluate Continual Learning in Language Models

arXiv.org Artificial Intelligence

We introduce a comprehensive continual learning dataset and benchmark (CurlL) grounded in human developmental trajectories from ages 5-10, enabling systematic and fine-grained assessment of models' ability to progressively acquire new skills. CurlL spans five developmental stages (0-4) covering ages 5-10, supported by a skill graph that breaks down broad skills into smaller abilities, concrete goals, and measurable indicators, while also capturing which abilities build on others. We generate a 23.4B-token synthetic dataset with controlled skill progression, vocabulary complexity, and format diversity, comprising paragraphs, comprehension-based QA (CQA), skill-testing QA (CSQA), and instruction-response (IR) pairs. Stage-wise token counts range from 2.12B to 6.78B tokens, supporting precise analysis of forgetting, forward transfer, and backward transfer. Using a 135M-parameter transformer trained under independent, joint, and sequential (continual) setups, we show trade-offs in skill retention and transfer efficiency. By mirroring human learning patterns and providing fine-grained control over skill dependencies, this work advances continual learning evaluations for language models.


EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus

arXiv.org Artificial Intelligence

Recently, several multi-turn dialogue benchmarks have been proposed to evaluate the conversational abilities of large language models (LLMs). As LLMs are increasingly recognized as a key technology for advancing intelligent education, owing to their ability to deeply understand instructional contexts and provide personalized guidance, the construction of dedicated teacher-student dialogue benchmarks has become particularly important. To this end, we present EduDial, a comprehensive multi-turn teacher-student dialogue dataset. EduDial covers 345 core knowledge points and consists of 34,250 dialogue sessions generated through interactions between teacher and student agents. Its design is guided by Bloom's taxonomy of educational objectives and incorporates ten questioning strategies, including situational questioning, zone of proximal development (ZPD) questioning, and metacognitive questioning-thus better capturing authentic classroom interactions. Furthermore, we design differentiated teaching strategies for students at different cognitive levels, thereby providing more targeted teaching guidance. Building on EduDial, we further develop EduDial-LLM 32B via training and propose an 11-dimensional evaluation framework that systematically measures the teaching abilities of LLMs, encompassing both overall teaching quality and content quality. Experiments on 17 mainstream LLMs reveal that most models struggle in student-centered teaching scenarios, whereas our EduDial-LLM achieves significant gains, consistently outperforming all baselines across all metrics. The code is available at https://github.com/Mind-Lab-ECNU/EduDial/tree/main.


HealthProcessAI: A Technical Framework and Proof-of-Concept for LLM-Enhanced Healthcare Process Mining

arXiv.org Artificial Intelligence

Process mining has emerged as a powerful analytical technique for understanding complex healthcare workflows. However, its application faces significant barriers, including technical complexity, a lack of standardized approaches, and limited access to practical training resources. We introduce HealthProcessAI, a GenAI framework designed to simplify process mining applications in healthcare and epidemiology by providing a comprehensive wrapper around existing Python (PM4PY) and R (bupaR) libraries. To address unfamiliarity and improve accessibility, the framework integrates multiple Large Language Models (LLMs) for automated process map interpretation and report generation, helping translate technical analyses into outputs that diverse users can readily understand. We validated the framework using sepsis progression data as a proof-of-concept example and compared the outputs of five state-of-the-art LLM models through the OpenRouter platform. To test its functionality, the framework successfully processed sepsis data across four proof-of-concept scenarios, demonstrating robust technical performance and its capability to generate reports through automated LLM analysis. LLM evaluation using five independent LLMs as automated evaluators revealed distinct model strengths: Claude Sonnet-4 and Gemini 2.5-Pro achieved the highest consistency scores (3.79/4.0 and 3.65/4.0) when evaluated by automated LLM assessors. By integrating multiple Large Language Models (LLMs) for automated interpretation and report generation, the framework addresses widespread unfamiliarity with process mining outputs, making them more accessible to clinicians, data scientists, and researchers. This structured analytics and AI-driven interpretation combination represents a novel methodological advance in translating complex process mining results into potentially actionable insights for healthcare applications.


Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space

arXiv.org Machine Learning

Systems of interacting continuous-time Markov chains are a powerful model class, but inference is typically intractable in high dimensional settings. Auxiliary information, such as noisy observations, is typically only available at discrete times, and incorporating it via a Doob's $h-$transform gives rise to an intractable posterior process that requires approximation. We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain in the system. Our inference method involves estimating look-ahead functions (twist potentials) that anticipate future information, for which we introduce an efficient parameterization. We incorporate this approximation in a twisted Sequential Monte Carlo sampling scheme. We demonstrate the effectiveness of our approach on a challenging posterior inference task for a latent SIRS model on a graph, and on a neural model for wildfire spread dynamics trained on real data.


ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

arXiv.org Artificial Intelligence

Recent advances in embodied AI highlight the potential of vision language models (VLMs) as agents capable of perception, reasoning, and interaction in complex environments. However, top-performing systems rely on large-scale models that are costly to deploy, while smaller VLMs lack the necessary knowledge and skills to succeed. To bridge this gap, we present \textit{Embodied Reasoning Agent (ERA)}, a two-stage framework that integrates prior knowledge learning and online reinforcement learning (RL). The first stage, \textit{Embodied Prior Learning}, distills foundational knowledge from three types of data: (1) Trajectory-Augmented Priors, which enrich existing trajectory data with structured reasoning generated by stronger models; (2) Environment-Anchored Priors, which provide in-environment knowledge and grounding supervision; and (3) External Knowledge Priors, which transfer general knowledge from out-of-environment datasets. In the second stage, we develop an online RL pipeline that builds on these priors to further enhance agent performance. To overcome the inherent challenges in agent RL, including long horizons, sparse rewards, and training instability, we introduce three key designs: self-summarization for context management, dense reward shaping, and turn-level policy optimization. Extensive experiments on both high-level planning (EB-ALFRED) and low-level control (EB-Manipulation) tasks demonstrate that ERA-3B surpasses both prompting-based large models and previous training-based baselines. Specifically, it achieves overall improvements of 8.4\% on EB-ALFRED and 19.4\% on EB-Manipulation over GPT-4o, and exhibits strong generalization to unseen tasks. Overall, ERA offers a practical path toward scalable embodied intelligence, providing methodological insights for future embodied AI systems.


Few Shot Semi-Supervised Learning for Abnormal Stop Detection from Sparse GPS Trajectories

arXiv.org Artificial Intelligence

Abnormal stop detection (ASD) in intercity coach transportation is critical for ensuring passenger safety, operational reliability, and regulatory compliance. However, two key challenges hinder ASD effectiveness: sparse GPS trajectories, which obscure short or unauthorized stops, and limited labeled data, which restricts supervised learning. Existing methods often assume dense sampling or regular movement patterns, limiting their applicability. To address data sparsity, we propose a Sparsity-Aware Segmentation (SAS) method that adaptively defines segment boundaries based on local spatial-temporal density. Building upon these segments, we introduce three domain-specific indicators to capture abnormal stop behaviors. To further mitigate the impact of sparsity, we develop Locally Temporal-Indicator Guided Adjustment (LTIGA), which smooths these indicators via local similarity graphs. To overcome label scarcity, we construct a spatial-temporal graph where each segment is a node with LTIGA-refined features. We apply label propagation to expand weak supervision across the graph, followed by a GCN to learn relational patterns. A final self-training module incorporates high-confidence pseudo-labels to iteratively improve predictions. Experiments on real-world coach data show an AUC of 0.854 and AP of 0.866 using only 10 labeled instances, outperforming prior methods. The code and dataset are publicly available at \href{https://github.com/pangjunbiao/Abnormal-Stop-Detection-SSL.git}


Multi-Action Self-Improvement for Neural Combinatorial Optimization

arXiv.org Artificial Intelligence

Self-improvement has emerged as a state-of-the-art paradigm in Neural Combinatorial Optimization (NCO), where models iteratively refine their policies by generating and imitating high-quality solutions. Despite strong empirical performance, existing methods face key limitations. Training is computationally expensive, as policy updates require sampling numerous candidate solutions per instance to extract a single expert trajectory. More fundamentally, these approaches fail to exploit the structure of combinatorial problems involving the coordination of multiple agents, such as vehicles in min-max routing or machines in scheduling. By supervising on single-action trajectories, they fail to exploit agent-permutation symmetries, where distinct sequences of actions yield identical solutions, hindering generalization and the ability to learn coordinated behavior. We address these challenges by extending self-improvement to operate over joint multi-agent actions. Our model architecture predicts complete agent-task assignments jointly at each decision step. To explicitly leverage symmetries, we employ a set-prediction loss, which supervises the policy on multiple expert assignments for any given state. This approach enhances sample efficiency and the model's ability to learn coordinated behavior. Furthermore, by generating multi-agent actions in parallel, it drastically accelerates the solution generation phase of the self-improvement loop. Empirically, we validate our method on several combinatorial problems, demonstrating consistent improvements in the quality of the final solution and a reduced generation latency compared to standard self-improvement.