Energy
Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences
Dube, Rohit, Gautam, Natarajan, Banerjee, Amarnath, Nagarajan, Harsha
Residential electricity demand at granular scales is driven by what people do and for how long. Accurately forecasting this demand for applications like microgrid management and demand response therefore requires generative models that can produce realistic daily activity sequences, capturing both the timing and duration of human behavior. This paper develops a generative model of human activity sequences using nationally representative time-use diaries at a 10-minute resolution. We use this model to quantify which demographic factors are most critical for improving predictive performance. We propose a hierarchical semi-Markov framework that addresses two key modeling challenges. First, a time-inhomogeneous Markov \emph{router} learns the patterns of ``which activity comes next." Second, a semi-Markov \emph{hazard} component explicitly models activity durations, capturing ``how long" activities realistically last. To ensure statistical stability when data are sparse, the model pools information across related demographic groups and time blocks. The entire framework is trained and evaluated using survey design weights to ensure our findings are representative of the U.S. population. On a held-out test set, we demonstrate that explicitly modeling durations with the hazard component provides a substantial and statistically significant improvement over purely Markovian models. Furthermore, our analysis reveals a clear hierarchy of demographic factors: Sex, Day-Type, and Household Size provide the largest predictive gains, while Region and Season, though important for energy calculations, contribute little to predicting the activity sequence itself. The result is an interpretable and robust generator of synthetic activity traces, providing a high-fidelity foundation for downstream energy systems modeling.
Forest tree species classification and entropy-derived uncertainty mapping using extreme gradient boosting and Sentinel-1/2 data
Abdi, Abdulhakim M., Wang, Fan
We present a wall-to - wall map of dominant tree species in Swedish forests accompanied by pixel - level uncertainty estimates. The tree species classification is based on spatiotemporal metrics derived from Sentinel-1 and Sentinel - 2 satellite data, combined with field observations from the Swedish National Forest Inventory and auxiliary data on geomorphometry and canopy height. We apply an extreme gradient boosting model with Bayesian optimization to relate field observations to satellite-derived features and generate the final species map. Classification uncertainty is quantified using Shannon's entropy of the predicted class probabilities, which provide a spatially explicit measure of model confidence. The final model achieved an overall accuracy of 85% (F1 score = 0.82, Matthews correlation coefficient = 0.81), and mapped species distributions showed strong agreement with official forest statistics (r = 0.96). V ariable importance analysis revealed that the most influential predictors were optical bands from Sentinel - 2, particularly those acquired in spring and summer. This study provides scalable, interpretable, and policy-relevant method for tree species mapping with integrated uncertainty that are well-suited to meet emerging legislative and environmental goals.
MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents
Lei, Yuzhen, Xie, Hongbin, Zhao, Jiaxing, Liu, Shuangxue, Song, Xuan
Large Language Models (LLMs) have excelled in question-answering (QA) tasks within single domains. However, their reasoning and coordination capabilities in complex, multi-stage scenarios remain underexplored. Existing benchmarks typically focus on isolated tasks or narrow domains, overlooking models' abilities for multi-stage collaboration and optimization without explicit external guidance. To bridge this gap, we propose \textbf{MSCoRe}, a novel benchmark comprising 126696 domain-specific QA instances spanning scenarios in automotive, pharmaceutical, electronics, and energy sectors. The dataset is created using a structured three-phase pipeline: dynamic sampling, iterative question-answer generation, and a multi-level quality assessment to ensure data quality. Tasks are further categorized into three difficulty levels according to stage coverage and complexity. With MSCoRe, we have conducted a comprehensive evaluation of various state-of-the-art LLM agents. The commercial models performed best across all tasks and scenarios, but a notable gap in ROUGE scores remains between simple and complex tasks. We also tested the models' robustness and found that their performance is negatively affected by noisy data. MSCoRe provides a valuable new resource for the community to evaluate and improve multi-stage reasoning in LLM agents. The code and data are available at https://github.com/D3E0-source/MSCoRE.
Dynamical Alignment: A Principle for Adaptive Neural Computation
The computational capabilities of a neural network are widely assumed to be determined by its static architecture. Here we challenge this view by establishing that a fixed neural structure can operate in fundamentally different computational modes, driven not by its structure but by the temporal dynamics of its input signals. We term this principle 'Dynamical Alignment'. Applying this principle offers a novel resolution to the long-standing paradox of why brain-inspired spiking neural networks (SNNs) underperform. By encoding static input into controllable dynamical trajectories, we uncover a bimodal optimization landscape with a critical phase transition governed by phase space volume dynamics. A 'dissipative' mode, driven by contracting dynamics, achieves superior energy efficiency through sparse temporal codes. In contrast, an 'expansive' mode, driven by expanding dynamics, unlocks the representational power required for SNNs to match or even exceed their artificial neural network counterparts on diverse tasks, including classification, reinforcement learning, and cognitive integration. We find this computational advantage emerges from a timescale alignment between input dynamics and neuronal integration. This principle, in turn, offers a unified, computable perspective on long-observed dualities in neuroscience, from stability-plasticity dilemma to segregation-integration dynamic. It demonstrates that computation in both biological and artificial systems can be dynamically sculpted by 'software' on fixed 'hardware', pointing toward a potential paradigm shift for AI research: away from designing complex static architectures and toward mastering adaptive, dynamic computation principles.
Q-DPTS: Quantum Differentially Private Time Series Forecasting via Variational Quantum Circuits
Chen, Chi-Sheng, Chen, Samuel Yen-Chi
Time series forecasting is vital in domains where data sensitivity is paramount, such as finance and energy systems. While Differential Privacy (DP) provides theoretical guarantees to protect individual data contributions, its integration especially via DP-SGD often impairs model performance due to injected noise. In this paper, we propose Q-DPTS, a hybrid quantum-classical framework for Quantum Differentially Private Time Series Forecasting. Q-DPTS combines Variational Quantum Circuits (VQCs) with per-sample gradient clipping and Gaussian noise injection, ensuring rigorous $(ฮต, ฮด)$-differential privacy. The expressiveness of quantum models enables improved robustness against the utility loss induced by DP mechanisms. We evaluate Q-DPTS on the ETT (Electricity Transformer Temperature) dataset, a standard benchmark for long-term time series forecasting. Our approach is compared against both classical and quantum baselines, including LSTM, QASA, QRWKV, and QLSTM. Results demonstrate that Q-DPTS consistently achieves lower prediction error under the same privacy budget, indicating a favorable privacy-utility trade-off. This work presents one of the first explorations into quantum-enhanced differentially private forecasting, offering promising directions for secure and accurate time series modeling in privacy-critical scenarios.
Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing
Sun, Zekai, Guan, Xiuxian, Lin, Zheng, Fang, Zihan, Cai, Xiangming, Chen, Zhe, Liu, Fangming, Cui, Heming, Xiong, Jie, Ni, Wei, Yuen, Chau
Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while simultaneously coping with limited computational resources and battery life. While Mobile Edge Computing (MEC) offers collaborative inference with GPU servers as a promising solution, existing approaches primarily rely on layer-wise model partitioning and undergo significant transmission bottlenecks caused by the sequential execution of DNN operations. To address this challenge, we present Intra-DP, a high-performance collaborative inference system optimized for DNN inference on MEC. Intra DP employs a novel parallel computing technique based on local operators (i.e., operators whose minimum unit input is not the entire input tensor, such as the convolution kernel). By decomposing their computations (operations) into several independent sub-operations and overlapping the computation and transmission of different sub-operations through parallel execution, Intra-DP mitigates transmission bottlenecks in MEC, achieving fast and energy-efficient inference. The evaluation demonstrates that Intra-DP reduces per-inference latency by up to 50% and energy consumption by up to 75% compared to state-of-the-art baselines, without sacrificing accuracy.
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Large language models (LLMs) can amplify misinformation, undermining societal goals like the UN SDGs. We study three documented drivers of misinformation (valence framing, information overload, and oversimplification) which are often shaped by one's default beliefs. Building on evidence that LLMs encode such defaults (e.g., "joy is positive," "math is complex") and can act as "bags of heuristics," we ask: can general belief-driven heuristics behind misinformative behaviour be recovered from LLMs as clear rules? A key obstacle is that global rule-extraction methods in explainable AI (XAI) are built for numerical inputs/outputs, not text. We address this by eliciting global LLM beliefs and mapping them to numerical scores via statistically reliable abstractions, thereby enabling off-the-shelf global XAI to detect belief-related heuristics in LLMs. To obtain ground truth, we hard-code bias-inducing nonlinear heuristics of increasing complexity (univariate, conjunctive, nonconvex) into popular LLMs (ChatGPT and Llama) via system instructions. This way, we find that RuleFit under-detects non-univariate biases, while global SHAP better approximates conjunctive ones but does not yield actionable rules. To bridge this gap, we propose RuleSHAP, a rule-extraction algorithm that couples global SHAP-value aggregations with rule induction to better capture non-univariate bias, improving heuristics detection over RuleFit by +94% (MRR@1) on average. Our results provide a practical pathway for revealing belief-driven biases in LLMs.
Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models
Delavande, Julien, Pierrard, Regis, Luccioni, Sasha
Recent advances in text-to-video (T2V) generation have enabled the creation of high-fidelity, temporally coherent clips from natural language prompts. Yet these systems come with significant computational costs, and their energy demands remain poorly understood. In this paper, we present a systematic study of the latency and energy consumption of state-of-the-art open-source T2V models. We first develop a compute-bound analytical model that predicts scaling laws with respect to spatial resolution, temporal length, and denoising steps. We then validate these predictions through fine-grained experiments on WAN2.1-T2V, showing quadratic growth with spatial and temporal dimensions, and linear scaling with the number of denoising steps. Finally, we extend our analysis to six diverse T2V models, comparing their runtime and energy profiles under default settings. Our results provide both a benchmark reference and practical insights for designing and deploying more sustainable generative video systems.
Anecdoctoring: Automated Red-Teaming Across Language and Place
Cuevas, Alejandro, Dash, Saloni, Nayak, Bharat Kumar, Vann, Dan, Daepp, Madeleine I. G.
Disinformation is among the top risks of generative artificial intelligence (AI) misuse. Global adoption of generative AI necessitates red-teaming evaluations (i.e., systematic adversarial probing) that are robust across diverse languages and cultures, but red-teaming datasets are commonly US- and English-centric. To address this gap, we propose "anecdoctoring", a novel red-teaming approach that automatically generates adversarial prompts across languages and cultures. We collect misinformation claims from fact-checking websites in three languages (English, Spanish, and Hindi) and two geographies (US and India). We then cluster individual claims into broader narratives and characterize the resulting clusters with knowledge graphs, with which we augment an attacker LLM. Our method produces higher attack success rates and offers interpretability benefits relative to few-shot prompting. Results underscore the need for disinformation mitigations that scale globally and are grounded in real-world adversarial misuse.
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
Zhang, Dianxing, Li, Wendong, Song, Kani, Lu, Jiaye, Li, Gang, Yang, Liuchun, Li, Sheng
Under a unified operational definition, we define LLM memory as a persistent state written during pretraining, finetuning, or inference that can later be addressed and that stably influences outputs. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance via the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. On this basis we build a layered evaluation: parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (answer correctness vs snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+). The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices) and uncertainty reporting via inter-rater agreement plus paired tests with multiple-comparison correction. For updating and forgetting, we present DMM Gov: coordinating DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits, with specs for timeliness, conflict handling, and long-horizon consistency. Finally, we give four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and when retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.