Energy
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Li, Siyang, Chen, Yize, Guo, Yan, Huang, Ming, Xiong, Hui
Advanced deep learning-based approaches have been actively applied to forecast the spatiotemporal physical dynamics governed by partial differential equations (PDEs), a critical step in tackling many science and engineering problems. Because real-world physical environments, such as PDE system parameters, vary unpredictably, generalizing to unseen out-of-distribution (OOD) forecasting scenarios from limited training data is of great importance. To bridge this gap, existing methods focus on discovering domain-generalizable representations across various PDE dynamics trajectories. However, their zero-shot OOD generalization capability remains deficient, since extra test-time samples for domain-specific adaptation are still required. This is because the fundamental physical invariance in PDE dynamical systems has yet to be investigated or integrated. To this end, we first explicitly define a two-fold PDE invariance principle, which states that ingredient operators and their composition relationships remain invariant across different domains and PDE system evolution. Next, to capture this two-fold PDE invariance, we propose a physics-guided invariant learning method termed iMOOE, featuring an Invariance-aligned Mixture Of Operator Expert architecture and a frequency-enriched invariant learning objective. Extensive experiments across simulated benchmarks and real-world applications validate iMOOE's superior in-distribution performance and zero-shot generalization capabilities on diverse OOD forecasting scenarios.
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
Jiang, Yuhua, Huang, Jiawei, Yuan, Yufeng, Mao, Xin, Yue, Yu, Zhao, Qianchuan, Yan, Lin
Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for enhancing Large Language Models (LLMs) on complex reasoning tasks. However, existing methods suffer from an exploration dilemma: the sharply peaked initial policies of pre-trained LLMs confine standard RL algorithms to a narrow set of solutions, boosting single-solution accuracy (pass@1) but suppressing solution diversity and multi-solution performance (pass@k). As a result, RLVR often distills existing capabilities rather than discovering new reasoning strategies. To overcome this, we introduce a Risk-Sensitive Reinforcement Learning framework. Our approach employs a risk-seeking objective that interpolates between mean and maximum rewards, leading to a novel algorithm, Risk-Sensitive GRPO (RS-GRPO), which drives deeper exploration by amplifying learning from challenging prompts. Remarkably, RS-GRPO is simple to implement, requiring only minor code modifications. On six mathematical reasoning benchmarks and with five different LLMs, RS-GRPO consistently improves pass@k performance while maintaining or enhancing pass@1 accuracy.
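One standard objective that interpolates between mean and maximum rewards is the exponential-utility (entropic) value, (1/beta) * log E[exp(beta * r)], which tends to the mean reward as beta approaches 0 and to the maximum reward as beta grows large. The sketch below evaluates it on a group of sampled rewards. The function name `risk_seeking_value` and the parameter `beta` are illustrative, and whether RS-GRPO uses exactly this form is an assumption that the abstract does not confirm.

```python
import math

def risk_seeking_value(rewards, beta):
    """Entropic risk-seeking value (1/beta) * log(mean(exp(beta * r))).

    Hypothetical helper for illustration; not taken from the RS-GRPO code.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if beta == 0:
        # The limit of the objective as beta -> 0 is the plain mean.
        return sum(rewards) / len(rewards)
    # Subtract the maximum before exponentiating for numerical stability.
    r_max = max(rewards)
    mean_exp = sum(math.exp(beta * (r - r_max)) for r in rewards) / len(rewards)
    return r_max + math.log(mean_exp) / beta

# Example group: one correct answer out of four samples (made-up rewards).
rewards = [0.0, 0.0, 0.0, 1.0]
low = risk_seeking_value(rewards, beta=1e-6)   # close to the mean reward
high = risk_seeking_value(rewards, beta=1e3)   # close to the maximum reward
```

With a small `beta` the value stays near the group mean, so a prompt solved only once in four samples contributes little. With a large `beta` the value approaches the best sample, which is one way to weight rare successes on hard prompts more heavily.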
Experience Deploying Containerized GenAI Services at an HPC Center
Beltre, Angel M., Ogden, Jeff, Pedretti, Kevin
Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected via web-based APIs. While these components are often containerized and deployed in cloud environments, such capabilities are still emerging at High-Performance Computing (HPC) centers. In this paper, we share our experience deploying GenAI workloads within an established HPC center, discussing the integration of HPC and cloud computing environments. We describe our converged computing architecture that integrates HPC and Kubernetes platforms running containerized GenAI workloads, helping with reproducibility. A case study illustrates the deployment of the Llama Large Language Model (LLM) using a containerized inference server (vLLM) across both Kubernetes and HPC platforms using multiple container runtimes. Our experience highlights practical considerations and opportunities for the HPC container community, guiding future research and tool development.
Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
Yin, Shi, Dai, Zujian, Pan, Xinyang, He, Lixin
Deep learning methods for electronic-structure Hamiltonian prediction have offered significant computational efficiency advantages over traditional density functional theory (DFT), yet the diversity of atomic types, structural patterns, and the high-dimensional complexity of Hamiltonians pose substantial challenges to generalization performance. In this work, we contribute on both the methodology and dataset sides to advance a universal deep learning paradigm for Hamiltonian prediction. On the method side, we propose NextHAM, a neural E(3)-symmetry and expressive correction method for efficient and generalizable materials electronic-structure Hamiltonian prediction. First, we introduce the zeroth-step Hamiltonians, which can be efficiently constructed from the initial charge density of DFT, as informative descriptors for the neural regression model at the input level and as initial estimates of the target Hamiltonian at the output level, so that the regression model directly predicts the correction terms to the target ground truths, thereby significantly simplifying the input-output mapping and facilitating fine-grained predictions. Second, we present a neural Transformer architecture with strict E(3)-symmetry and high non-linear expressiveness for Hamiltonian prediction. Third, we propose a novel training objective to ensure the accuracy of Hamiltonians in both real space and reciprocal space, preventing error amplification and the occurrence of "ghost states" caused by the large condition number of the overlap matrix. Experimental results on Materials-HAM-SOC demonstrate that NextHAM achieves excellent accuracy in predicting Hamiltonians and band structures, with the spin-off-diagonal block reaching sub-µeV accuracy. These results establish NextHAM as a universal and highly accurate deep learning model for electronic-structure prediction, delivering DFT-level precision with dramatically improved computational efficiency.
Understanding the electronic structure is fundamental to unraveling how electrons govern the properties of condensed matter systems. This knowledge is essential for predicting a wide range of material characteristics, such as electrical conductivity, magnetism, optical behavior, and chemical activity, which are vital for technologies spanning from electronics to sustainable energy and advanced catalysis. At the heart of these calculations is the challenge of determining the system's Hamiltonian matrix, whose eigenvalues and eigenstates yield important quantities like energy levels, band structures, and electronic wavefunctions. Traditionally, Density Functional Theory (DFT) (Hohenberg & Kohn, 1964; Kohn & Sham, 1965) has been the standard approach for these problems. Recently, deep learning has emerged as a powerful tool in the physical sciences (Zhang et al., 2025).
U-Cast: Learning Hierarchical Structures for High-Dimensional Time Series Forecasting
Ni, Juntong, Wang, Shiyu, Liu, Zewen, Shi, Xiaoming, Zhong, Xinyue, Ye, Zhou, Jin, Wei
Time series forecasting (TSF) is a central problem in time series analysis. However, when the number of channels in time series datasets scales to the thousands or more, a scenario we define as High-Dimensional Time Series Forecasting (HDTSF), significant new modeling challenges arise that are often not the primary focus of traditional TSF research. HDTSF is challenging because channel correlations often form complex and hierarchical patterns. Existing TSF models either ignore these interactions or fail to scale as dimensionality grows. To address this issue, we propose U-Cast, a channel-dependent forecasting architecture that learns latent hierarchical channel structures with an innovative query-based attention. To disentangle highly correlated channel representations, U-Cast adds a full-rank regularization during training. We also release Time-HD, the first benchmark of large, diverse, high-dimensional datasets. Our theory shows that exploiting cross-channel information lowers forecasting risk, and experiments on Time-HD demonstrate that U-Cast surpasses strong baselines in both accuracy and efficiency. Together, U-Cast and Time-HD provide a solid basis for future HDTSF research.
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
Pu, Yingming, Lin, Tao, Chen, Hongyu
Large Language Model (LLM)-based multi-agent systems (MAS) demonstrate remarkable potential for scientific discovery. Existing approaches, however, often automate scientific discovery using predefined workflows that lack rationality constraints. This often leads to aimless hypothesizing and a failure to consistently link hypotheses with evidence, thereby hindering the systematic reduction of uncertainty. Overcoming these limitations fundamentally requires a principled approach to exploration. We introduce PiFlow, an information-theoretical framework, treating automated scientific discovery as a structured uncertainty reduction problem guided by principles (e.g., scientific laws). In evaluations across three distinct scientific domains -- discovering nanomaterial structures, bio-molecules, and superconductor candidates with targeted properties -- our method significantly improves discovery efficiency, reflected by a 73.55% increase in the Area Under the Curve (AUC) of property values versus exploration steps, and enhances solution quality by 94.06% compared to a vanilla agent system. Overall, PiFlow serves as a Plug-and-Play method, establishing a novel paradigm shift in highly efficient automated scientific discovery, paving the way for more robust and accelerated AI-driven research. Code is publicly available on GitHub at https://github.com/amair-lab/PiFlow.
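The efficiency metric mentioned above, the area under the curve of property values versus exploration steps, can be illustrated with a trapezoidal-rule sketch. The abstract does not say how the AUC is computed, so the unit step spacing, the helper names `auc_trapezoid` and `relative_increase`, and the example curves are all assumptions made for illustration.

```python
def auc_trapezoid(values):
    """Area under a curve sampled at unit-spaced exploration steps.

    Hypothetical helper; the paper's exact AUC definition is not given here.
    """
    if len(values) < 2:
        return 0.0
    # Sum the trapezoid areas between consecutive steps.
    return sum((a + b) / 2.0 for a, b in zip(values, values[1:]))

def relative_increase(new, baseline):
    """Percentage increase of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Made-up property values per exploration step, for illustration only.
baseline_curve = [0.0, 1.0, 2.0, 3.0]
improved_curve = [0.0, 2.0, 4.0, 6.0]

baseline_auc = auc_trapezoid(baseline_curve)
improved_auc = auc_trapezoid(improved_curve)
gain = relative_increase(improved_auc, baseline_auc)
```

A method that reaches high property values earlier accumulates more area over the same number of steps, which is why a larger AUC is read as more efficient exploration.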
Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
Tang, Zilu, Akyürek, Afra Feyza, Akyürek, Ekin, Wijaya, Derry
A prominent issue in aligning language models (LMs) to personalized preferences is underspecification -- the lack of information from users about their preferences. A popular way to inject such specification is to add a prefix (e.g. prior relevant conversations) to the current user's conversation to steer the preference distribution. Most methods passively model personal preferences with prior example preference pairs. We ask whether models benefit from actively inferring preference descriptions, and address this question by creating a synthetic personalized alignment dataset based on famous people with known public preferences. We then test how effective finetuned 1-8B size models are at inferring and aligning to personal preferences. Results show that higher-quality active prefixes lead to better generalization, more contextually faithful models, and fewer systematic biases across different protected attributes. All our results suggest active alignment can lead to a more controllable and efficient path for personalized alignment.
TeraAgent: A Distributed Agent-Based Simulation Engine for Simulating Half a Trillion Agents
Breitwieser, Lukas, Hesam, Ahmad, Yağlıkçı, Abdullah Giray, Sadrosadati, Mohammad, Rademakers, Fons, Mutlu, Onur
Agent-based simulation is an indispensable paradigm for studying complex systems. These systems can comprise billions of agents, requiring the computing resources of multiple servers to simulate. Unfortunately, the state-of-the-art platform, BioDynaMo, does not scale out across servers due to its shared-memory-based implementation. To overcome this key limitation, we introduce TeraAgent, a distributed agent-based simulation engine. A critical challenge in distributed execution is the exchange of agent information across servers, which we identify as a major performance bottleneck. We propose two solutions: 1) a tailored serialization mechanism that allows agents to be accessed and mutated directly from the receive buffer, and 2) leveraging the iterative nature of agent-based simulations to reduce data transfer with delta encoding. Built on our solutions, TeraAgent enables extreme-scale simulations with half a trillion agents (an 84x improvement), reduces time-to-result with additional compute nodes, improves interoperability with third-party tools, and provides users with more hardware flexibility.
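The delta-encoding idea can be sketched on plain dictionaries: because most agent attributes change little between iterations, a sender transmits only the fields that differ from the state it sent previously, and the receiver patches its cached copy. The names `encode_delta` and `apply_delta` are hypothetical, and TeraAgent's actual wire format and serialization mechanism are not described in the abstract.

```python
def encode_delta(previous, current):
    """Return only the fields of `current` that differ from `previous`."""
    changed = {k: v for k, v in current.items()
               if k not in previous or previous[k] != v}
    removed = [k for k in previous if k not in current]
    return {"changed": changed, "removed": removed}

def apply_delta(previous, delta):
    """Reconstruct the current state from the cached state and a delta."""
    state = dict(previous)          # copy so the cached state is not mutated
    state.update(delta["changed"])  # overwrite or add changed fields
    for k in delta["removed"]:
        del state[k]                # drop fields that no longer exist
    return state

# Made-up agent state at two consecutive iterations; only `x` changes.
step0 = {"x": 1.0, "y": 2.0, "diameter": 5.0, "type": "cell"}
step1 = {"x": 1.5, "y": 2.0, "diameter": 5.0, "type": "cell"}

delta = encode_delta(step0, step1)    # {"changed": {"x": 1.5}, "removed": []}
restored = apply_delta(step0, delta)  # equal to step1
```

Here the delta carries one field instead of four, which is the source of the reduced data transfer when most of an agent's state is unchanged between iterations.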
Guide: Generalized-Prior and Data Encoders for DAG Estimation
Roy, Amartya, N, Devharish, Ganguly, Shreya, Ghosh, Kripabandhu
Modern causal discovery methods face critical limitations in scalability, computational efficiency, and adaptability to mixed data types, as evidenced by benchmarks on node scalability (30, ≤ 50, ≥ 70 nodes), computational energy demands, and continuous/non-continuous data handling. While traditional algorithms like PC, GES, and ICA-LiNGAM struggle with these challenges, exhibiting prohibitive energy costs for higher-order nodes and poor scalability beyond 70 nodes, we propose GUIDE, a framework that integrates Large Language Model (LLM)-generated adjacency matrices with observational data through a dual-encoder architecture. GUIDE uniquely optimizes computational efficiency, reducing runtime on average by approximately 42% compared to RL-BIC and KCRL methods, while achieving an average improvement in accuracy of approximately 117% over both NOTEARS and GraN-DAG individually. During training, GUIDE's reinforcement learning agent dynamically balances reward maximization (accuracy) and penalty avoidance (DAG constraints), enabling robust performance across mixed data types and scalability to ≥ 70 nodes -- a setting where baseline methods fail.
DriveE2E: Closed-Loop Benchmark for End-to-End Autonomous Driving through Real-to-Simulation
Yu, Haibao, Yang, Wenxian, Hao, Ruiyang, Wang, Chuanye, Zhong, Jiaru, Luo, Ping, Nie, Zaiqing
Closed-loop evaluation is increasingly critical for end-to-end autonomous driving. Current closed-loop benchmarks using the CARLA simulator rely on manually configured traffic scenarios, which can diverge from real-world conditions, limiting their ability to reflect actual driving performance. To address these limitations, we introduce a simple yet challenging closed-loop evaluation framework that closely integrates real-world driving scenarios into the CARLA simulator with infrastructure cooperation. Our approach involves extracting 800 dynamic traffic scenarios selected from a comprehensive 100-hour video dataset captured by high-mounted infrastructure sensors, and creating static digital twin assets for 15 real-world intersections with consistent visual appearance. These digital twins accurately replicate the traffic and environmental characteristics of their real-world counterparts, enabling more realistic simulations in CARLA. This evaluation is challenging due to the diversity of driving behaviors, locations, weather conditions, and times of day at complex urban intersections. In addition, we provide a comprehensive closed-loop benchmark for evaluating end-to-end autonomous driving models.
End-to-End Autonomous Driving (E2EAD) has shown great advances and potential. Effective evaluation is essential for assessing the driving capabilities of E2EAD models, thereby advancing research and promoting the development of improved algorithms. Traditionally, E2EAD performance has been assessed using open-loop evaluation, which operates on prerecorded expert driving trajectories and corresponding sensor data, as seen in datasets such as nuScenes (Caesar et al., 2020). In this setting, the model passively predicts actions without influencing future observations, making the task resemble trajectory prediction (Zhai et al., 2023; Li et al., 2024b). As a result, open-loop evaluation provides limited insight into vehicle-environment interactions and real-time decision-making. In contrast, closed-loop evaluation continuously updates observations based on the ego vehicle's actions, allowing the E2EAD model to control the vehicle using its own decisions.