Energy
Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute
Oviedo, Felipe, Kazhamiaka, Fiodar, Choukse, Esha, Kim, Allen, Luers, Amy, Nakagawa, Melanie, Bianchini, Ricardo, Ferres, Juan M. Lavista
As AI inference scales to billions of queries and emerging reasoning and agentic workflows increase token demand, reliable estimates of per-query energy use are increasingly important for capacity planning, emissions accounting, and efficiency prioritization. Many public estimates are inconsistent and overstate energy use, because they extrapolate from limited benchmarks and fail to reflect efficiency gains achievable at scale. In this perspective, we introduce a bottom-up methodology to estimate the per-query energy of large-scale LLM systems based on token throughput. For models running on an H100 node under realistic workloads, GPU utilization and PUE constraints, we estimate a median energy per query of 0.34 Wh (IQR: 0.18-0.67) for frontier-scale models (>200 billion parameters). These results are consistent with measurements using production-scale configurations and show that non-production estimates and assumptions can overstate energy use by 4-20x. Extending to test-time scaling scenarios with 15x more tokens per typical query, the median energy rises 13x to 4.32 Wh, indicating that targeting efficiency in this regime will deliver the largest fleet-wide savings. We quantify achievable efficiency gains at the model, serving platform, and hardware levels, finding individual median reductions of 1.5-3.5x in energy per query, while combined advances can plausibly deliver 8-20x reductions. To illustrate the system-level impact, we estimate the baseline daily energy use of a deployment serving 1 billion queries to be 0.8 GWh/day. If 10% are long queries, demand could grow to 1.8 GWh/day. With targeted efficiency interventions, it falls to 0.9 GWh/day, similar to the energy footprint of web search at that scale. This echoes how data centers historically tempered energy growth through efficiency gains during the internet and cloud build-up.
Design Insights and Comparative Evaluation of a Hardware-Based Cooperative Perception Architecture for Lane Change Prediction
Manzour, Mohamed, Elias, Catherine M., Shehata, Omar M., Izquierdo, Rubรฉn, Sotelo, Miguel รngel
Traffic accidents remain a major global concern, with lane-change maneuvers recognized as one of the significant contributors to collision risk. Anticipating these maneuvers has become an important research focus, supporting both traffic safety and the safe integration of autonomous and assisted driving technologies. Over the past decade, numerous models have been developed for lane-change prediction. However, most existing works have been designed and validated using simulation environments or pre-recorded datasets. While these settings allow for benchmarking and controlled evaluation, they often rely on simplified assumptions about sensing, communication, and vehicle behavior that do not fully capture the complexity of real-world operation. Real-world deployments of lane-change prediction systems are relatively rare, and when they are reported, their practical challenges, limitations, and insights remain under-documented. To illustrate the setting more concretely, consider the left lane change scenario shown in Figure 1. The Ego Vehicle (EV) is driving in the left lane, while the Target Vehicle (TV) is moving in the right lane behind a Preceding Vehicle (PV). When the PV suddenly brakes, the TV must change lanes to avoid a collision.
TopoCut: Learning Multi-Step Cutting with Spectral Rewards and Discrete Diffusion Policies
Wang, Liquan, Bian, Jiangjie, Heiden, Eric, Garg, Animesh
Robotic manipulation tasks involving cutting deformable objects remain challenging due to complex topological behaviors, difficulties in perceiving dense object states, and the lack of efficient evaluation methods for cutting outcomes. In this paper, we introduce TopoCut, a comprehensive benchmark for multi-step robotic cutting tasks that integrates a cutting environment and generalized policy learning. TopoCut is built upon three core components: (1) We introduce a high-fidelity simulation environment based on a particle-based elastoplastic solver with compliant von Mises constitutive models, augmented by a novel damage-driven topology discovery mechanism that enables accurate tracking of multiple cutting pieces. (2) We develop a comprehensive reward design that integrates the topology discovery with a pose-invariant spectral reward model based on Laplace-Beltrami eigenanalysis, facilitating consistent and robust assessment of cutting quality. (3) We propose an integrated policy learning pipeline, where a dynamics-informed perception module predicts topological evolution and produces particle-wise, topology-aware embeddings to support PDDP (Particle-based Score-Entropy Discrete Diffusion Policy) for goal-conditioned policy learning. Extensive experiments demonstrate that TopoCut supports trajectory generation, scalable learning, precise evaluation, and strong generalization across diverse object geometries, scales, poses, and cutting goals.
FedRAIN-Lite: Federated Reinforcement Algorithms for Improving Idealised Numerical Weather and Climate Models
Nath, Pritthijit, Schemm, Sebastian, Moss, Henry, Haynes, Peter, Shuckburgh, Emily, Webb, Mark
Sub-grid parameterisations in climate models are traditionally static and tuned offline, limiting adaptability to evolving states. This work introduces FedRAIN-Lite, a federated reinforcement learning (FedRL) framework that mirrors the spatial decomposition used in general circulation models (GCMs) by assigning agents to latitude bands, enabling local parameter learning with periodic global aggregation. Using a hierarchy of simplified energy-balance climate models, from a single-agent baseline (ebm-v1) to multi-agent ensemble (ebm-v2) and GCM-like (ebm-v3) setups, we benchmark three RL algorithms under different FedRL configurations. Results show that Deep Deterministic Policy Gradient (DDPG) consistently outperforms both static and single-agent baselines, with faster convergence and lower area-weighted RMSE in tropical and mid-latitude zones across both ebm-v2 and ebm-v3 setups. DDPG's ability to transfer across hyperparameters and low computational cost make it well-suited for geographically adaptive parameter learning. This capability offers a scalable pathway towards high-complexity GCMs and provides a prototype for physically aligned, online-learning climate models that can evolve with a changing climate. Code accessible at https://github.com/p3jitnath/climate-rl-fedrl.
SciRerankBench: Benchmarking Rerankers Towards Scientific Retrieval-Augmented Generated LLMs
Chen, Haotian, Long, Qingqing, Xiao, Meng, Luo, Xiao, Ju, Wei, Wang, Chengrui, Wang, Xuezhi, Zhou, Yuanchun, Zhu, Hengshu
Scientific literature question answering is a pivotal step towards new scientific discoveries. Recently, \textit{two-stage} retrieval-augmented generated large language models (RAG-LLMs) have shown impressive advancements in this domain. Such a two-stage framework, especially the second stage (reranker), is particularly essential in the scientific domain, where subtle differences in terminology may have a greatly negative impact on the final factual-oriented or knowledge-intensive answers. Despite this significant progress, the potential and limitations of these works remain unexplored. In this work, we present a Scientific Rerank-oriented RAG Benchmark (SciRerankBench), for evaluating rerankers within RAG-LLMs systems, spanning five scientific subjects. To rigorously assess the reranker performance in terms of noise resilience, relevance disambiguation, and factual consistency, we develop three types of question-context-answer (Q-C-A) pairs, i.e., Noisy Contexts (NC), Semantically Similar but Logically Irrelevant Contexts (SSLI), and Counterfactual Contexts (CC). Through systematic evaluation of 13 widely used rerankers on five families of LLMs, we provide detailed insights into their relative strengths and limitations. To the best of our knowledge, SciRerankBench is the first benchmark specifically developed to evaluate rerankers within RAG-LLMs, which provides valuable observations and guidance for their future development.
Coverage Path Planning for Holonomic UAVs via Uniaxial-Feasible, Gap-Severity Guided Decomposition
Granadeno, Pedro Antonio Alarcon, Cleland-Huang, Jane
Abstract-- Modern coverage path planning (CPP) for holo-nomic UA Vs in emergency response must contend with diverse environments where regions of interest (ROIs) often take the form of highly irregular polygons, characterized by asymmetric shapes, dense clusters of concavities, and multiple internal holes. Modern CPP pipelines typically rely on decomposition strategies that overfragment such polygons into numerous subregions. This increases the number of sweep segments and connectors, which in turn adds inter-region travel and forces more frequent reorientation. These effects ultimately result in longer completion times and degraded trajectory quality. We address this with a decomposition strategy that applies a recursive dual-axis monotonicity criterion with cuts guided by a cumulative gap severity metric. This approach distributes clusters of concavities more evenly across subregions and produces a minimal set of partitions that remain sweepable under a parallel-track maneuver . We pair this with a global optimizer that jointly selects sweep paths and inter-partition transitions to minimize total path length, transition overhead, and turn count. We demonstrate that our proposed approach achieves the lowest mean overhead in path length and completion time across 13 notable CPP pipelines. Coverage Path Planning (CPP) generates trajectories that fully observe a Region of Interest (ROI) with minimal cost, typically measured in path length, turns, execution time, or energy use. A dominant strategy in aerial and ground robotics is parallel-track coverage, where back-and-forth sweeps (e.g., boustrophedon) tile the region. For convex ROIs, selecting an optimal sweep direction is straightforward.
Hybrid Safety Verification of Multi-Agent Systems using $ฯ$-Weighted CBFs and PAC Guarantees
Margapuri, Venkat, Kazanjian, Garik, Kosaraju, Naren
Abstract--This study proposes a hybrid safety verification framework for closed-loop multi-agent systems under bounded stochastic disturbances. The proposed approach augments control barrier functions with a novel ฯ-weighted formulation that encodes directional control alignment between agents into the safety constraints. Deterministic admissibility is combined with empirical validation via Monte Carlo rollouts, and a PAC-style guarantee is derived based on margin-aware safety violations to provide a probabilistic safety certificate. The results from the experiments conducted under different bounded stochastic disturbances validate the feasibility of the proposed approach. Safety within multi-agent systems is essential for real-world applications such as autonomous driving [1], [2] and robotic swarm deployments in agriculture [3], [4], manufacturing [5], [6], and search and rescue operations [7], where agents must navigate safely through their environment.
Lidar-based Tracking of Traffic Participants with Sensor Nodes in Existing Urban Infrastructure
Schรคfer, Simon, Alrifaee, Bassam, Hashemi, Ehsan
This paper presents a lidar-only state estimation and tracking framework, along with a roadside sensing unit for integration with existing urban infrastructure. Urban deployments demand scalable, real-time tracking solutions, yet traditional remote sensing remains costly and computationally intensive, especially under perceptually degraded conditions. Our sensor node couples a single lidar with an edge computing unit and runs a computationally efficient, GPU-free observer that simultaneously estimates object state, class, dimensions, and existence probability. The pipeline performs: (i) state updates via an extended Kalman filter, (ii) dimension estimation using a 1D grid-map/Bayesian update, (iii) class updates via a lookup table driven by the most probable footprint, and (iv) existence estimation from track age and bounding-box consistency. Experiments in dynamic urban-like scenes with diverse traffic participants demonstrate real-time performance and high precision: The complete end-to-end pipeline finishes within \SI{100}{\milli\second} for \SI{99.88}{\%} of messages, with an excellent detection rate. Robustness is further confirmed under simulated wind and sensor vibration. These results indicate that reliable, real-time roadside tracking is feasible on CPU-only edge hardware, enabling scalable, privacy-friendly deployments within existing city infrastructure. The framework integrates with existing poles, traffic lights, and buildings, reducing deployment costs and simplifying large-scale urban rollouts and maintenance efforts.
Choosing to Be Green: Advancing Green AI via Dynamic Model Selection
Cruciani, Emilio, Verdecchia, Roberto
Artificial Intelligence is increasingly pervasive across domains, with ever more complex models delivering impressive predictive performance. This fast technological advancement however comes at a concerning environmental cost, with state-of-the-art models - particularly deep neural networks and large language models - requiring substantial computational resources and energy. In this work, we present the intuition of Green AI dynamic model selection, an approach based on dynamic model selection that aims at reducing the environmental footprint of AI by selecting the most sustainable model while minimizing potential accuracy loss. Specifically, our approach takes into account the inference task, the environmental sustainability of available models, and accuracy requirements to dynamically choose the most suitable model. Our approach presents two different methods, namely Green AI dynamic model cascading and Green AI dynamic model routing. We demonstrate the effectiveness of our approach via a proof of concept empirical example based on a real-world dataset. Our results show that Green AI dynamic model selection can achieve substantial energy savings (up to ~25%) while substantially retaining the accuracy of the most energy greedy solution (up to ~95%). As conclusion, our preliminary findings highlight the potential that hybrid, adaptive model selection strategies withhold to mitigate the energy demands of modern AI systems without significantly compromising accuracy requirements.
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
Dai, Xilin, Xu, Zhijian, Cai, Wanxu, Xu, Qiang
Most state-of-the-art probabilistic time series forecasting models rely on sampling to represent future uncertainty. However, this paradigm suffers from inherent limitations, such as lacking explicit probabilities, inadequate coverage, and high computational costs. In this work, we introduce \textbf{Probabilistic Scenarios}, an alternative paradigm designed to address the limitations of sampling. It operates by directly producing a finite set of \{Scenario, Probability\} pairs, thus avoiding Monte Carlo-like approximation. To validate this paradigm, we propose \textbf{TimePrism}, a simple model composed of only three parallel linear layers. Surprisingly, TimePrism achieves 9 out of 10 state-of-the-art results across five benchmark datasets on two metrics. The effectiveness of our paradigm comes from a fundamental reframing of the learning objective. Instead of modeling an entire continuous probability space, the model learns to represent a set of plausible scenarios and corresponding probabilities. Our work demonstrates the potential of the Probabilistic Scenarios paradigm, opening a promising research direction in forecasting beyond sampling.