Energy
FairBatching: Fairness-Aware Batch Formation for LLM Inference
Lyu, Hongtao, Liu, Boyue, Wu, Mingyu, Chen, Haibo
Large language model (LLM) inference systems face a fundamental tension between minimizing Time-to-First-Token (TTFT) latency for new requests and maintaining a high, steady token generation rate (low Time-Per-Output-Token, or TPOT) for ongoing requests. Existing stall-free batching schedulers proposed by Sarathi, while effective at preventing decode stalls, introduce significant computational unfairness. They prioritize decode tasks excessively, simultaneously leading to underutilized decode slack and unnecessary prefill queuing delays, which collectively degrade the system's overall quality of service (QoS). This work identifies the root cause of this unfairness: the non-monotonic nature of Time-Between-Tokens (TBT) as a scheduling metric and the rigid decode-prioritizing policy that fails to adapt to dynamic workload bursts. We therefore propose FairBatching, a novel LLM inference scheduler that enforces fair resource allocation between prefill and decode tasks. It features an adaptive batch capacity determination mechanism, which dynamically adjusts the computational budget to improve the GPU utilization without triggering SLO violations. Its fair and dynamic batch formation algorithm breaks away from the decode-prioritizing paradigm, allowing computation resources to be reclaimed from bursting decode tasks to serve prefill surges, achieving global fairness. Furthermore, FairBatching provides a novel load estimation method, enabling more effective coordination with upper-level schedulers. Implemented and evaluated on realistic traces, FairBatching significantly reduces TTFT tail latency by up to 2.29x while robustly maintaining TPOT SLOs, achieving overall 20.0% improvement in single-node capacity and 54.3% improvement in cluster-level capacity.
PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora
Sveistrys, Mykolas, Kunert, Richard
Recent advances in large language models (LLMs) and retrieval-augmented generation (RAG) have enabled progress on question answering (QA) when relevant evidence is in one (single-hop) or multiple (multi-hop) passages. Yet many realistic questions about recurring report data - medical records, compliance filings, maintenance logs - require aggregation across all documents, with no clear stopping point for retrieval and high sensitivity to even one missed passage. We term these pluri-hop questions and formalize them by three criteria: recall sensitivity, exhaustiveness, and exactness. To study this setting, we introduce PluriHopWIND, a diagnostic multilingual dataset of 48 pluri-hop questions built from 191 real-world wind industry reports in German and English. We show that PluriHopWIND is 8-40% more repetitive than other common datasets and thus has higher density of distractor documents, better reflecting practical challenges of recurring report corpora. We test a traditional RAG pipeline as well as graph-based and multimodal variants, and find that none of the tested approaches exceed 40% in statement-wise F1 score. Motivated by this, we propose PluriHopRAG, a RAG architecture that follows a "check all documents individually, filter cheaply" approach: it (i) decomposes queries into document-level subquestions and (ii) uses a cross-encoder filter to discard irrelevant documents before costly LLM reasoning. We find that PluriHopRAG achieves relative F1 score improvements of 18-52% depending on base LLM. Despite its modest size, PluriHopWIND exposes the limitations of current QA systems on repetitive, distractor-rich corpora. PluriHopRAG's performance highlights the value of exhaustive retrieval and early filtering as a powerful alternative to top-k methods.
Learning Human-Humanoid Coordination for Collaborative Object Carrying
Du, Yushi, Li, Yixuan, Jia, Baoxiong, Lin, Yutang, Zhou, Pei, Liang, Wei, Yang, Yanchao, Huang, Siyuan
Human-humanoid collaboration shows significant promise for applications in healthcare, domestic assistance, and manufacturing. While compliant robot-human collaboration has been extensively developed for robotic arms, enabling compliant human-humanoid collaboration remains largely unexplored due to humanoids' complex whole-body dynamics. In this paper, we propose a proprioception-only reinforcement learning approach, COLA, that combines leader and follower behaviors within a single policy. The model is trained in a closed-loop environment with dynamic object interactions to predict object motion patterns and human intentions implicitly, enabling compliant collaboration to maintain load balance through coordinated trajectory planning. We evaluate our approach through comprehensive simulator and real-world experiments on collaborative carrying tasks, demonstrating the effectiveness, generalization, and robustness of our model across various terrains and objects. Simulation experiments demonstrate that our model reduces human effort by 24.7%. compared to baseline approaches while maintaining object stability. Real-world experiments validate robust collaborative carrying across different object types (boxes, desks, stretchers, etc.) and movement patterns (straight-line, turning, slope climbing). Human user studies with 23 participants confirm an average improvement of 27.4% compared to baseline models. Our method enables compliant human-humanoid collaborative carrying without requiring external sensors or complex interaction models, offering a practical solution for real-world deployment.
Spatial Computing Communications for Multi-User Virtual Reality in Distributed Mobile Edge Computing Network
Xu, Caolu, Chen, Zhiyong, Tao, Meixia, Song, Li, Zhang, Wenjun
Abstract--Immersive virtual reality (VR) applications impose stringent requirements on latency, energy efficiency, and computational resources, particularly in multi-user interactive scenarios. T o address these challenges, we introduce the concept of spatial computing communications (SCC), a framework designed to meet the latency and energy demands of multi-user VR over distributed mobile edge computing (MEC) networks. SCC jointly represents the physical space, defined by users and base stations, and the virtual space, representing shared immersive environments, using a probabilistic model of user dynamics and resource requirements. The resource deployment task is then formulated as a multi-objective combinatorial optimization (MOCO) problem that simultaneously minimizes system latency and energy consumption across distributed MEC resources. T o solve this problem, we propose MO-CMPO, a multi-objective consistency model with policy optimization that integrates supervised learning and reinforcement learning (RL) fine-tuning guided by preference weights. Leveraging a sparse graph neural network (GNN), MO-CMPO efficiently generates Pareto-optimal solutions. Simulations with real-world New Radio base station datasets demonstrate that MO-CMPO achieves superior hypervolume performance and significantly lower inference latency than baseline methods. Furthermore, the analysis reveals practical deployment patterns: latency-oriented solutions favor local MEC execution to reduce transmission delay, while energy-oriented solutions minimize redundant placements to save energy.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Han, Ning, Gong, Gu, Zhang, Bin, Xu, Yuexuan, Yang, Bohan, Liu, Yunhui, Navarro-Alarcon, David
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Li, Shimiao, Qu, Guannan, Hooi, Bryan, Sekar, Vyas, Kar, Soummya, Pileggi, Larry
Power grids increasingly need real-time situational awareness under the ever-evolving cyberthreat landscape. Advances in snapshot-based system identification approaches have enabled accurately estimating states and topology from a snapshot of measurement data, under random bad data and topology errors. However, modern interactive, targeted false data can stay undetectable to these methods, and significantly compromise estimation accuracy. This work advances system identification that combines snapshot-based method with time-series model via Bayesian Integration, to advance cyber resiliency against both random and targeted false data. Using a distance-based time-series model, this work can leverage historical data of different distributions induced by changes in grid topology and other settings. The normal system behavior captured from historical data is integrated into system identification through a Bayesian treatment, to make solutions robust to targeted false data. We experiment on mixed random anomalies (bad data, topology error) and targeted false data injection attack (FDIA) to demonstrate our method's 1) cyber resilience: achieving over 70% reduction in estimation error under FDIA; 2) anomalous data identification: being able to alarm and locate anomalous data; 3) almost linear scalability: achieving comparable speed with the snapshot-based baseline, both taking <1min per time tick on the large 2,383-bus system using a laptop CPU.
Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models
Chua, Jia Yun, Zolotas, Argyrios, Arana-Catania, Miguel
Remote sensing has become a vital tool across sectors such as urban planning, environmental monitoring, and disaster response. While the volume of data generated has increased significantly, traditional vision models are often constrained by the requirement for extensive domain-specific labelled data and their limited ability to understand the context within complex environments. Vision Language Models offer a complementary approach by integrating visual and textual data; however, their application to remote sensing remains underexplored, particularly given their generalist nature. This work investigates the combination of vision models and VLMs to enhance image analysis in remote sensing, with a focus on aircraft detection and scene understanding. The integration of YOLO with VLMs such as LLaVA, ChatGPT, and Gemini aims to achieve more accurate and contextually aware image interpretation. Performance is evaluated on both labelled and unlabelled remote sensing data, as well as degraded image scenarios which are crucial for remote sensing. The findings show an average MAE improvement of 48.46% across models in the accuracy of aircraft detection and counting, especially in challenging conditions, in both raw and degraded scenarios. A 6.17% improvement in CLIPScore for comprehensive understanding of remote sensing images is obtained. The proposed approach combining traditional vision models and VLMs paves the way for more advanced and efficient remote sensing image analysis, especially in few-shot learning scenarios.
TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
Zhao, Zhiyuan, Ni, Juntong, Xu, Shangqing, Liu, Haoxin, Jin, Wei, Prakash, B. Aditya
Time-series forecasting is an essential task with wide real-world applications across domains. While recent advances in deep learning have enabled time-series forecasting models with accurate predictions, there remains considerable debate over which architectures and design components, such as series decomposition or normalization, are most effective under varying conditions. Existing benchmarks primarily evaluate models at a high level, offering limited insight into why certain designs work better. To mitigate this gap, we propose TimeRecipe, a unified benchmarking framework that systematically evaluates time-series forecasting methods at the module level. TimeRecipe conducts over 10,000 experiments to assess the effectiveness of individual components across a diverse range of datasets, forecasting horizons, and task settings. Our results reveal that exhaustive exploration of the design space can yield models that outperform existing state-of-the-art methods and uncover meaningful intuitions linking specific design choices to forecasting scenarios. Furthermore, we release a practical toolkit within TimeRecipe that recommends suitable model architectures based on these empirical insights. The benchmark is available at: https://github.com/AdityaLab/TimeRecipe.
There's a simple way we could drastically cut AI energy use
There's a simple way we could drastically cut AI energy use Being more judicious in which AI models we use for tasks could potentially save 31.9 terawatt-hours of energy this year alone - equivalent to the output of five nuclear reactors. Tiago da Silva Barros at the University of Cote d'Azur in France and his colleagues looked at 14 different tasks that people use generative AI tools for, ranging from text generation to speech recognition and image classification. 'Flashes of brilliance and frustration': I let an AI agent run my day They then examined public leaderboards, including those hosted by the machine learning hub Hugging Face, for how different models perform. The energy efficiency of the models during inference - when an AI model produces an answer - was measured by a tool called CarbonTracker, and the total energy use of that model was calculated by tracking user downloads. "Based on the size of the model, we estimated the energy consumption, and based on this, we can try to do our estimations," says da Silva Barros.
How ByteDance Made China's Most Popular AI Chatbot
ByteDance's Doubao app has overtaken DeepSeek, proving that user-friendly design often matters more than having the most advanced AI model. When Chinese AI startup DeepSeek became a global sensation in January, it not only shocked Silicon Valley but also startled ByteDance, TikTok's parent company. The Chinese tech giant had already launched Doubao, its own flagship AI assistant app with tens of millions of users. But when DeepSeek became the best-known Chinese AI company overnight, no one was talking about Doubao anymore. Now, ByteDance has gotten its revenge.