Energy
One Ring to Rule Them All: Constrained Distributional Control for Massive-Scale Heterogeneous Robotic Ensemble Systems
Arias, Andres, Zhang, Wei, Qian, Haoyu, Li, Jr-Shin, Sun, Chuangchuang
Ensemble control aims to steer a population of dynamical systems using a shared control input. This paper introduces a constrained ensemble control framework for parameterized, heterogeneous robotic systems operating under state and environmental constraints, such as obstacle avoidance. We develop a moment kernel transform that maps the parameterized ensemble dynamics to the moment system in a kernel space, enabling the characterization of population-level behavior. The state-space constraints, such as polyhedral waypoints to be visited and obstacles to be avoided, are also transformed into the moment space, leading to a unified formulation for safe, large-scale ensemble control. Expressive signal temporal logic specifications are employed to encode complex visit-avoid tasks, which are achieved through a single shared controller synthesized from our constrained ensemble control formulation. Simulation and hardware experiments demonstrate the effectiveness of the proposed approach in safely and efficiently controlling robotic ensembles within constrained environments.
Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
Ghose, Debasmita, Gitelson, Oz, Vazquez, Marynel, Scassellati, Brian
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection
Gao, Xiangyi, Zhao, Danpei, Yuan, Bo, Li, Wentao
Knowledge distillation is an effective and hardware-friendly method, which plays a key role in lightweighting remote sensing object detection. However, existing distillation methods often encounter the issue of mixed features in remote sensing images (RSIs), and neglect the discrepancies caused by subtle feature variations, leading to entangled knowledge confusion. To address these challenges, we propose an architecture-agnostic distillation method named Dual-Stream Spectral Decoupling Distillation (DS2D2) for universal remote sensing object detection tasks. Specifically, DS2D2 integrates explicit and implicit distillation grounded in spectral decomposition. Firstly, the first-order wavelet transform is applied for spectral decomposition to preserve the critical spatial characteristics of RSIs. Leveraging this spatial preservation, a Density-Independent Scale Weight (DISW) is designed to address the challenges of dense and small object detection common in RSIs. Secondly, we show implicit knowledge hidden in subtle student-teacher feature discrepancies, which significantly influence predictions when activated by detection heads. This implicit knowledge is extracted via full-frequency and high-frequency amplifiers, which map feature differences to prediction deviations. Extensive experiments on DIOR and DOTA datasets validate the effectiveness of the proposed method. Specifically, on DIOR dataset, DS2D2 achieves improvements of 4.2% in AP50 for RetinaNet and 3.8% in AP50 for Faster R-CNN, outperforming existing distillation approaches. The source code will be available at https://github.com/PolarAid/DS2D2.
Development of a 15-Degree-of-Freedom Bionic Hand with Cable-Driven Transmission and Distributed Actuation
Han, Haoqi, Yang, Yi, Yu, Yifei, Zhou, Yixuan, Zhu, Xiaohan, Wang, Hesheng
Abstract--In robotic hand research, minimizing the number of actuators while maintaining human-hand-consistent dimensions and degrees of freedom constitutes a fundamental challenge. Drawing bio-inspiration from human hand kinematic configurations and muscle distribution strategies, this work proposes a novel 15-DoF dexterous robotic hand, with detailed analysis of its mechanical architecture, electrical system, and control system. The bionic hand employs a new tendon-driven mechanism, significantly reducing the number of motors required by traditional tendon-driven systems while enhancing motion performance and simplifying the mechanical structure. This design integrates five motors in the forearm to provide strong gripping force, while ten small motors are installed in the palm to support fine manipulation tasks. Additionally, a corresponding joint sensing and motor driving electrical system was developed to ensure efficient control and feedback. The entire system weighs only 1.4kg, combining lightweight and high-performance features. Through experiments, the bionic hand exhibited exceptional dexterity and robust grasping capabilities, demonstrating significant potential for robotic manipulation tasks. HE development of actuator systems with human-level dexterity presents significant challenges [1], [2], stemming from the bio-integrated nature of the human hand: it is not an isolated entity but a highly coupled system intricately connected through skeletal-muscular-neural networks to the forearm, forming a synergistic functional unit.
From FLOPs to Footprints: The Resource Cost of Artificial Intelligence
Falk, Sophia, Corrêa, Nicholas Kluge, Luccioni, Sasha, Biber-Freudenberger, Lisa, van Wynsberghe, Aimee
As computational demands continue to rise, assessing the environmental footprint of AI requires moving beyond energy and water consumption to include the material demands of specialized hardware. This study quantifies the material footprint of AI training by linking computational workloads to physical hardware needs. The elemental composition of the Nvidia A100 SXM 40 GB graphics processing unit (GPU) was analyzed using inductively coupled plasma optical emission spectroscopy, which identified 32 elements. The results show that AI hardware consists of about 90% heavy metals and only trace amounts of precious metals. The elements copper, iron, tin, silicon, and nickel dominate the GPU composition by mass. In a multi-step methodology, we integrate these measurements with computational throughput per GPU across varying lifespans, accounting for the computational requirements of training specific AI models at different training efficiency regimes. Scenario-based analyses reveal that, depending on Model FLOPs Utilization (MFU) and hardware lifespan, training GPT-4 requires between 1,174 and 8,800 A100 GPUs, corresponding to the extraction and eventual disposal of up to 7 tons of toxic elements. Combined software and hardware optimization strategies can reduce material demands: increasing MFU from 20% to 60% lowers GPU requirements by 67%, while extending lifespan from 1 to 3 years yields comparable savings; implementing both measures together reduces GPU needs by up to 93%. Our findings highlight that incremental performance gains, such as those observed between GPT-3.5 and GPT-4, come at disproportionately high material costs. The study underscores the necessity of incorporating material resource considerations into discussions of AI scalability, emphasizing that future progress in AI must align with principles of resource efficiency and environmental responsibility.
Decoding Large Language Diffusion Models with Foreseeing Movement
Mo, Yichuan, Chen, Quan, Li, Mingjie, Wei, Zeming, Wang, Yisen
Large Language Diffusion Models (LLDMs) benefit from a flexible decoding mechanism that enables parallelized inference and controllable generations over autoregressive models. Yet such flexibility introduces a critical challenge: inference performance becomes highly sensitive to the decoding order of tokens. Existing heuristic methods, however, focus mainly on local effects while overlooking long-term impacts. To address this limitation, we propose the Foreseeing Decoding Method (FDM), a novel approach that integrates both local and global considerations to unlock the full potential, employing a search-based strategy to enable effective optimization in discrete spaces. Furthermore, by analyzing the consistency of chosen tokens in the full decoding process, we develop a variant, FDM with Acceleration (FDM-A), which restricts deep exploration to critical steps identified as the exploration and balance circumantences. Extensive experiments across diverse benchmarks and model architectures validate the scalability of FDM and demonstrate the superior efficiency-performance trade-off achieved by FDM-A. Our work might potentially provide a principled step toward more powerful decoding methods for LLDMs.
The changing surface of the world's roads
Randhawa, Sukanya, Randhawa, Guntaj, Langer, Clemens, Andorful, Francis, Herfort, Benjamin, Kwakye, Daniel, Olchik, Omer, Lautenbach, Sven, Zipf, Alexander
Resilient road infrastructure is a cornerstone of the UN Sustainable Development Goals. Yet a primary indicator of network functionality and resilience is critically lacking: a comprehensive global baseline of road surface information. Here, we overcome this gap by applying a deep learning framework to a global mosaic of Planetscope satellite imagery from 2020 and 2024. The result is the first global multi-temporal dataset of road pavedness and width for 9.2 million km of critical arterial roads, achieving 95.5% coverage where nearly half the network was previously unclassified. This dataset reveals a powerful multi-scale geography of human development. At the planetary scale, we show that the rate of change in pavedness is a robust proxy for a country's development trajectory (correlation with HDI = 0.65). At the national scale, we quantify how unpaved roads constitute a fragile backbone for economic connectivity. We further synthesize our data into a global Humanitarian Passability Matrix with direct implications for humanitarian logistics. At the local scale, case studies demonstrate the framework's versatility: in Ghana, road quality disparities expose the spatial outcomes of governance; in Pakistan, the data identifies infrastructure vulnerabilities to inform climate resilience planning. Together, this work delivers both a foundational dataset and a multi-scale analytical framework for monitoring global infrastructure, from the dynamics of national development to the realities of local governance, climate adaptation, and equity. Unlike traditional proxies such as nighttime lights, which reflect economic activity, road surface data directly measures the physical infrastructure that underpins prosperity and resilience - at higher spatial resolution.
A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing the Auxiliary-Loss-Free Load Balancing (ALF-LB) procedure -- proposed by DeepSeek's Wang et al. (2024) -- by casting it as a one-step-per-iteration primal-dual method for an assignment problem. First, in a stylized deterministic setting, our framework yields several insightful structural properties: (i) a monotonic improvement of a Lagrangian objective, (ii) a preference rule that moves tokens from overloaded to underloaded experts, and (iii) an approximate-balancing guarantee. Then, we incorporate the stochastic and dynamic nature of AI training using a generalized online optimization formulation. In the online setting, we derive a strong convexity property of the objective that leads to a logarithmic expected regret bound under certain step-size choices. Additionally, we present real experiments on 1B-parameter DeepSeekMoE models to complement our theoretical findings. Together, these results build a principled framework for analyzing the Auxiliary-Loss-Free Load Balancing of s-MoE in AI models.
EVER: Edge-Assisted Auto-Verification for Mobile MR-Aided Operation
Chen, Jiangong, Zhu, Mingyu, Li, Bin
Mixed Reality (MR)-aided operation overlays digital objects on the physical world to provide a more immersive and intuitive operation process. A primary challenge is the precise and fast auto-verification of whether the user follows MR guidance by comparing frames before and after each operation. The pre-operation frame includes virtual guiding objects, while the post-operation frame contains physical counterparts. Existing approaches fall short of accounting for the discrepancies between physical and virtual objects due to imperfect 3D modeling or lighting estimation. In this paper, we propose EVER: an edge-assisted auto-verification system for mobile MR-aided operations. Unlike traditional frame-based similarity comparisons, EVER leverages the segmentation model and rendering pipeline adapted to the unique attributes of frames with physical pieces and those with their virtual counterparts; it adopts a threshold-based strategy using Intersection over Union (IoU) metrics for accurate auto-verification. To ensure fast auto-verification and low energy consumption, EVER offloads compute-intensive tasks to an edge server. Through comprehensive evaluations of public datasets and custom datasets with practical implementation, EVER achieves over 90% verification accuracy within 100 milliseconds (significantly faster than average human reaction time of approximately 273 milliseconds), while consuming only minimal additional computational resources and energy compared to a system without auto-verification.
Locality-Sensitive Hashing-Based Efficient Point Transformer for Charged Particle Reconstruction
Govil, Shitij, Rodgers, Jack P., Chou, Yuan-Tang, Miao, Siqi, Saha, Amit, Anand, Advaith, Lieret, Kilian, DeZoort, Gage, Liu, Mia, Duarte, Javier, Li, Pan, Hsu, Shih-Chieh
Charged particle track reconstruction is a foundational task in collider experiments and the main computational bottleneck in particle reconstruction. Graph neural networks (GNNs) have shown strong performance for this problem, but costly graph construction, irregular computations, and random memory access patterns substantially limit their throughput. The recently proposed Hashing-based Efficient Point Transformer (HEPT) offers a theoretically guaranteed near-linear complexity for large point cloud processing via locality-sensitive hashing (LSH) in attention computations; however, its evaluations have largely focused on embedding quality, and the object condensation pipeline on which HEPT relies requires a post-hoc clustering step (e.g., DBScan) that can dominate runtime. In this work, we make two contributions. First, we present a unified, fair evaluation of physics tracking performance for HEPT and a representative GNN-based pipeline under the same dataset and metrics. Second, we introduce HEPTv2 by extending HEPT with a lightweight decoder that eliminates the clustering stage and directly predicts track assignments. This modification preserves HEPT's regular, hardware-friendly computations while enabling ultra-fast end-to-end inference. On the TrackML dataset, optimized HEPTv2 achieves approximately 28 ms per event on an A100 while maintaining competitive tracking efficiency. These results position HEPTv2 as a practical, scalable alternative to GNN-based pipelines for fast tracking.