 Energy


BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands

arXiv.org Artificial Intelligence

Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual-process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play complementary roles: the DRM performs strategic planning with structured 3D scene updates and guides what the IRM attends to, while the IRM analyzes video streams to update memory, correct ongoing actions, and trigger replanning when necessary. Through this bidirectional coordination, the modules address the trade-off between maintaining awareness and avoiding costly updates, enabling robust adaptation under dynamic conditions. Evaluated in three real-world environments with dynamic object placement, BINDER achieves substantially higher success and efficiency than SoTA baselines, demonstrating its effectiveness for real-world deployment.


AutoTailor: Automatic and Efficient Adaptive Model Deployment for Diverse Edge Devices

arXiv.org Artificial Intelligence

On-device machine learning (ML) has become a fundamental component of emerging mobile applications. Adaptive model deployment delivers efficient inference for heterogeneous device capabilities and performance requirements through customizing neural architectures. SuperNet-based approaches offer a promising solution by generating a large number of model variants from a pre-trained ML model. However, applying SuperNet in existing frameworks suffers from tedious model-aware development and time-consuming hardware-aware profiling, which limits their practical adoption. We present AutoTailor, the first framework to enable automated, end-to-end SuperNet-based adaptive model deployment for edge devices. Unlike manual SuperNet construction, AutoTailor employs a computation graph-guided compilation approach to automatically transform user-provided ML models into SuperNets. To support efficient specialization, AutoTailor incorporates learning-free latency and accuracy predictors, enabling low-cost yet accurate performance prediction. Our extended evaluations demonstrate that AutoTailor reduces the lines of code for SuperNet construction by 11--27$\times$, decreases hardware-aware profiling costs by at least 11$\times$, and achieves up to 15.60\% absolute accuracy improvement and 60.03\% latency reduction compared to state-of-the-art approaches across diverse models and devices.


Toward Data-Driven Surrogates of the Solar Wind with Spherical Fourier Neural Operator

arXiv.org Artificial Intelligence

The solar wind, a continuous stream of charged particles from the Sun's corona, shapes the heliosphere and impacts space systems near Earth. Variations such as high-speed streams and coronal mass ejections can disrupt satellites, power grids, and communications, making accurate modeling essential for space weather forecasting. While 3D magnetohydrodynamic (MHD) models are used to simulate and investigate these variations in the solar wind, they tend to be computationally expensive, limiting their usefulness in investigating the impacts of boundary condition uncertainty. In this work, we develop a surrogate for steady-state solar wind modeling, using a Spherical Fourier Neural Operator (SFNO). We compare our model to a previously developed numerical surrogate for this task called HUX, and we show that the SFNO achieves comparable or better performance across several metrics. Though HUX retains advantages in physical smoothness, this underscores the need for improved evaluation criteria rather than a flaw in SFNO. As a flexible and trainable approach, SFNO enables efficient real-time forecasting and can improve with more data. The source code and more visual results are available at https://github.com/rezmansouri/solarwind-sfno-velocity.


An energy-efficient spiking neural network with continuous learning for self-adaptive brain-machine interface

arXiv.org Artificial Intelligence

The number of simultaneously recorded neurons follows an exponentially increasing trend in implantable brain-machine interfaces (iBMIs). Integrating the neural decoder in the implant is an effective data compression method for future wireless iBMIs. However, the non-stationarity of the system makes the performance of the decoder unreliable. To avoid frequent retraining of the decoder and to ensure the safety and comfort of the iBMI user, continuous learning is essential for real-life applications. Since Deep Spiking Neural Networks (DSNNs) are being recognized as a promising approach for developing resource-efficient neural decoders, we propose continuous learning approaches with Reinforcement Learning (RL) algorithms adapted for DSNNs. Banditron and AGREL are chosen as the two candidate RL algorithms since they can be trained with limited computational resources, effectively addressing the non-stationary problem and fitting the energy constraints of implantable devices. To assess the effectiveness of the proposed methods, we conducted both open-loop and closed-loop experiments. The accuracy of open-loop experiments conducted with DSNN Banditron and DSNN AGREL remains stable over extended periods. Meanwhile, in the closed-loop experiment with perturbations, DSNN Banditron achieved a time-to-target comparable to that of DSNN AGREL while achieving reductions of 98% in memory access usage and 99% in the requirements for multiply-and-accumulate (MAC) operations during training. Compared to previous continuous learning SNN decoders, DSNN Banditron requires 98% less computation, making it a prime candidate for future wireless iBMI systems.
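For readers unfamiliar with Banditron: in its original linear form it is a multiclass perceptron that learns from bandit feedback, meaning it is told only whether the label it tried was correct. The sketch below shows that linear update only. How the paper adapts it to DSNNs is not described in the abstract, and the function name, argument names, and exploration rate `gamma` are illustrative assumptions.

```python
import random


def banditron_step(W, x, true_label, gamma, rng):
    """Apply one linear Banditron update to W (k rows of d weights), in place.

    Returns (explored_label, reward); reward is 1 if the explored label was correct.
    """
    k = len(W)
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    greedy = max(range(k), key=lambda r: scores[r])

    # Exploration distribution: mostly the greedy label, plus a uniform share gamma.
    probs = [(1.0 - gamma) * (r == greedy) + gamma / k for r in range(k)]
    explored = rng.choices(range(k), weights=probs)[0]

    # Bandit feedback: the learner only observes whether its explored label was right.
    reward = 1 if explored == true_label else 0

    for r in range(k):
        # Importance-weighted positive term for the explored label (if correct),
        # minus a perceptron-style penalty on the greedy label.
        gain = reward / probs[r] if r == explored else 0.0
        penalty = 1.0 if r == greedy else 0.0
        coeff = gain - penalty
        if coeff != 0.0:
            for j, x_j in enumerate(x):
                W[r][j] += coeff * x_j
    return explored, reward


if __name__ == "__main__":
    # Toy usage (hypothetical data): two classes, two features.
    rng = random.Random(0)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(100):
        banditron_step(W, [1.0, 0.0], 0, 0.1, rng)
        banditron_step(W, [0.0, 1.0], 1, 0.1, rng)
    print(W)
```

Each step touches at most two rows of `W` and needs no gradient through the network, which is consistent with the abstract's point that the algorithm can be trained with limited computational resources.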


Energy Efficient Sleep Mode Optimization in 5G mmWave Networks via Multi Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Dynamic sleep mode optimization (SMO) in millimeter-wave (mmWave) networks is essential for maximizing energy efficiency (EE) under stringent quality-of-service (QoS) constraints. However, existing optimization and reinforcement learning (RL)-based approaches rely on aggregated, static base station (BS) traffic models that fail to capture non-stationary traffic dynamics and suffer from prohibitively large state-action spaces, limiting their real-world deployment. To address these challenges, this paper proposes a Multi-Agent Deep Reinforcement Learning (MARL) framework employing a Double Deep Q-Network (DDQN), referred to as MARL-DDQN, for adaptive SMO in a 3D urban environment using a time-varying and community-based user equipment (UE) mobility model. Unlike conventional single-agent RL, the proposed MARL-DDQN enables scalable, distributed decision-making with minimal signaling overhead. A realistic BS power consumption model and beamforming are integrated to accurately quantify EE, while QoS is uniquely defined in terms of throughput. The proposed method adaptively learns SMO policies to maximize EE while mitigating inter-cell interference and ensuring throughput fairness. Extensive simulations demonstrate that MARL-DDQN consistently outperforms state-of-the-art SM strategies, including the All On, iterative QoS-aware load-based (IT-QoS-LB), MARL-DDPG, and MARL-PPO, achieving up to 0.60 Mbit/Joule EE, 8.5 Mbps 10th-percentile throughput, and satisfying QoS constraints 95% of the time under dynamic network scenarios.

I. Introduction

The exponential growth in mobile data demand has necessitated increased spectrum availability and accelerated the expansion of cellular network infrastructure. To address the limitations of the sub-6 GHz spectrum, millimeter wave (mmWave) communications, operating within the 30-300 GHz band, have emerged as a key enabler in fifth-generation (5G) networks. With significantly larger bandwidth availability, mmWave technology presents a viable solution to spectrum scarcity challenges [1]. However, mmWave signals suffer from high propagation loss, atmospheric absorption, and susceptibility to blockages, which severely limit coverage and reliability. To address coverage and growing capacity demands, 5G networks rely on densification, deploying numerous low-power mmWave BSs with inter-site distances of a few hundred meters [1]. These BSs utilize large antenna arrays to enable beamforming and spatial multiplexing, often relying on hybrid analog-digital precoding to reduce hardware complexity [2]. However, the RF chain remains a major source of power consumption, particularly the analog-to-digital converters (ADCs) and digital-to-analog converters (DACs), whose power scales with sampling rate. Due to the higher frequencies and wider bandwidths of mmWave systems, these components require significantly higher sampling rates than sub-6 GHz systems [3], resulting in substantial energy demands.
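The Double Deep Q-Network named in the abstract differs from a plain DQN in how it forms the learning target: the online network chooses the next action and the target network evaluates it. A minimal sketch of that standard target computation follows. The function and argument names are illustrative assumptions, and the paper's state, action, and reward definitions for sleep mode control are not reproduced here.

```python
def double_dqn_target(reward, next_q_online, next_q_target, discount, done):
    """Return the standard Double DQN regression target for one transition.

    next_q_online / next_q_target: per-action Q-values for the next state,
    from the online and target networks respectively (plain lists of floats).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + discount * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical numbers: two actions, e.g. "sleep" and "active".
    print(double_dqn_target(1.0, [0.2, 0.9], [5.0, 3.0], 0.5, False))
```

Separating selection from evaluation is the usual remedy for the overestimation bias of the single-network maximum. In a multi-agent setting such as the one described above, each agent would compute this target from its own networks.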


SoftNash: Entropy-Regularized Nash Games for Non-Fighting Virtual Fixtures

arXiv.org Artificial Intelligence

Virtual fixtures (VFs) improve precision in teleoperation but often "fight" the user, inflating mental workload and eroding the sense of agency. We propose Soft-Nash Virtual Fixtures, a game-theoretic shared-control policy that softens the classic two-player linear-quadratic (LQ) Nash solution by inflating the fixture's effort weight with a single, interpretable scalar parameter $\tau$. This yields a continuous dial on controller assertiveness: $\tau=0$ recovers a hard, performance-focused Nash / virtual fixture controller, while larger values of $\tau$ reduce gains and pushback, yet preserve the equilibrium structure and continuity of closed-loop stability. We derive Soft-Nash from both a KL-regularized trust-region and a maximum-entropy viewpoint, obtaining a closed-form robot best response that shrinks authority and aligns the fixture with the operator's input as $\tau$ grows. We implement Soft-Nash on a 6-DoF haptic device in a 3D tracking task ($n=12$). Moderate softness ($\tau\approx 1-3$, especially $\tau=2$) maintains tracking error statistically indistinguishable from a tuned classic VF while sharply reducing controller-user conflict, lowering NASA-TLX workload, and increasing Sense of Agency (SoAS). A composite BalancedScore that combines normalized accuracy and non-fighting behavior peaks near $\tau=2-3$. These results show that a one-parameter Soft-Nash policy can preserve accuracy while improving comfort and perceived agency, providing a practical and interpretable pathway to personalized shared control in haptics and teleoperation.


A Multi-View Multi-Timescale Hypergraph-Empowered Spatiotemporal Framework for EV Charging Forecasting

arXiv.org Artificial Intelligence

Accurate electric vehicle (EV) charging demand forecasting is essential for stable grid operation and proactive EV participation in the electricity market. Existing forecasting methods, particularly those based on graph neural networks, are often limited to modeling pairwise relationships between stations, failing to capture the complex, group-wise dynamics inherent in urban charging networks. To address this gap, we develop a novel forecasting framework named HyperCast, leveraging the expressive power of hypergraphs to model the higher-order spatiotemporal dependencies hidden in EV charging patterns. HyperCast integrates multi-view hypergraphs, which capture both static geographical proximity and dynamic demand-based functional similarities, along with multi-timescale inputs to differentiate between recent trends and weekly periodicities. The framework employs specialized hyper-spatiotemporal blocks and tailored cross-attention mechanisms to effectively fuse information from these diverse sources: views and timescales. Extensive experiments on four public datasets demonstrate that HyperCast significantly outperforms a wide array of state-of-the-art baselines, underscoring the effectiveness of explicitly modeling collective charging behaviors for more accurate forecasting.


Predicting Public Health Impacts of Electricity Usage

arXiv.org Artificial Intelligence

The electric power sector is a leading source of air pollutant emissions, impacting the public health of nearly every community. Although regulatory measures have reduced air pollutants, fossil fuels remain a significant component of the energy supply, highlighting the need for more advanced demand-side approaches to reduce the public health impacts. To enable health-informed demand-side management, we introduce HealthPredictor, a domain-specific AI model that provides an end-to-end pipeline linking electricity use to public health outcomes. The model comprises three components: a fuel mix predictor that estimates the contribution of different generation sources, an air quality converter that models pollutant emissions and atmospheric dispersion, and a health impact assessor that translates resulting pollutant changes into monetized health damages. Across multiple regions in the United States, our health-driven optimization framework yields substantially lower prediction errors in terms of public health impacts than fuel mix-driven baselines. A case study on electric vehicle charging schedules illustrates the public health gains enabled by our method and the actionable guidance it can offer for health-informed energy management. Overall, this work shows how AI models can be explicitly designed to enable health-informed energy management for advancing public health and broader societal well-being. Our datasets and code are released at: https://github.com/Ren-Research/Health-Impact-Predictor.


Digital Elevation Model Estimation from RGB Satellite Imagery using Generative Deep Learning

arXiv.org Artificial Intelligence

Digital Elevation Models (DEMs) are vital datasets for geospatial applications such as hydrological modeling and environmental monitoring. However, conventional methods for generating DEMs, such as LiDAR and photogrammetry, require specific types of data that are often inaccessible in resource-constrained settings. To alleviate this problem, this study proposes an approach to generate DEMs from freely available RGB satellite imagery using generative deep learning, particularly based on a conditional Generative Adversarial Network (GAN). We first developed a global dataset consisting of 12K RGB-DEM pairs using Landsat satellite imagery and NASA's SRTM digital elevation data, both from the year 2000. A unique preprocessing pipeline was implemented to select high-quality, cloud-free regions and aggregate normalized RGB composites from Landsat imagery. Additionally, the model was trained in a two-stage process, where it was first trained on the complete dataset and then fine-tuned on high-quality samples filtered by Structural Similarity Index Measure (SSIM) values to improve performance on challenging terrains. The results demonstrate promising performance in mountainous regions, achieving an overall mean root-mean-square error (RMSE) of 0.4671 and a mean SSIM score of 0.2065 (scale -1 to 1), while highlighting limitations in lowland and residential areas. This study underscores the importance of meticulous preprocessing and iterative refinement in generative modeling for DEM generation, offering a cost-effective and adaptive alternative to conventional methods while emphasizing the challenge of generalization across diverse terrains worldwide.


Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE)

arXiv.org Artificial Intelligence

Extracting compact, physically interpretable representations from high-dimensional scientific data is a persistent challenge due to the complex, nonlinear structures inherent in physical systems. We propose a Gaussian Mixture Variational Autoencoder (GM-VAE) framework designed to address this by integrating an Expectation-Maximization (EM)-inspired training scheme with a novel spectral interpretability metric. Unlike conventional VAEs that jointly optimize reconstruction and clustering (often leading to training instability), our method utilizes a block-coordinate descent strategy, alternating between expectation and maximization steps. This approach stabilizes training and naturally aligns latent clusters with distinct physical regimes. To objectively evaluate the learned representations, we introduce a quantitative metric based on graph-Laplacian smoothness, which measures the coherence of physical quantities across the latent manifold. We demonstrate the efficacy of this framework on datasets of increasing complexity: surface reaction ODEs, Navier-Stokes wake flows, and experimental laser-induced combustion Schlieren images. The results show that our GM-VAE yields smooth, physically consistent manifolds and accurate regime clustering, offering a robust data-driven tool for interpreting turbulent and reactive flow systems.
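To give a concrete sense of what a graph-Laplacian smoothness score measures, the sketch below computes a normalized Dirichlet energy, $f^\top L f / f^\top f$, for a physical quantity $f$ sampled at nodes of a graph built over latent points. This is a generic construction offered as an assumption. The abstract does not give the paper's exact metric, and the function name, the edge-list format, and the mean-centering step are illustrative choices.

```python
def laplacian_smoothness(values, edges):
    """Return the normalized Dirichlet energy of `values` on an undirected graph.

    values: one scalar per node (e.g. a physical quantity at each latent point).
    edges:  iterable of (i, j, weight), each undirected edge listed once.
    Lower scores mean the quantity varies more smoothly across neighbors.
    """
    mean = sum(values) / len(values)
    f = [v - mean for v in values]  # center so a constant offset does not matter
    # f^T L f equals the weighted sum of squared differences across edges.
    energy = sum(w * (f[i] - f[j]) ** 2 for i, j, w in edges)
    norm = sum(v * v for v in f)
    return energy / norm if norm > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 4-node path graph with unit edge weights.
    path = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    print(laplacian_smoothness([0.0, 1.0, 2.0, 3.0], path))  # smooth ramp
    print(laplacian_smoothness([0.0, 3.0, 0.0, 3.0], path))  # oscillating
```

In practice the edges would typically come from a nearest-neighbor graph over the latent codes, so a low score indicates that nearby latent points carry similar physical values.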