Goto

Collaborating Authors

 Energy


Better Neural Network Expressivity: Subdividing the Simplex

arXiv.org Artificial Intelligence

This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21 / SIDMA'23) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.


Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild

arXiv.org Artificial Intelligence

To perform outdoor visual navigation and search, a robot may leverage satellite imagery to generate visual priors. This can help inform high-level search strategies, even when such images lack sufficient resolution for target recognition. However, many existing informative path planning or search-based approaches either assume no prior information, or use priors without accounting for how they were obtained. Recent work instead utilizes large Vision Language Models (VLMs) for generalizable priors, but their outputs can be inaccurate due to hallucination, leading to inefficient search. To address these challenges, we introduce Search-TTA, a multimodal test-time adaptation framework with a flexible plug-and-play interface compatible with various input modalities (e.g., image, text, sound) and planning methods (e.g., RL-based). First, we pretrain a satellite image encoder to align with CLIP's visual encoder to output probability distributions of target presence used for visual search. Second, our TTA framework dynamically refines CLIP's predictions during search using uncertainty-weighted gradient updates inspired by Spatial Poisson Point Processes. To train and evaluate Search-TTA, we curate AVS-Bench, a visual search dataset based on internet-scale ecological data containing 380k images and taxonomy data. We find that Search-TTA improves planner performance by up to 30.0%, particularly in cases with poor initial CLIP predictions due to domain mismatch and limited training data. It also performs comparably with significantly larger VLMs, and achieves zero-shot generalization via emergent alignment to unseen modalities. Finally, we deploy Search-TTA on a real UAV via hardware-in-the-loop testing, by simulating its operation within a large-scale simulation that provides onboard sensing.


Large language models as uncertainty-calibrated optimizers for experimental discovery

arXiv.org Artificial Intelligence

Scientific discovery increasingly depends on efficient experimental optimization to navigate vast design spaces under time and resource constraints. Traditional approaches often require extensive domain expertise and feature engineering. While large language models, with their vast scientific knowledge, circumvent the feature engineering limitations, they lack the calibrated uncertainty estimates required for high-stakes decision making. Hence, current optimization methods force a choice between domain knowledge and reliability, with no principled approach that affords both. In this work, we show that training language models through the uncertainty-aware objectives of traditional optimization methods enables their use as reliable optimizers guided by natural language. By teaching LLMs from experimental outcomes under uncertainty, we transform their overconfidence from a fundamental limitation into a precise calibration mechanism. Applied to Buchwald-Hartwig reactions, a cornerstone of pharmaceutical synthesis, our method nearly doubles the discovery rate of high-yielding reaction conditions, from 24% to 43% in 50 experimental iterations starting from 10 unsuccessful conditions. Across 19 diverse optimization problems spanning organic synthesis, materials science and catalysis, process chemistry, and molecular design, our approach ranks first on average, establishing a new paradigm for reliable, uncertainty-guided optimization with LLMs. Our approach can accelerate discovery by lowering the barrier to using powerful optimization methods, replacing the need for domain-specific feature engineering with more accessible natural language interfaces. These findings highlight that ensuring reliability through principled uncertainty quantification is critical for realizing the full potential of AI-guided experimentation.


Parameter-Efficient Conditioning for Material Generalization in Graph-Based Simulators

arXiv.org Artificial Intelligence

Graph network-based simulators (GNS) have demonstrated strong potential for learning particle-based physics (such as fluids, deformable solids, and granular flows) while generalizing to unseen geometries due to their inherent inductive biases. However, existing models are typically trained for a single material type and fail to generalize across distinct constitutive behaviors, limiting their applicability in real-world engineering settings. Using granular flows as a running example, we propose a parameter-efficient conditioning mechanism that makes the GNS model adaptive to material parameters. We identify that sensitivity to material properties is concentrated in the early message-passing (MP) layers, a finding we link to the local nature of constitutive models (e.g., Mohr-Coulomb) and their effects on information propagation. We empirically validate this by showing that fine-tuning only the first few (1-5) of 10 MP layers of a pretrained model achieves comparable test performance as compared to fine-tuning the entire network. Building on this insight, we propose a parameter-efficient Feature-wise Linear Modulation (FiLM) conditioning mechanism designed to specifically target these early layers. This approach produces accurate long-term rollouts on unseen, interpolated, or moderately extrapolated values (e.g., up to 2.5 degrees for friction angle and 0.25 kPa for cohesion) when trained exclusively on as few as 12 short simulation trajectories from new materials, representing a 5-fold data reduction compared to a baseline multi-task learning method. Finally, we validate the model's utility by applying it to an inverse problem, successfully identifying unknown cohesion parameters from trajectory data. This approach enables the use of GNS in inverse design and closed-loop control tasks where material properties are treated as design variables.


Bioinspired Soft Quadrotors Jointly Unlock Agility, Squeezability, and Collision Resilience

arXiv.org Artificial Intelligence

Natural flyers use soft wings to seamlessly enable a wide range of flight behaviours, including agile manoeuvres, squeezing through narrow passageways, and withstanding collisions. In contrast, conventional quadrotor designs rely on rigid frames that support agile flight but inherently limit collision resilience and squeezability, thereby constraining flight capabilities in cluttered environments. Inspired by the anisotropic stiffness and distributed mass-energy structures observed in biological organisms, we introduce FlexiQuad, a soft-frame quadrotor design approach that limits this trade-off. We demonstrate a 405-gram FlexiQuad prototype, three orders of magnitude more compliant than conventional quadrotors, yet capable of acrobatic manoeuvres with peak speeds above 80 km/h and linear and angular accelerations exceeding 3 g and 300 rad/s$^2$, respectively. Analysis demonstrates it can replicate accelerations of rigid counterparts up to a thrust-to-weight ratio of 8. Simultaneously, FlexiQuad exhibits fourfold higher collision resilience, surviving frontal impacts at 5 m/s without damage and reducing destabilising forces in glancing collisions by a factor of 39. Its frame can fully compress, enabling flight through gaps as narrow as 70% of its nominal width. Our analysis identifies an optimal structural softness range, from 0.006 to 0.77 N/mm, comparable to that of natural flyers' wings, whereby agility, squeezability, and collision resilience are jointly achieved for FlexiQuad models from 20 to 3000 grams. FlexiQuad expands hovering drone capabilities in complex environments, enabling robust physical interactions without compromising flight performance.


ProDER: A Continual Learning Approach for Fault Prediction in Evolving Smart Grids

arXiv.org Artificial Intelligence

As smart grids evolve to meet growing energy demands and modern operational challenges, the ability to accurately predict faults becomes increasingly critical. However, existing AI-based fault prediction models struggle to ensure reliability in evolving environments where they are required to adapt to new fault types and operational zones. In this paper, we propose a continual learning (CL) framework in the smart grid context to evolve the model together with the environment. We design four realistic evaluation scenarios grounded in class-incremental and domain-incremental learning to emulate evolving grid conditions. We further introduce Prototype-based Dark Experience Replay (ProDER), a unified replay-based approach that integrates prototype-based feature regularization, logit distillation, and a prototype-guided replay memory. ProDER achieves the best performance among tested CL techniques, with only a 0.045 accuracy drop for fault type prediction and 0.015 for fault zone prediction. These results demonstrate the practicality of CL for scalable, real-world fault prediction in smart grids.


Stable and Robust SLIP Model Control via Energy Conservation-Based Feedback Cancellation for Quadrupedal Applications

arXiv.org Artificial Intelligence

In this paper, we present an energy-conservation based control architecture for stable dynamic motion in quadruped robots. We model the robot as a Spring-loaded Inverted Pendulum (SLIP), a model well-suited to represent the bouncing motion characteristic of running gaits observed in various biological quadrupeds and bio-inspired robotic systems. The model permits leg-orientation control during flight and leg-length control during stance, a design choice inspired by natural quadruped behaviors and prevalent in robotic quadruped systems. Our control algorithm uses the reduced-order SLIP dynamics of the quadruped to track a stable parabolic spline during stance, which is calculated using the principle of energy conservation. Through simulations based on the design specifications of an actual quadruped robot, Ghost Robotics Minitaur, we demonstrate that our control algorithm generates stable bouncing gaits. Additionally, we illustrate the robustness of our controller by showcasing its ability to maintain stable bouncing even when faced with up to a 10% error in sensor measurements.


An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones

arXiv.org Artificial Intelligence

The emergence of truck-drone collaborative systems in last-mile logistics has positioned the Traveling Salesman Problem with Drones (TSP-D) as a pivotal extension of classical routing optimization, where synchronized vehicle coordination promises substantial operational efficiency and reduced environmental impact, yet introduces NP-hard combinatorial complexity beyond the reach of conventional optimization paradigms. Deep reinforcement learning offers a theoretically grounded framework to address TSP-D's inherent challenges through self-supervised policy learning and adaptive decision-making. This study proposes a hierarchical Actor-Critic deep reinforcement learning framework for solving the TSP-D problem. The architecture consists of two primary components: a Transformer-inspired encoder and an efficient Minimal Gated Unit decoder. The encoder incorporates a novel, optimized k-nearest neighbors sparse attention mechanism specifically for focusing on relevant spatial relationships, further enhanced by the integration of global node features. The Minimal Gated Unit decoder processes these encoded representations to efficiently generate solution sequences. The entire framework operates within an asynchronous advantage actor-critic paradigm. Experimental results show that, on benchmark TSP-D instances of various scales (N=10 to 100), the proposed model can obtain competitive or even superior solutions in shorter average computation times compared to high-performance heuristic algorithms and existing reinforcement learning methods. Moreover, compared to advanced reinforcement learning algorithm benchmarks, the proposed framework significantly reduces the total training time required while achieving superior final performance, highlighting its notable advantage in training efficiency.


Accelerating HDC-CNN Hybrid Models Using Custom Instructions on RISC-V GPUs

arXiv.org Artificial Intelligence

Machine learning based on neural networks has advanced rapidly, but the high energy consumption required for training and inference remains a major challenge. Hyperdimensional Computing (HDC) offers a lightweight, brain-inspired alternative that enables high parallelism but often suffers from lower accuracy on complex visual tasks. To overcome this, hybrid accelerators combining HDC and Convolutional Neural Networks (CNNs) have been proposed, though their adoption is limited by poor generalizability and programmability. The rise of open-source RISC-V architectures has created new opportunities for domain-specific GPU design. Unlike traditional proprietary GPUs, emerging RISC-V-based GPUs provide flexible, programmable platforms suitable for custom computation models such as HDC. In this study, we design and implement custom GPU instructions optimized for HDC operations, enabling efficient processing for hybrid HDC-CNN workloads. Experimental results using four types of custom HDC instructions show a performance improvement of up to 56.2 times in microbenchmark tests, demonstrating the potential of RISC-V GPUs for energy-efficient, high-performance computing.


Encoding Biomechanical Energy Margin into Passivity-based Synchronization for Networked Telerobotic Systems

arXiv.org Artificial Intelligence

Abstract--aintaining system stability and accurate position tracking is imperative in networked robotic systems, particularly for haptics-enabled human-robot interaction. Recent literature have integrated human biomechanics into the stabilizers implemented for teleoperation, enhancing force preservation while guaranteeing convergence and safety. However, position desynchronization due to imperfect communication and non-passive behaviors remains a challenge. We provide the mathematical design synthesis of the stabilizer and the proof of stability. We also conducted a series of grid simulations and systematic experiments and compared the performance with state-of-the-art solutions regarding varying time delays and environmental conditions. The proposed stabilizer is effective for various telerobotic applications requiring precise position synchronization.aintaining Recent literature have integrated human biomechanics into the stabilizers implemented for teleoperation, enhancing force preservation while guaranteeing convergence and safety. However, position desynchronization due to imperfect communication and non-passive behaviors remains a challenge. We provide the mathematical design synthesis of the stabilizer and the proof of stability.