Goto

Collaborating Authors

 subsystem


Accelerating Model-Free Optimization via Averaging of Cost Samples

Neural Information Processing Systems

Model-free optimization methods typically rely on cost samples gathered by perturbing the current solution estimate along a finite and fixed set of directions. However, at each iteration, only the current cost samples are used, while potentially informative, previously collected samples are discarded. In this work, we challenge this conventional approach by introducing a simple yet effective memory mechanism that maintains an auxiliary vector of iteratively updated cost samples. By leveraging this stored information, our method estimates descent directions through an averaging of all perturbing directions weighted by the auxiliary vector components.


Characterizing control between interacting subsystems with deep Jacobian estimation

Neural Information Processing Systems

Biological function arises through the dynamical interactions of multiple subsystems, including those between brain areas, within gene regulatory networks, and more. A common approach to understanding these systems is to model the dynamics of each subsystem and characterize communication between them. An alternative approach is through the lens of control theory: how the subsystems control one another. This approach involves inferring the directionality, strength, and contextual modulation of control between subsystems. However, methods for understanding subsystem control are typically linear and cannot adequately describe the rich contextual effects enabled by nonlinear complex systems.



Learning Modular Simulations for Homogeneous Systems

Neural Information Processing Systems

Complex systems are often decomposed into modular subsystems for engineering tractability. Although various equation based white-box modeling techniques make use of such structure, learning based methods have yet to incorporate these ideas broadly. We present a modular simulation framework for modeling homogeneous multibody dynamical systems, which combines ideas from graph neural networks and neural differential equations. We learn to model the individual dynamical subsystem as a neural ODE module. Full simulation of the composite system is orchestrated via spatio-temporal message passing between these modules. An arbitrary number of modules can be combined to simulate systems of a wide variety of coupling topologies. We evaluate our framework on a variety of systems and show that message passing allows coordination between multiple modules over time for accurate predictions and in certain cases, enables zero-shot generalization to new system configurations. Furthermore, we show that our models can be transferred to new system configurations with lower data requirement and training effort, compared to those trained from scratch.



AircraftVerse: A Large-Scale Multimodal Dataset of Aerial Vehicle Designs

Neural Information Processing Systems

Aircraft design encompasses different physics domains and, hence, multiple modalities of representation. The evaluation of these designs requires the use of scientific analytical and simulation models ranging from computer-aided design tools for structural and manufacturing analysis, computational fluid dynamics tools for drag and lift computation, battery models for energy estimation, and simulation models for flight control and dynamics. AircraftVerse contains $27{,}714$ diverse air vehicle designs - the largest corpus of designs with this level of complexity. Each design comprises the following artifacts: a symbolic design tree describing topology, propulsion subsystem, battery subsystem, and other design details; a STandard for the Exchange of Product (STEP) model data; a 3D CAD design using a stereolithography (STL) file format; a 3D point cloud for the shape of the design; and evaluation results from high fidelity state-of-the-art physics models that characterize performance metrics such as maximum flight distance and hover-time. We also present baseline surrogate models that use different modalities of design representation to predict design performance metrics, which we provide as part of our dataset release. Finally, we discuss the potential impact of this dataset on the use of learning in aircraft design, and more generally, in the emerging field of deep learning for scientific design. AircraftVerse is accompanied by a datasheet as suggested in the recent literature, and it is released under Creative Commons Attribution-ShareAlike (CC BY-SA) license. The dataset with baseline models are hosted at http://doi.org/10.5281/zenodo.6525446,


Learning Modular Simulations for Homogeneous Systems

Neural Information Processing Systems

Complex systems are often decomposed into modular subsystems for engineering tractability. Although various equation based white-box modeling techniques make use of such structure, learning based methods have yet to incorporate these ideas broadly. We present a modular simulation framework for modeling homogeneous multibody dynamical systems, which combines ideas from graph neural networks and neural differential equations. We learn to model the individual dynamical subsystem as a neural ODE module. Full simulation of the composite system is orchestrated via spatio-temporal message passing between these modules. An arbitrary number of modules can be combined to simulate systems of a wide variety of coupling topologies. We evaluate our framework on a variety of systems and show that message passing allows coordination between multiple modules over time for accurate predictions and in certain cases, enables zero-shot generalization to new system configurations. Furthermore, we show that our models can be transferred to new system configurations with lower data requirement and training effort, compared to those trained from scratch.


Cascaded Tightly-Coupled Observer Design for Single-Range-Aided Inertial Navigation

arXiv.org Artificial Intelligence

This work introduces a single-range-aided navigation observer that reconstructs the full state of a rigid body using only an Inertial Measurement Unit (IMU), a body-frame vector measurement (e.g., magnetometer), and a distance measurement from a fixed anchor point. The design first formulates an extended linear time-varying (LTV) system to estimate body-frame position, body-frame velocity, and the gravity direction. The recovered gravity direction, combined with the body-frame vector measurement, is then used to reconstruct the full orientation on $\mathrm{SO}(3)$, resulting in a cascaded observer architecture. Almost Global Asymptotic Stability (AGAS) of the cascaded design is established under a uniform observability condition, ensuring robustness to sensor noise and trajectory variations. Simulation studies on three-dimensional trajectories demonstrate accurate estimation of position, velocity, and orientation, highlighting single-range aiding as a lightweight and effective modality for autonomous navigation.


System Identification and Adaptive Input Estimation on the Jaiabot Micro Autonomous Underwater Vehicle

arXiv.org Artificial Intelligence

This paper reports an attempt to model the system dynamics and estimate both the unknown internal control input and the state of a recently developed marine autonomous vehicle, the Jaiabot. Although the Jaiabot has shown promise in many applications, process and sensor noise necessitates state estimation and noise filtering. In this work, we present the first surge and heading linear dynamical model for Jaiabots derived from real data collected during field testing. An adaptive input estimation algorithm is implemented to accurately estimate the control input and hence the state. For validation, this approach is compared to the classical Kalman filter, highlighting its advantages in handling unknown control inputs.


FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

arXiv.org Artificial Intelligence

F ALCON actively decouples locomotion and manipulation through two modular diffusion policies, coordinated by a vision-language foundation model. The VLM encodes global scene context, proprioceptive states, and goal instructions into a shared latent embedding that conditions both subsystems. Abstract--We present FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (F ALCON), a framework for loco-manipulation that combines modular diffusion policies with a vision-language foundation model as the coordinator . Our approach explicitly decouples locomotion and manipulation into two specialized visuomotor policies, allowing each subsystem to rely on its own observations. This mitigates the performance degradation that arise when a single policy is forced to fuse heterogeneous, potentially mismatched observations from locomotion and manipulation. Our key innovation lies in restoring coordination between these two independent policies through a vision-language foundation model, which encodes global observations and language instructions into a shared latent embedding conditioning both diffusion policies. On top of this backbone, we introduce a phase-progress head that uses textual descriptions of task stages to infer discrete phase and continuous progress estimates without manual phase labels. T o further structure the latent space, we incorporate a coordination-aware contrastive loss that explicitly encodes cross-subsystem compatibility between arm and base actions. Results show that it surpasses centralized and decentralized baselines while exhibiting improved robustness and generalization to out-of-distribution scenarios. ECENT progress in robot learning and foundation models has rekindled the longstanding vision of general-purpose robots that can move through unstructured environments and manipulate diverse objects with minimal task-specific engineering. Large Behavior Models (LBMs) extend the diffusion policy paradigm to multi-task dexterous manipulation [1], training a single policy across broad datasets of real and simulated trajectories. Robotics' Memo platform [8], demonstrate impressive whole-body behaviors that combine locomotion, manipulation, and language grounding in increasingly realistic environments. These developments suggest a future where robot generalist models consume raw sensor streams and language instructions and directly output actions to interact with the physical world. However, loco-manipulation, jointly controlling a mobile base and one or more arms, remains especially challenging on legged platforms [9]-[11], where the same body must simultaneously maintain stability and accomplish precise manipulation under different sensor streams and poses. In this work, we focus on a specific yet representative setting in which an arm-mounted quadruped robot performs long-horizon loco-manipulation tasks using only RGB observations, proprioceptive states, and sparse language instructions.