Differentiable Simulation
First-order Sobolev Reinforcement Learning
Schramm, Fabian, Perrin-Gilbert, Nicolas, Carpentier, Justin
We propose a refinement of temporal-difference learning that enforces first-order Bellman consistency: the learned value function is trained to match not only the Bellman targets in value but also their derivatives with respect to states and actions. By differentiating the Bellman backup through differentiable dynamics, we obtain analytically consistent gradient targets. Incorporating these into the critic objective using a Sobolev-type loss encourages the critic to align with both the value and local geometry of the target function. This first-order TD matching principle can be seamlessly integrated into existing algorithms, such as Q-learning or actor-critic methods (e.g., DDPG, SAC), potentially leading to faster critic convergence and more stable policy gradients without altering their overall structure.
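The first-order matching principle can be sketched on a toy problem. The dynamics, reward, policy gain, and frozen target critic below are illustrative stand-ins, not the paper's models; the point is that differentiating the Bellman backup through differentiable dynamics yields analytic gradient targets that a Sobolev-type loss penalizes alongside the usual value error.

```python
# Toy 1-D setup (all names and constants are illustrative assumptions):
# dynamics f(s,a) = 0.9s + 0.5a, reward r(s,a) = -s^2 - 0.1a^2,
# frozen target critic Qt(s,a) = -(s^2 + a^2), policy pi(s) = K*s.

GAMMA = 0.99
K = -0.4  # policy gain

def f(s, a):                 # differentiable dynamics
    return 0.9 * s + 0.5 * a

def r(s, a):
    return -s * s - 0.1 * a * a

def q_tgt(s, a):             # frozen target critic
    return -(s * s + a * a)

def bellman_target(s, a):
    """Value target y and its analytic derivatives (dy/ds, dy/da),
    obtained by chain rule through the differentiable dynamics."""
    sp = f(s, a)             # next state
    ap = K * sp              # next action from the policy
    y = r(s, a) + GAMMA * q_tgt(sp, ap)
    # sensitivity of Qt to the next state: direct term + term via the policy
    dq_dsp = -2.0 * sp + (-2.0 * ap) * K
    dy_ds = -2.0 * s + GAMMA * dq_dsp * 0.9   # df/ds = 0.9
    dy_da = -0.2 * a + GAMMA * dq_dsp * 0.5   # df/da = 0.5
    return y, dy_ds, dy_da

def sobolev_loss(q, dq_ds, dq_da, s, a, lam=1.0):
    """Zeroth-order TD error plus first-order (gradient) matching terms."""
    y, dy_ds, dy_da = bellman_target(s, a)
    return (q - y) ** 2 + lam * ((dq_ds - dy_ds) ** 2 + (dq_da - dy_da) ** 2)
```

Setting `lam=0` recovers the ordinary squared TD error; a critic that matches both the value and the slope of the target drives the loss to zero.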
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
You, Haoxiang, Liu, Yilang, Abraham, Ian
In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach decouples the rendering process from the computation graph, enabling seamless integration with existing differentiable simulation ecosystems without the need for specialized differentiable rendering software. This decoupling not only reduces computational and memory overhead but also effectively attenuates the policy gradient norm, leading to more stable and smoother optimization. We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. Experiments show that our approach significantly reduces wall-clock training time and consistently outperforms all baseline methods in terms of final returns. Notably, on complex tasks such as humanoid locomotion, our method achieves a $4\times$ improvement in final return, and successfully learns a humanoid running policy within 4 hours on a single GPU.
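A minimal sketch of the decoupling idea, under toy assumptions that are not the paper's components: 1-D linear dynamics, a rounding function standing in for a non-differentiable renderer, and a linear policy. The rendered observation is treated as a constant ("detached"), so the analytic first-order gradient flows only through the dynamics, never through the renderer.

```python
# Illustrative sketch of detaching rendering from the differentiated path.
# All models are hypothetical stand-ins; gradients are written out by hand.

def render(s):               # stand-in for a non-differentiable renderer
    return round(s, 2)       # e.g. rasterization: piecewise constant in s

def dynamics(s, a):
    return 0.8 * s + 0.3 * a

def rollout_return(theta, s0, horizon=5):
    """Return and its gradient w.r.t. theta, with rendering detached."""
    s, ret, g = s0, 0.0, 0.0
    ds_dtheta = 0.0                      # state sensitivity to theta
    for _ in range(horizon):
        obs = render(s)                  # detached: no gradient through this
        a = theta * obs                  # linear policy
        da_dtheta = obs                  # obs held constant in the backward pass
        ret += -s * s - 0.1 * a * a      # per-step reward
        g += -2.0 * s * ds_dtheta - 0.2 * a * da_dtheta
        ds_dtheta = 0.8 * ds_dtheta + 0.3 * da_dtheta
        s = dynamics(s, a)
    return ret, g
```

In this toy, rasterization is piecewise constant in the state, so the true gradient through the renderer is zero almost everywhere; detaching it discards nothing useful while keeping the differentiable-simulation path intact.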
- Europe > Austria > Vienna (0.14)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Information Technology (0.67)
- Energy (0.46)
DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning
Zhang, Xinhong, Wang, Runqing, Ren, Yunfan, Sun, Jian, Fang, Hao, Chen, Jie, Wang, Gang
This letter introduces DiffAero, a lightweight, GPU-accelerated, and fully differentiable simulation framework designed for efficient quadrotor control policy learning. DiffAero supports both environment-level and agent-level parallelism and integrates multiple dynamics models, customizable sensor stacks (IMU, depth camera, and LiDAR), and diverse flight tasks within a unified, GPU-native training interface. By fully parallelizing both physics and rendering on the GPU, DiffAero eliminates CPU-GPU data transfer bottlenecks and delivers orders-of-magnitude improvements in simulation throughput. In contrast to existing simulators, DiffAero not only provides high-performance simulation but also serves as a research platform for exploring differentiable and hybrid learning algorithms. Extensive benchmarks and real-world flight experiments demonstrate that DiffAero, combined with hybrid learning algorithms, can learn robust flight policies in hours on consumer-grade hardware. Quadrotors, and swarms thereof, are increasingly deployed in complex environments for aerial inspection, environmental monitoring, and high-speed racing, owing to their agile maneuverability and onboard sensing capabilities. End-to-end learning addresses the limitations of a modular autonomy stack by training neural flight policies that map raw sensor observations directly to control commands, thereby streamlining the autonomy stack and enabling tighter feedback loops [4].
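Environment-level parallelism of the kind described can be illustrated with a batched step function. This is a schematic point-mass model in NumPy, not DiffAero's API; the pattern, one vectorized call advancing every environment with no per-environment host loop, is the same one a GPU-native simulator exploits.

```python
# Batched "quadrotor" stepping sketch (hypothetical point-mass model).
import numpy as np

DT = 0.01   # integration step [s]
G = 9.81    # gravity [m/s^2]

def step_batched(pos, vel, thrust):
    """Advance N point-mass vehicles simultaneously.
    pos, vel: (N, 3) arrays; thrust: (N,) collective thrust along +z
    in acceleration units (mass normalized out)."""
    acc = np.zeros_like(pos)
    acc[:, 2] = thrust - G          # net vertical acceleration
    vel = vel + DT * acc            # semi-implicit Euler
    pos = pos + DT * vel
    return pos, vel
```

With `N = 4096` environments the call costs one array pass instead of 4096 Python iterations; on a GPU backend the same shape of code maps every environment to parallel threads.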
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
NeuralFluid: Neural Fluidic System Design and Control with Differentiable Simulation
We present NeuralFluid, a novel framework to explore neural control and design of complex fluidic systems with dynamic solid boundaries. Our system features a fast differentiable Navier-Stokes solver with solid-fluid interface handling, a low-dimensional differentiable parametric geometry representation, a control-shape co-design algorithm, and gym-like simulation environments to facilitate various fluidic control design applications. Additionally, we present a benchmark of design, control, and learning tasks on high-fidelity, high-resolution dynamic fluid environments that pose challenges for existing differentiable fluid simulators. These tasks include designing the control of artificial hearts, identifying robotic end-effector shapes, and controlling a fluid gate. By seamlessly incorporating our differentiable fluid simulator into a learning framework, we demonstrate successful design, control, and learning results that surpass gradient-free solutions in these benchmark tasks.
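The control-shape co-design loop can be sketched generically. The quadratic surrogate below stands in for the differentiable Navier-Stokes solve (the names and objective are illustrative assumptions); the point is that a single differentiable loss supplies gradients for both a geometry parameter and a control parameter, updated jointly.

```python
# Gradient-based control-shape co-design sketch (toy surrogate, hand-derived
# gradients; not the paper's solver).

def loss(shape, ctrl, target=2.0):
    # stand-in for "simulate the flow, compare outlet velocity to target"
    outlet = shape * ctrl
    return (outlet - target) ** 2

def grads(shape, ctrl, target=2.0):
    d = 2.0 * (shape * ctrl - target)
    return d * ctrl, d * shape          # d loss/d shape, d loss/d ctrl

def co_design(shape=0.5, ctrl=0.5, lr=0.05, iters=500):
    """Jointly descend on geometry and control from one differentiable loss."""
    for _ in range(iters):
        gs, gc = grads(shape, ctrl)
        shape -= lr * gs
        ctrl -= lr * gc
    return shape, ctrl
```

A gradient-free method would have to probe the expensive simulator once per perturbation direction; here one backward pass prices both design variables at once, which is the advantage the benchmark results quantify.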
Differentiable Simulation of Soft Robots with Frictional Contacts
Ménager, Etienne, Montaut, Louis, Le Lidec, Quentin, Carpentier, Justin
In recent years, soft robotics simulators have evolved to offer various functionalities, including the simulation of different material types (e.g., elastic, hyper-elastic) and actuation methods (e.g., pneumatic, cable-driven, servomotor). These simulators also provide tools for various tasks, such as calibration, design, and control. However, efficiently and accurately computing derivatives within these simulators remains a challenge, particularly in the presence of physical contact interactions. Incorporating these derivatives can, for instance, significantly improve the convergence speed of control methods like reinforcement learning and trajectory optimization, enable gradient-based techniques for design, or facilitate end-to-end machine-learning approaches for model reduction. This paper addresses these challenges by introducing a unified method for computing the derivatives of mechanical equations within the finite element method framework, including contact interactions modeled as a nonlinear complementarity problem. The proposed approach handles both collision and friction phases, accounts for their nonsmooth dynamics, and leverages the sparsity introduced by mesh-based models. Its effectiveness is demonstrated through several examples of controlling and calibrating soft systems.
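The core trick for differentiating through an equilibrium without unrolling the solver is the implicit function theorem: linearize the residual at the converged solution. The 1-D residual below is a toy stand-in for the FEM force balance with contact (the function and constants are assumptions for illustration), but the sensitivity formula dx*/dθ = -(∂R/∂x)⁻¹ ∂R/∂θ is the general pattern.

```python
# Implicit-differentiation sketch on a 1-D equilibrium R(x, theta) = 0.

def residual(x, theta):
    # hypothetical force balance: linear "stiffness" + cubic nonlinearity
    return theta * x + x ** 3 - 1.0

def solve(theta, x=1.0, iters=50):
    """Newton solve for the equilibrium x*(theta)."""
    for _ in range(iters):
        x -= residual(x, theta) / (theta + 3.0 * x * x)  # R / dR/dx
    return x

def sensitivity(theta):
    """dx*/dtheta via the implicit function theorem: no solver unrolling,
    just one linearization at the converged equilibrium."""
    x = solve(theta)
    dR_dx = theta + 3.0 * x * x
    dR_dtheta = x
    return -dR_dtheta / dR_dx
```

In the FEM setting `dR_dx` becomes the (sparse) tangent stiffness matrix and the division a sparse linear solve, which is where the mesh-induced sparsity mentioned in the abstract pays off.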
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Xing, Eliot, Luk, Vernon, Oh, Jean
Recent advances in GPU-based parallel simulation have enabled practitioners to collect large amounts of data and train complex control policies using deep reinforcement learning (RL) on commodity GPUs. However, such successes for RL in robotics have been limited to tasks sufficiently simulated by fast rigid-body dynamics. Simulation techniques for soft bodies are comparatively several orders of magnitude slower, thereby limiting the use of RL due to sample complexity requirements. To address this challenge, this paper presents both a novel RL algorithm and a simulation platform to enable scaling RL on tasks involving rigid bodies and deformables. We introduce Soft Analytic Policy Optimization (SAPO), a maximum entropy first-order model-based actor-critic RL algorithm, which uses first-order analytic gradients from differentiable simulation to train a stochastic actor to maximize expected return and entropy. Alongside our approach, we develop Rewarped, a parallel differentiable multiphysics simulation platform that supports simulating various materials beyond rigid bodies. We re-implement challenging manipulation and locomotion tasks in Rewarped, and show that SAPO outperforms baselines over a range of tasks that involve interaction between rigid bodies, articulations, and deformables.
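The ingredients of a maximum-entropy first-order actor update can be sketched with a reparameterized Gaussian policy; the one-step quadratic "simulation" below is an illustrative assumption, not SAPO itself. The analytic reward gradient flows through the sampled action to the policy mean, and a Gaussian entropy term supplies the bonus, as in maximum-entropy RL.

```python
# Reparameterized first-order actor-gradient sketch (toy objective).
import math, random

ALPHA = 0.1  # entropy temperature (illustrative value)

def objective_grad_mu(mu, sigma, n=20000, seed=0):
    """Monte-Carlo estimate of dJ/dmu for J = E[r(a)] + ALPHA*H(pi),
    where a = mu + sigma*eps (reparameterization) and the differentiable
    one-step 'simulation' returns r(a) = -a**2."""
    rng = random.Random(seed)
    g = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        a = mu + sigma * eps     # pathwise sample: gradient flows through a
        g += -2.0 * a            # analytic dr/da times da/dmu (= 1)
    return g / n                 # entropy term carries no mu dependence

def entropy(sigma):
    """Differential entropy of a 1-D Gaussian policy."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)
```

Because the gradient is pathwise rather than score-function based, its variance stays low even with few samples, which is what makes first-order gradients from differentiable simulation attractive for sample-limited soft-body tasks.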
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Jordan (0.04)