trajectory optimization
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning Weikang Wan
This paper introduces DiffTORI, which utilizes Diff erentiable T rajectory O ptimization as the policy representation to generate actions for deep R einforcement and I mitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (5 more...)
- Energy (0.46)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment > Games (0.47)
- Information Technology (0.46)
85ea6fd7a2ca3960d0cf5201933ac998-Supplemental.pdf
Theproof for Lemma A.1 can be found in Theorem 4 in [48]. Similar proof procedure also applies to prove the second inequality in (S.34). Proof by contradiction: suppose that the above second-order condition in (S.42)-(S.43) is false. Thus, in the following, we will ignore the computation process for obtainingξ(θ,γ) and simply view thatξ(θ,γ)is the twice continuously differentiable function of(θ,γ) near (θ,0) and ξθ = ξ(θ=θ,γ=0). Look at (S.62) and note that when = 0andγ = 0, (S.62) is identical to the first two equations in (S.59).
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Germany (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.35)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTORI addresses the "objective mismatch" issue of prior model-based RL algorithms, as the dynamics model in DiffTORI is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTORI for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feedforward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
Accelerating Motion Planning via Optimal Transport
Motion planning is still an open problem for many disciplines, e.g., robotics, autonomous driving, due to their need for high computational resources that hinder real-time, efficient decision-making. A class of methods striving to provide smooth solutions is gradient-based trajectory optimization. However, those methods usually suffer from bad local minima, while for many settings, they may be inapplicable due to the absence of easy-to-access gradients of the optimization objectives. In response to these issues, we introduce Motion Planning via Optimal Transport (MPOT)---a \textit{gradient-free} method that optimizes a batch of smooth trajectories over highly nonlinear costs, even for high-dimensional tasks, while imposing smoothness through a Gaussian Process dynamics prior via the planning-as-inference perspective. To facilitate batch trajectory optimization, we introduce an original zero-order and highly-parallelizable update rule----the Sinkhorn Step, which uses the regular polytope family for its search directions. Each regular polytope, centered on trajectory waypoints, serves as a local cost-probing neighborhood, acting as a \textit{trust region} where the Sinkhorn Step ``transports'' local waypoints toward low-cost regions. We theoretically show that Sinkhorn Step guides the optimizing parameters toward local minima regions of non-convex objective functions. We then show the efficiency of MPOT in a range of problems from low-dimensional point-mass navigation to high-dimensional whole-body robot motion planning, evincing its superiority compared to popular motion planners, paving the way for new applications of optimal transport in motion planning.