Optimization
Zero-Shot Retargeting of Learned Quadruped Locomotion Policies Using Hybrid Kinodynamic Model Predictive Control
Li, He, Zhang, Tingnan, Yu, Wenhao, Wensing, Patrick M.
Reinforcement Learning (RL) has witnessed great strides for quadruped locomotion, with continued progress in the reliable sim-to-real transfer of policies. However, it remains a challenge to reuse a policy on another robot, which could save time for retraining. In this work, we present a framework for zero-shot policy retargeting wherein diverse motor skills can be transferred between robots of different shapes and sizes. The new framework centers on a planning-and-control pipeline that systematically integrates RL and Model Predictive Control (MPC). The planning stage employs RL to generate a dynamically plausible trajectory as well as the contact schedule, avoiding the combinatorial complexity of contact sequence optimization. This information is then used to seed the MPC to stabilize and robustify the policy roll-out via a new Hybrid Kinodynamic (HKD) model that implicitly optimizes the foothold locations. Hardware results show an ability to transfer policies from both the A1 and Laikago robots to the MIT Mini Cheetah robot without requiring any policy re-tuning.
Versatile Real-Time Motion Synthesis via Kino-Dynamic MPC with Hybrid-Systems DDP
Li, He, Zhang, Tingnan, Yu, Wenhao, Wensing, Patrick M.
Specialized motions such as jumping are often achieved on quadruped robots by solving a trajectory optimization problem once and executing the trajectory using a tracking controller. This approach is in parallel with Model Predictive Control (MPC) strategies that commonly control regular gaits via online re-planning. In this work, we present a nonlinear MPC (NMPC) technique that unlocks on-the-fly re-planning of specialized motion skills and regular locomotion within a unified framework. The NMPC reasons about a hybrid kinodynamic model, and is solved using a variant of a constrained Differential Dynamic Programming (DDP) solver. The proposed NMPC enables the robot to perform a variety of agile skills like jumping, bounding, and trotting, and the rapid transition between these skills. We evaluated the proposed algorithm with three challenging motion sequences that combine multiple agile skills, on two quadruped platforms, Unitree A1, and MIT Mini Cheetah, showing its effectiveness and generality.
First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces
Jordan, Michael I., Lin, Tianyi, Vlatakis-Gkaragkounis, Emmanouil-Vasileios
From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative-we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.
Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models
Moukafih, Youness, Ghogho, Mounir, Smaili, Kamel
Recently, Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks. In SCL, a neural network is trained to optimize two objectives: pull an anchor and positive samples together in the embedding space, and push the anchor apart from the negatives. However, these two different objectives may conflict, requiring trade-offs between them during optimization. In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of RoBERTa language model. Two methods are utilized to solve the optimization problem: (i) the linear scalarization (LS) method, which minimizes a weighted linear combination of pertask losses; and (ii) the Exact Pareto Optimal (EPO) method which finds the intersection of the Pareto front with a given preference vector. We evaluate our approach on several GLUE benchmark tasks, without using data augmentations, memory banks, or generating adversarial examples. The empirical results show that the proposed learning strategy significantly outperforms a strong competitive contrastive learning baseline
Constrained Dynamic Movement Primitives for Safe Learning of Motor Skills
Shaw, Seiji, Jha, Devesh K., Raghunathan, Arvind, Corcodel, Radu, Romeres, Diego, Konidaris, George, Nikovski, Daniel
Dynamic movement primitives are widely used for learning skills which can be demonstrated to a robot by a skilled human or controller. While their generalization capabilities and simple formulation make them very appealing to use, they possess no strong guarantees to satisfy operational safety constraints for a task. In this paper, we present constrained dynamic movement primitives (CDMP) which can allow for constraint satisfaction in the robot workspace. We present a formulation of a non-linear optimization to perturb the DMP forcing weights regressed by locally-weighted regression to admit a Zeroing Barrier Function (ZBF), which certifies workspace constraint satisfaction. We demonstrate the proposed CDMP under different constraints on the end-effector movement such as obstacle avoidance and workspace constraints on a physical robot. A video showing the implementation of the proposed algorithm using different manipulators in different environments could be found here https://youtu.be/hJegJJkJfys.
Lazy Probabilistic Roadmaps Revisited
Ramirez, Miquel, Selvaratnam, Daniel, Manzie, Chris
This paper describes a revision of the classic Lazy Probabilistic Roadmaps algorithm (Lazy PRM), that results from pairing PRM and a novel Branch-and-Cut (BC) algorithm. Cuts are dynamically generated constraints that are imposed on minimum cost paths over the geometric graphs selected by PRM. Cuts eliminate paths that cannot be mapped into smooth plans that satisfy suitably defined kinematic constraints. We generate candidate smooth plans by fitting splines to vertices in minimum-cost path. Plans are validated with a recently proposed algorithm that maps them into finite traces, without need to choose a fixed discretization step. Trace elements exactly describe when plans cross constraint boundaries modulo arithmetic precision. We evaluate several planners using our methods over the recently proposed BARN benchmark, and we report evidence of the scalability of our approach.
TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning
Sozykin, Konstantin, Chertkov, Andrei, Schutski, Roman, Phan, Anh-Huy, Cichocki, Andrzej, Oseledets, Ivan
We present a novel procedure for optimization based on the combination of efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method for various tasks, ranging from minimization of multidimensional functions to reinforcement learning. Our algorithm compares favorably to popular evolutionary-based methods and outperforms them by the number of function evaluations or execution time, often by a significant margin.
Convergence of the mini-batch SIHT algorithm
The Iterative Hard Thresholding (IHT) algorithm has been considered extensively as an effective deterministic algorithm for solving sparse optimizations. The IHT algorithm benefits from the information of the batch (full) gradient at each point and this information is a crucial key for the convergence analysis of the generated sequence. However, this strength becomes a weakness when it comes to machine learning and high dimensional statistical applications because calculating the batch gradient at each iteration is computationally expensive or impractical. Fortunately, in these applications the objective function has a summation structure that can be taken advantage of to approximate the batch gradient by the stochastic mini-batch gradient. In this paper, we study the mini-batch Stochastic IHT (SIHT) algorithm for solving the sparse optimizations. As opposed to previous works where increasing and variable mini-batch size is necessary for derivation, we fix the mini-batch size according to a lower bound that we derive and show our work. To prove stochastic convergence of the objective value function we first establish a critical sparse stochastic gradient descent property. Using this stochastic gradient descent property we show that the sequence generated by the stochastic mini-batch SIHT is a supermartingale sequence and converges with probability one. Unlike previous work we do not assume the function to be a restricted strongly convex. To the best of our knowledge, in the regime of sparse optimization, this is the first time in the literature that it is shown that the sequence of the stochastic function values converges with probability one by fixing the mini-batch size for all steps.
Optimization-Based Mechanical Perception for Peduncle Localization During Robotic Fruit Harvest
Cravetz, Miranda, Grimm, Cindy, Davidson, Joseph R.
Rising global food demand and harsh working conditions make fruit harvest an important domain to automate. Peduncle localization is an important step for any automated fruit harvesting system, since fruit separation techniques are highly sensitive to peduncle location. Most work on peduncle localization has focused on computer vision, but peduncles can be difficult to visually access due to the cluttered nature of agricultural environments. Our work proposes an alternative method which relies on mechanical -- rather than visual -- perception to localize the peduncle. To estimate the location of this important plant feature, we fit wrench measurements from a wrist force/torque sensor to a physical model of the fruit-plant system, treating the fruit's attachment point as a parameter to be tuned. This method is performed inline as part of the fruit picking procedure. Using our orchard proxy for evaluation, we demonstrate that the technique is able to localize the peduncle within a median distance of 3.8 cm and median orientation error of 16.8 degrees.
Dueling Convex Optimization with General Preferences
Saha, Aadirupa, Koren, Tomer, Mansour, Yishay
We address the problem of \emph{convex optimization with dueling feedback}, where the goal is to minimize a convex function given a weaker form of \emph{dueling} feedback. Each query consists of two points and the dueling feedback returns a (noisy) single-bit binary comparison of the function values of the two queried points. The translation of the function values to the single comparison bit is through a \emph{transfer function}. This problem has been addressed previously for some restricted classes of transfer functions, but here we consider a very general transfer function class which includes all functions that can be approximated by a finite polynomial with a minimal degree $p$. Our main contribution is an efficient algorithm with convergence rate of $\smash{\widetilde O}(\epsilon^{-4p})$ for a smooth convex objective function, and an optimal rate of $\smash{\widetilde O}(\epsilon^{-2p})$ when the objective is smooth and strongly convex.