nominal model
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that maximizes the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP), where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model in an uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDPs (CMDPs), are not applicable here because strong duality does not hold. Further, one cannot apply the standard robust value-iteration-based approach to the composite value function either, as the worst-case models may differ between the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function to satisfy the constraints; when all the constraints are satisfied, it simply maximizes the robust reward value function. We prove that such an algorithm finds a feasible policy with at most $\epsilon$ sub-optimality after $O(\epsilon^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search; thus, we reduce the computation time and achieve better performance, especially for continuous state spaces.
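The switching idea in this abstract (reduce the worst-case constraint value while it is violated, otherwise improve the worst-case reward) can be illustrated on a toy problem. The sketch below is not the paper's algorithm: the one-state, two-action problem, the finite uncertainty set, the numbers in `REWARDS`, `COSTS`, and `BUDGET`, and the function names are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy problem: one state, two actions (0 = safe, 1 = risky).
# Each row is one model in a small, finite uncertainty set (an assumption).
REWARDS = np.array([[0.2, 1.0], [0.2, 0.8]])  # reward[model, action]
COSTS = np.array([[0.0, 0.6], [0.0, 1.0]])    # cost[model, action]
BUDGET = 0.5                                   # constraint: robust cost <= BUDGET


def robust_values(p_risky):
    """Worst-case reward (min over models) and worst-case cost (max over models)."""
    policy = np.array([1.0 - p_risky, p_risky])
    rewards = REWARDS @ policy
    costs = COSTS @ policy
    return rewards.min(), costs.max(), rewards.argmin(), costs.argmax()


def switching_step(p_risky, step_size=0.05):
    """One projected subgradient step using the switching rule."""
    _, cost, worst_r, worst_c = robust_values(p_risky)
    if cost > BUDGET:
        # Constraint violated: descend on the worst-case cost.
        grad = COSTS[worst_c, 1] - COSTS[worst_c, 0]
        p_risky -= step_size * grad
    else:
        # Constraint satisfied: ascend on the worst-case reward.
        grad = REWARDS[worst_r, 1] - REWARDS[worst_r, 0]
        p_risky += step_size * grad
    # Project back onto the valid probability range.
    return float(np.clip(p_risky, 0.0, 1.0))


def run(num_iters=200, p_init=1.0):
    """Iterate the switching rule and return the best feasible (reward, p) seen."""
    p = p_init
    feasible = []
    for _ in range(num_iters):
        p = switching_step(p)
        reward, cost, _, _ = robust_values(p)
        if cost <= BUDGET:
            feasible.append((float(reward), p))
    return max(feasible) if feasible else None
```

In this toy, the worst-case reward and the worst-case cost are attained by different models for part of the policy range, which is the situation the abstract points to when it says a single composite value function cannot be handled by standard robust value iteration.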
A Data and Code Availability
The implementations of the experiments on the ABC and FTDC datasets are similar. For the stability analysis, we are interested in the norm of term 1. In Section E.1, we briefly discuss the motivation behind studying age prediction and PCA-based statistical analysis in this context. In Section E.2, we provide additional details on cortical thickness data acquisition. In Section E.3, we report the results for stability analysis of VNNs and PCA-regression models for FTDC100 [...] In Section E.4, we study the stability of VNNs on two simulated [...] In Section E.5, we include additional figures [...] A promising application of brain age prediction is early detection of neurodegenerative diseases (such as Alzheimer's, Huntington's disease) which may manifest themselves as error in age prediction in pathological contexts by machine learning models trained [...] E.4 Stability of VNNs on Synthetic Data: We consider two settings for synthetic data.
Disturbance Compensation for Safe Kinematic Control of Robotic Systems with Closed Architecture
Zhang, Fan, Chen, Jinfeng, Ahanda, Joseph J. B. Mvogo, Richter, Hanz, Lv, Ge, Hu, Bin, Lin, Qin
Fan Zhang 1,2, Jinfeng Chen 1, Joseph J. B. Mvogo Ahanda 3, Hanz Richter 4, Ge Lv 5, Bin Hu 1,2, Qin Lin 1,2
Abstract: In commercial robotic systems, it is common to encounter a closed inner-loop (low-level) torque controller that is not user-modifiable. However, the outer-loop controller, which sends kinematic commands such as position or velocity for the inner-loop controller to track, is typically exposed to users. In this work, we focus on the development of an easily integrated add-on at the outer-loop layer by combining disturbance rejection control and robust control barrier function for high-performance tracking and safe control of the whole dynamic system of an industrial manipulator. This is particularly beneficial when 1) the inner-loop controller is imperfect, unmodifiable, and uncertain; and 2) the dynamic model exhibits significant uncertainty. Stability analysis, formal safety guarantee proof, simulations, and hardware experiments with a PUMA robotic manipulator are presented. Our solution demonstrates superior performance in terms of simplicity of implementation, robustness, tracking precision, and safety compared to the state of the art.
I. INTRODUCTION
Robotic systems often employ hierarchical software design, stacking perception, decision-making, planning, and low-level control. Such modularity is particularly beneficial for troubleshooting and improving the reliability of robotic systems. For example, in the control block, a combination of a kinematic controller (outer-loop controller) and a dynamic controller (inner-loop controller) is commonly seen in various robots. However, because tuning the inner-loop controller requires expert knowledge, this component is typically not exposed to users due to product safety considerations, a practice referred to as closed architecture in the literature [1]-[4]. In other words, users are only allowed to design the kinematic controller, sending position or velocity for the inner-loop controller to track. Additionally, mechanical parts
1 The authors are with the Department of Engineering Technology, University of Houston, USA. Corresponding author: Qin Lin, qlin21@central.uh.edu
2 Fan Zhang is also with the Department of Electrical and Computer Engineering, University of Houston, USA.
3 Joseph Jean Baptiste Mvogo Ahanda is with the Department of Biomedical Engineering, The University of Ebolowa, Cameroon.
4 Hanz Richter is with the Department of Mechanical Engineering, Cleveland State University, USA.
5 Ge Lv is with the Department of Mechanical Engineering, Clemson University, USA.
This material is based upon work supported by the National Science Foundation under Grant Nos.
Adaptive Meta-Learning for Identification of Rover-Terrain Dynamics
Banerjee, S., Harrison, J., Furlong, P. M., Pavone, M.
Rovers require knowledge of terrain to plan trajectories that maximize safety and efficiency. Terrain type classification relies on input from human operators or machine learning-based image classification algorithms. However, high-level terrain classification is typically not sufficient to prevent incidents such as rovers becoming unexpectedly stuck in a sand trap; in these situations, online rover-terrain interaction data can be leveraged to accurately predict future dynamics and prevent further damage to the rover. This paper presents a meta-learning-based approach to adapt probabilistic predictions of rover dynamics by augmenting a nominal model affine in parameters with a Bayesian regression algorithm (P-ALPaCA). A regularization scheme is introduced to encourage orthogonality of nominal and learned features, leading to interpretable probabilistic estimates of terrain parameters in varying terrain conditions.
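To make "a nominal model affine in parameters with Bayesian regression" concrete, the sketch below runs plain online Bayesian linear regression on fixed features. It is not P-ALPaCA: there are no meta-learned features and no orthogonality regularizer, and the class name, feature choice, prior, and noise level are assumptions chosen for illustration.

```python
import numpy as np


class BayesianLinearRegression:
    """Online Bayesian linear regression for y = phi^T theta + Gaussian noise."""

    def __init__(self, dim, prior_var=10.0, noise_var=0.01):
        # Zero-mean Gaussian prior on theta with covariance prior_var * I.
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # precision @ posterior mean
        self.noise_var = noise_var

    def update(self, phi, y):
        """Fold in one observation (features phi, scalar target y)."""
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * y / self.noise_var

    @property
    def mean(self):
        """Posterior mean of the parameters theta."""
        return np.linalg.solve(self.precision, self.b)

    def predict(self, phi):
        """Predictive mean and variance for a new feature vector."""
        cov = np.linalg.inv(self.precision)
        return phi @ self.mean, phi @ cov @ phi + self.noise_var


def estimate_parameters(num_samples=200, seed=0):
    """Simulate noisy data from hypothetical parameters and recover them."""
    rng = np.random.default_rng(seed)
    true_theta = np.array([0.7, -0.3])  # hypothetical "terrain" parameters
    model = BayesianLinearRegression(dim=2)
    for _ in range(num_samples):
        phi = rng.normal(size=2)                      # stand-in feature vector
        y = phi @ true_theta + rng.normal(scale=0.1)  # noise std 0.1 (var 0.01)
        model.update(phi, y)
    return model, true_theta
```

Because the posterior is Gaussian, each prediction comes with a variance, which is what makes the parameter estimates probabilistic rather than point values.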
MUKCa: Accurate and Affordable Cobot Calibration Without External Measurement Devices
Franzese, Giovanni, Spahn, Max, Kober, Jens, Della Santina, Cosimo
To increase the reliability of collaborative robots in performing daily tasks, we require them to be accurate and not only repeatable. However, having a calibrated kinematics model is regrettably a luxury, as available calibration tools are usually more expensive than the robots themselves. With this work, we aim to contribute to the democratization of cobot calibration by providing an inexpensive yet highly effective alternative to existing tools. The proposed minimalist calibration routine relies on a 3D-printable tool as the only physical aid to the calibration process. This two-socket spherical-joint tool kinematically constrains the robot at the end effector while collecting the training set. An optimization routine updates the nominal model to ensure a consistent prediction for each socket and the undistorted mean distance between them. We validated the algorithm on three robotic platforms: Franka, Kuka, and Kinova Cobots. The calibrated models reduce the mean absolute error from the order of 10 mm to 0.2 mm for both Franka and Kuka robots. We provide two additional experimental campaigns with the Franka Robot to render the improvements more tangible. First, we implement Cartesian control with and without the calibrated model and use it to perform a standard peg-in-the-hole task with a tolerance of 0.4 mm between the peg and the hole. Second, we perform a repeated drawing task combining Cartesian control with learning from demonstration. Both tasks consistently failed when the model was not calibrated, while they consistently succeeded after calibration.
GP-enhanced Autonomous Drifting Framework using ADMM-based iLQR
Xie, Yangyang, Hu, Cheng, Baumann, Nicolas, Ghignone, Edoardo, Magno, Michele, Xie, Lei
Autonomous drifting is a complex challenge due to the highly nonlinear dynamics and the need for precise real-time control, especially in uncertain environments. To address these challenges, this paper presents a hierarchical control framework for autonomous vehicles drifting along general paths, primarily focusing on addressing model inaccuracies and mitigating computational challenges in real-time control. The framework integrates Gaussian Process (GP) regression with an Alternating Direction Method of Multipliers (ADMM)-based iterative Linear Quadratic Regulator (iLQR). GP regression effectively compensates for model residuals, improving accuracy in dynamic conditions. ADMM-based iLQR not only combines the rapid trajectory optimization of iLQR but also utilizes ADMM's strength in decomposing the problem into simpler sub-problems. Simulation results demonstrate the effectiveness of the proposed framework, with significant improvements in both drift trajectory tracking and computational efficiency. Our approach resulted in a 38% reduction in RMSE lateral error and achieved an average computation time that is 75% lower than that of the Interior Point OPTimizer (IPOPT).
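The residual-compensation idea (fit a GP to the gap between a nominal model and observed behavior, then add the GP mean back to the nominal prediction) can be sketched in a few lines of NumPy. This is a generic illustration and not the paper's vehicle model: the scalar nominal model, the sinusoidal "true" system, the RBF kernel hyperparameters, and all function names are assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def fit_gp(x_train, y_train, noise_var=1e-4):
    """Return the weights alpha = (K + noise_var * I)^{-1} y via Cholesky."""
    gram = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    chol = np.linalg.cholesky(gram)
    return np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))


def gp_mean(x_query, x_train, alpha):
    """Posterior mean of the zero-mean GP at the query points."""
    return rbf_kernel(x_query, x_train) @ alpha


def nominal_model(x):
    """Hypothetical nominal model: a simple linear map."""
    return x


def true_system(x):
    """Hypothetical true system: nominal model plus an unmodeled term."""
    return x + 0.5 * np.sin(x)


def compare_rmse():
    """RMSE of the nominal model alone versus nominal plus GP residual."""
    x_train = np.linspace(-3.0, 3.0, 15)
    residuals = true_system(x_train) - nominal_model(x_train)
    alpha = fit_gp(x_train, residuals)

    x_test = np.linspace(-2.5, 2.5, 50)
    corrected = nominal_model(x_test) + gp_mean(x_test, x_train, alpha)
    target = true_system(x_test)
    nominal_rmse = np.sqrt(np.mean((nominal_model(x_test) - target) ** 2))
    corrected_rmse = np.sqrt(np.mean((corrected - target) ** 2))
    return nominal_rmse, corrected_rmse
```

Training the GP on residuals rather than on the full dynamics keeps the nominal model as the prior mean, so predictions fall back to the nominal model away from the training data.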
Offline Reinforcement Learning via Inverse Optimization
Dimanidis, Ioannis, Ok, Tolga, Esfahani, Peyman Mohajerin
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called "sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and achieves competitive performance compared with the state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark while utilizing three orders of magnitude fewer parameters, thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
We study distributionally robust offline reinforcement learning (RL), which seeks to find an optimal robust policy purely from an offline dataset that can perform well in perturbed environments. We propose a generic algorithm framework Doubly Pessimistic Model-based Policy Optimization ($\texttt{P}^2\texttt{MPO}$) for robust offline RL, which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. Here the double pessimism principle is crucial to overcome the distribution shift incurred by i) the mismatch between behavior policy and the family of target policies; and ii) the perturbation of the nominal model. Under certain accuracy assumptions on the model estimation subroutine, we show that $\texttt{P}^2\texttt{MPO}$ is provably sample-efficient with robust partial coverage data, which means that the offline dataset has good coverage of the distributions induced by the optimal robust policy and perturbed models around the nominal model. By tailoring specific model estimation subroutines for concrete examples including tabular Robust Markov Decision Process (RMDP), factored RMDP, and RMDP with kernel and neural function approximations, we show that $\texttt{P}^2\texttt{MPO}$ enjoys a $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the number of trajectories in the offline dataset.