Lutter, Michael, Belousov, Boris, Listmann, Kim, Clever, Debora, Peters, Jan

Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper, we propose deep optimal feedback control to learn an optimal feedback policy rather than a single trajectory. By exploiting the inherent structure of the robot dynamics and a strictly convex action cost, we derive principled cost functions such that the optimal policy naturally obeys the action limits and, given the optimal value function, is globally optimal and stable on the training domain. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality, while simultaneously changing the discounting from short-sighted to far-sighted to enable the learning. Our approach learns an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state space without the need for replanning. The approach is evaluated on non-linear systems and achieves optimal feedback control where standard optimal control methods require frequent replanning.
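The role of the strictly convex action cost can be made concrete with a short worked equation. The notation below is an assumption for illustration and does not come from the abstract: state x, action u, discount rate \rho, control-affine dynamics \dot{x} = a(x) + B(x)u, state cost q(x), strictly convex action cost g(u) with convex conjugate \tilde{g}, and a value network V_\theta. Under these assumptions, a sketch of the equality whose error is minimized reads:

```latex
% Assumed notation, not taken from the abstract:
%   \dot{x} = a(x) + B(x) u,  cost rate q(x) + g(u),  discount rate \rho.
\begin{align}
  \rho V^*(x) &= \min_u \left[ q(x) + g(u)
                 + \nabla_x V^*(x)^\top \big( a(x) + B(x) u \big) \right] \\
  % First-order condition: \nabla g(u^*) = -B(x)^\top \nabla_x V^*(x)
  u^*(x) &= \nabla \tilde{g}\!\left( -B(x)^\top \nabla_x V^*(x) \right),
  \qquad \tilde{g}(w) = \sup_u \left[ w^\top u - g(u) \right] \\
  % Squared HJB residual for a value network V_\theta with induced policy u_\theta
  \ell(\theta) &= \mathbb{E}_x \left[ \Big( q(x) + g(u_\theta(x))
                 + \nabla_x V_\theta(x)^\top \big( a(x) + B(x) u_\theta(x) \big)
                 - \rho V_\theta(x) \Big)^2 \right]
\end{align}
```

Because g is strictly convex, the minimizer over u is unique and available in closed form. If g is chosen so that \nabla \tilde{g} is bounded, the resulting policy stays within the action limits by construction.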

Dean, Sarah, Mania, Horia, Matni, Nikolai, Recht, Benjamin, Tu, Stephen

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
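The estimation step that adaptive LQR methods build on can be illustrated with ordinary least squares. The sketch below is not the algorithm of the paper. It assumes a scalar system x_{t+1} = a x_t + b u_t + w_t driven by random exploratory inputs, and the function names `simulate` and `least_squares_estimate` are hypothetical.

```python
import random


def simulate(a, b, steps, noise_std, input_std, seed=0):
    """Roll out x_{t+1} = a*x_t + b*u_t + w_t with Gaussian inputs and noise.

    Returns a list of (x_t, u_t, x_{t+1}) transitions.
    """
    rng = random.Random(seed)
    x = 0.0
    data = []
    for _ in range(steps):
        u = rng.gauss(0.0, input_std)          # exploratory input
        x_next = a * x + b * u + rng.gauss(0.0, noise_std)
        data.append((x, u, x_next))
        x = x_next
    return data


def least_squares_estimate(data):
    """Estimate (a, b) by solving the 2x2 normal equations in closed form."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    if det == 0.0:
        raise ValueError("inputs are not sufficiently exciting")
    a_hat = (sxy * suu - suy * sxu) / det
    b_hat = (suy * sxx - sxy * sxu) / det
    return a_hat, b_hat
```

A robust controller would then be synthesized from the estimate together with a bound on its error. That step is omitted here.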

Freda, Luigi, Gianni, Mario, Pirri, Fiora

This work presents a methodology for designing trajectory tracking feedback control laws that embed nonparametric statistical models, such as Gaussian Processes (GPs). The aim is to minimize the effect of unmodeled dynamics such as undesired slippage. The proposed approach avoids complex terramechanics analysis by estimating the robot dynamics directly from data on a wide class of trajectories. Experiments in both real and simulated environments indicate that the proposed methodology is promising. In the last decades, increasing interest has been devoted to the design of high-performance path tracking. In the literature, three main approaches to this problem have emerged: (i) model-based and adaptive control [1]-[5]; (ii) Gaussian Processes or stochastic nonlinear models for reinforcement learning of control policies [6], [7]; and (iii) nominal models and data-driven estimation of the residual [8], [9].
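The idea behind approach (iii), a nominal model plus a data-driven residual, can be sketched with a one-dimensional GP regression. This is a minimal illustration under stated assumptions, not the controller of the paper: the input is a commanded speed, the target is the measured slippage residual, the kernel is a squared exponential, and all function names are hypothetical.

```python
import math


def rbf(x1, x2, length_scale):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def gp_fit(xs, ys, length_scale=1.0, noise_var=1e-4):
    """Return alpha = (K + noise_var * I)^{-1} y for the training data."""
    gram = [[rbf(xi, xj, length_scale) + (noise_var if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    return solve(gram, ys)


def gp_predict(x_query, xs, alpha, length_scale=1.0):
    """Posterior mean of the residual at x_query."""
    return sum(rbf(x_query, xi, length_scale) * a for xi, a in zip(xs, alpha))
```

In a tracking loop, the predicted residual would be added to the nominal model's prediction before the feedback law is evaluated, so that the controller compensates for the learned slippage.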

Faradonbeh, Mohamad Kazem Shirani, Tewari, Ambuj, Michailidis, George

We consider the classical problem of control of linear systems with quadratic cost. When the true system dynamics are unknown, an adaptive policy is required for learning the model parameters and planning a control policy simultaneously. Addressing this trade-off between accurate estimation and good control represents the main challenge in the area of adaptive control. Another important issue is to prevent the system from becoming destabilized due to lack of knowledge of its dynamics. Asymptotically optimal approaches have been extensively studied in the literature, but there are very few non-asymptotic results, and those that exist do not provide a comprehensive treatment of the problem. In this work, we establish finite time high probability regret bounds that are optimal up to logarithmic factors. We also provide high probability guarantees for a stabilization algorithm based on random linear feedbacks. The results are obtained under very mild assumptions, requiring: (i) stabilizability of the matrices encoding the system's dynamics, and (ii) a limited degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools.

Adaptive optimal control using value iteration initiated from a stabilizing control policy is theoretically analyzed in terms of the stability of the system during the learning stage, without ignoring the effects of approximation errors. This analysis covers the system operated under any single (constant) resulting control policy as well as under an evolving (time-varying) control policy. A feature of the presented results is that they provide estimates of the region of attraction, so that if the initial condition is within the region, the whole trajectory remains inside it and hence the function approximation results remain valid.
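A minimal sketch of value iteration started from a stabilizing policy is given below for the simplest possible case. The setting is an assumption and not that of the abstract: a scalar linear system x_{t+1} = a x_t + b u_t with stage cost q x^2 + r u^2 and a quadratic value function p x^2, so no approximation error arises and the region of attraction is the whole state space. The function names are hypothetical.

```python
def policy_cost(a, b, q, r, k):
    """Cost coefficient p of the fixed policy u = -k*x (value is p*x^2)."""
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("initial gain is not stabilizing")
    return (q + r * k * k) / (1.0 - closed_loop ** 2)


def greedy_gain(a, b, r, p):
    """Gain minimizing r*u^2 + p*(a*x + b*u)^2 over u = -k*x."""
    return a * b * p / (r + b * b * p)


def value_iteration(a, b, q, r, k0, iterations=50):
    """Run value iteration from the cost of the stabilizing gain k0.

    Returns a list of (p, k, |a - b*k|) per iteration, so the stability of
    every intermediate policy can be inspected.
    """
    p = policy_cost(a, b, q, r, k0)
    history = []
    for _ in range(iterations):
        k = greedy_gain(a, b, r, p)
        history.append((p, k, abs(a - b * k)))
        # Bellman backup evaluated at the greedy gain
        p = q + r * k * k + p * (a - b * k) ** 2
    return history
```

In this scalar case every intermediate policy is stabilizing, which can be checked from the third entry of each tuple. With function approximation this no longer holds automatically, which is the situation the analysis above addresses.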

Amani, Elie, Djouani, Karim, Kurien, Anish, De Boer, Jean-Rémi, Vigneau, Willy, Ries, Lionel

Multipath is among the major sources of errors in precise positioning using GPS and continues to be extensively studied. Two Fast Fourier Transform (FFT)-based detectors are presented in this paper as GPS multipath detection techniques. The detectors are formulated as binary hypothesis tests under the assumption that the multipath exists for a sufficient time frame that allows its detection based on the quadrature arm of the coherent Early-minus-Late discriminator (Q EmL) for a scalar tracking loop (STL) or on the quadrature (Q EmL) and/or in-phase arm (I EmL) for a vector tracking loop (VTL), using an observation window of N samples. Performance analysis of the suggested detectors is done on multipath signal data acquired from the multipath environment simulator developed by the German Aerospace Centre (DLR) as well as on multipath data from real GPS signals. Application of the detection tests to correlator outputs of scalar and vector tracking loops shows that they may be used to exclude multipath contaminated satellites from the navigation solution. These detection techniques can be extended to other Global Navigation Satellite Systems (GNSS) such as GLONASS, Galileo and Beidou.
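The structure of a spectral detector posed as a binary hypothesis test can be sketched as follows. This is an illustrative stand-in and not the detectors of the paper: it assumes the window holds N samples of a discriminator output, that noise under the null hypothesis is white Gaussian with known standard deviation, and that multipath shows up as a sinusoidal component. The test statistic, the threshold rule, and the function names are all assumptions. A naive DFT is used to stay within the standard library; an FFT would produce the same coefficients.

```python
import cmath
import math


def dft_amplitudes(window):
    """Amplitude estimates 2*|X_k|/N for bins k = 1 .. N/2 - 1 (no DC, no Nyquist)."""
    n = len(window)
    amplitudes = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        amplitudes.append(2.0 * abs(coeff) / n)
    return amplitudes


def threshold(noise_std, n, p_fa):
    """Threshold giving roughly p_fa false alarms over all tested bins.

    Under white Gaussian noise, P(amplitude > g) = exp(-g^2 * n / (4 * std^2))
    per bin; a union bound over the bins is inverted for g.
    """
    bins = n // 2 - 1
    return 2.0 * noise_std * math.sqrt(math.log(bins / p_fa) / n)


def detect_multipath(window, noise_std, p_fa=1e-6):
    """Return (decision, statistic): decide H1 if the peak amplitude exceeds the threshold."""
    statistic = max(dft_amplitudes(window))
    return statistic > threshold(noise_std, len(window), p_fa), statistic
```

A satellite whose window triggers the detector could then be excluded from the navigation solution, as the abstract suggests.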

A prognostic system makes it possible to anticipate loss of functionality before it occurs, with sufficient lead time to take actions that mitigate the impact of this loss. We focus on the forms of mitigation within the flight vehicle that influence the operational dynamics but do not directly amend the mission plan. Thus, we focus on the reconfiguration of the feedback control strategy for the flight system. The high degree of complexity in the design and dynamics of modern aircraft is typically handled using a hierarchical control scheme with several levels of control at increasing levels of responsibility: the component level, the subsystem level, and the system level. Our reconfiguration strategy mitigates problems detected at the component level, both at the level at which the fault is detected and at higher levels.

In this paper, a hybrid (base) system is modelled as a quintuple consisting of a state space (the direct product of a set of discrete states and an n-dimensional manifold), sets of admissible continuous and discrete controls, a family of controlled autonomous vector fields assigned to each discrete state, and a (partially defined) map of discrete transitions. Next, generalizing the theory presented in (Caines & Wei 1998), the notion of a finite analytic partition π of the state space of a hybrid system is defined. Then the notion of dynamical consistency is generalized to that of hybrid dynamical consistency. Based on these notions, the partition machine H^π of a hybrid system H is defined in such a way that, in the class of in-block controllable partitions, the controllability of the high level system (described by the partition machine, which is a discrete finite state machine) is equivalent (under some technical conditions) to the controllability of the low level system (described by differential equations). Within the hybrid partition machine framework, a discrete controller supervises its continuous subsystems via hierarchical feedback relations; furthermore, each continuous subsystem is itself (internally) subject to feedback control.
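The high-level side of this equivalence can be illustrated with a small sketch. It assumes, for illustration only, that the partition machine is given as a directed graph whose nodes are partition blocks and whose edges are the transitions allowed between blocks, and that controllability of this finite state machine means every block can be reached from every other block. The representation and function names are hypothetical and do not come from the paper.

```python
from collections import deque


def reachable(transitions, start):
    """Set of blocks reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for nxt in transitions.get(block, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_controllable(transitions):
    """True if every block can be driven to every other block."""
    blocks = set(transitions)
    for targets in transitions.values():
        blocks.update(targets)
    return all(reachable(transitions, block) == blocks for block in blocks)
```

Checking this property on the finite graph is far cheaper than analyzing the underlying differential equations, which is the practical appeal of the hierarchical construction.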