Dean, Sarah, Mania, Horia, Matni, Nikolai, Recht, Benjamin, Tu, Stephen

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
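The LQR setting this abstract refers to can be sketched concretely. The snippet below (with hypothetical system matrices and costs, not taken from the paper, and using plain Riccati fixed-point iteration rather than the paper's robust synthesis) computes the optimal state-feedback gain for a known system:

```python
import numpy as np

# Hypothetical 2-state, 1-input system (illustrative values only).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)   # state cost
R = np.eye(1)   # input cost

# Solve the discrete algebraic Riccati equation by fixed-point iteration.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain
    P = Q + A.T @ P @ (A - B @ K)

# Optimal control law: u_t = -K x_t; the closed loop A - B K is stable.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

In the adaptive problem the paper studies, A and B are unknown, so this synthesis step must be combined with estimation and an exploration schedule.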

Freda, Luigi, Gianni, Mario, Pirri, Fiora

Abstract-- This work presents a methodology for designing trajectory-tracking feedback control laws that embed nonparametric statistical models, such as Gaussian Processes (GPs). The aim is to mitigate the effects of unmodeled dynamics, such as undesired slippage. The proposed approach has the benefit of avoiding complex terramechanics analysis by estimating the robot dynamics directly from data over a wide class of trajectories. Experiments in both real and simulated environments show that the proposed methodology is promising. In recent decades, increasing interest has been devoted to the design of high-performance path tracking. Three main approaches to this problem have emerged in the literature: (i) model-based and adaptive control [1]-[5]; (ii) Gaussian Processes or stochastic nonlinear models for reinforcement learning of control policies [6], [7]; and (iii) nominal models with data-driven estimation of the residual [8], [9].
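As a rough illustration of the nonparametric-residual idea (with a synthetic one-dimensional "slippage" residual and hypothetical kernel hyperparameters, not the paper's actual setup), a GP posterior mean over residual dynamics can be computed as:

```python
import numpy as np

def rbf_kernel(X1, X2, length=0.3, var=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

# Toy data: residual between measured and nominal dynamics (synthetic).
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 20)
y = np.sin(3 * X) + 0.05 * rng.standard_normal(20)  # "slippage" residual

# GP posterior mean at test inputs (observation noise std 0.05).
Xs = np.linspace(-1, 1, 50)
K = rbf_kernel(X, X) + 0.05**2 * np.eye(20)
mean = rbf_kernel(Xs, X) @ np.linalg.solve(K, y)
```

A control law would then compensate for the predicted residual on top of the nominal model.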

Faradonbeh, Mohamad Kazem Shirani, Tewari, Ambuj, Michailidis, George

We consider the classical problem of controlling linear systems with quadratic cost. When the true system dynamics are unknown, an adaptive policy is required to learn the model parameters and plan a control policy simultaneously. Addressing this trade-off between accurate estimation and good control is the main challenge in adaptive control. Another important issue is preventing the system from becoming destabilized due to lack of knowledge of its dynamics. Asymptotically optimal approaches have been extensively studied in the literature, but non-asymptotic results are scarce and do not provide a comprehensive treatment of the problem. In this work, we establish finite-time high-probability regret bounds that are optimal up to logarithmic factors. We also provide high-probability guarantees for a stabilization algorithm based on random linear feedbacks. The results are obtained under very mild assumptions, requiring only: (i) stabilizability of the matrices encoding the system's dynamics, and (ii) a limit on the heaviness of the tails of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools.
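The estimation side of the trade-off described above can be illustrated with ordinary least squares on trajectory data; the system matrices, noise level, and horizon below are hypothetical, chosen only to make the regression well conditioned:

```python
import numpy as np

rng = np.random.default_rng(1)

# True (unknown) system, used here only to generate synthetic data.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Collect a trajectory under exploratory (random) inputs.
T = 2000
x = np.zeros(2)
Z, Xn = [], []  # regressors [x; u] and next states
for _ in range(T):
    u = rng.standard_normal(1)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.standard_normal(2)
    Z.append(np.concatenate([x, u]))
    Xn.append(x_next)
    x = x_next

# Least squares: [A_hat B_hat] minimizes sum ||x_{t+1} - M [x_t; u_t]||^2.
Z, Xn = np.array(Z), np.array(Xn)
M, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
A_hat, B_hat = M.T[:, :2], M.T[:, 2:]
```

An adaptive policy must trade the informativeness of such exploratory inputs against the control cost they incur.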

Adaptive optimal control using value iteration initiated from a stabilizing control policy is theoretically analyzed in terms of the stability of the system during the learning stage, without ignoring the effects of approximation errors. The analysis covers operation of the system both under any single (constant) resulting control policy and under an evolving (time-varying) control policy. A feature of the presented results is that they provide estimates of the \textit{region of attraction}, so that if the initial condition lies within this region, the whole trajectory remains inside it and hence the function approximation results remain valid.
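A minimal sketch of value iteration initialized from a stabilizing but suboptimal policy, on a toy 5-state chain rather than the continuous, approximate-dynamic-programming setting the abstract analyzes:

```python
# States 0..4 on a chain; state 0 is the cost-free goal (toy example).
N = 5

def cost(s):
    return 0 if s == 0 else 1

def step(s, a):  # action a in {-1, +1}, clipped to the chain
    return min(max(s + a, 0), N - 1)

# Initialize from the value of a stabilizing but sluggish policy
# (overestimated costs-to-go), then improve by value iteration.
V = [0.0, 2.0, 4.0, 6.0, 8.0]
for _ in range(50):
    V = [cost(s) if s == 0 else
         cost(s) + min(V[step(s, -1)], V[step(s, +1)]) for s in range(N)]
```

Starting from a stabilizing policy guarantees that the initial value estimate is finite; the abstract's contribution is showing that stability is preserved throughout the iterations even with approximation errors.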

Amani, Elie, Djouani, Karim, Kurien, Anish, De Boer, Jean-Rémi, Vigneau, Willy, Ries, Lionel

Multipath is among the major sources of error in precise positioning using GPS and continues to be studied extensively. This paper presents two Fast Fourier Transform (FFT)-based detectors as GPS multipath detection techniques. The detectors are formulated as binary hypothesis tests under the assumption that the multipath persists over a time frame long enough to allow its detection, based on the quadrature arm of the coherent Early-minus-Late discriminator (Q EmL) for a scalar tracking loop (STL), or on the quadrature (Q EmL) and/or in-phase arm (I EmL) for a vector tracking loop (VTL), using an observation window of N samples. Performance analysis of the suggested detectors is carried out on multipath signal data acquired from the multipath environment simulator developed by the German Aerospace Centre (DLR), as well as on multipath data from real GPS signals. Applying the detection tests to the correlator outputs of scalar and vector tracking loops shows that they may be used to exclude multipath-contaminated satellites from the navigation solution. These detection techniques can be extended to other Global Navigation Satellite Systems (GNSS) such as GLONASS, Galileo and BeiDou.
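A binary hypothesis test of this flavor can be caricatured as a threshold on the FFT power of the quadrature correlator output; the signal model, window length, and threshold below are synthetic stand-ins, not the paper's detectors:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 256  # observation window length

def fft_detect(q, threshold):
    """Declare multipath (H1) if any FFT bin's power exceeds the threshold."""
    power = np.abs(np.fft.rfft(q - q.mean())) ** 2 / len(q)
    return power.max() > threshold

# H0: quadrature correlator output is approximately white noise.
q_h0 = rng.standard_normal(N)
# H1: an additional slowly varying multipath-induced oscillation (synthetic).
q_h1 = q_h0 + np.sin(2 * np.pi * 5 * np.arange(N) / N)

threshold = 25.0  # in practice, set from a target false-alarm probability
```

The threshold would normally be calibrated so that the probability of exceeding it under H0 matches a desired false-alarm rate.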

A prognostic system makes it possible to anticipate a loss of functionality before it occurs, with sufficient lead time to take actions that mitigate the impact of this loss. We focus on the forms of mitigation within the flight vehicle that influence the operational dynamics but do not directly amend the mission plan; that is, we focus on the reconfiguration of the feedback control strategy for the flight system. The high degree of complexity in the design and dynamics of modern aircraft is typically handled using a hierarchical control scheme with several levels of control at increasing levels of responsibility: the component level, the subsystem level, and the system level. Our reconfiguration strategy mitigates problems detected at the component level both at the level at which the fault is detected and at higher levels.

In this paper a hybrid (base) system is modelled as a quintuple consisting of a state space (the direct product of a set of discrete states and an n-dimensional manifold), sets of admissible continuous and discrete controls, a family of controlled autonomous vector fields assigned to each discrete state, and a (partially defined) map of discrete transitions. Next, generalizing the theory presented in (Caines & Wei 1998), the notion of a finite analytic partition Π of the state space of a hybrid system is defined. Then the notion of dynamical consistency is generalized to that of hybrid dynamical consistency. Based on these notions, the partition machine H_Π of a hybrid system H is defined in such a way that, within the class of in-block controllable partitions, the controllability of the high-level system (described by the partition machine, a discrete finite state machine) is equivalent (under some technical conditions) to the controllability of the low-level system (described by differential equations). Within the hybrid partition machine framework, a discrete controller supervises its continuous subsystems via hierarchical feedback relations; furthermore, each continuous subsystem is itself (internally) subject to feedback control.
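At the high level, controllability of a finite state machine such as the partition machine reduces to mutual reachability of its states; a toy sketch (with a hypothetical three-block partition and hand-written transitions, not the paper's construction) is:

```python
from collections import deque

# Toy partition machine: blocks of the partition as nodes, controlled
# discrete transitions as directed edges (illustrative values only).
edges = {"B1": ["B2"], "B2": ["B3", "B1"], "B3": ["B1"]}

def reachable(src, edges):
    """Breadth-first search: all blocks reachable from src."""
    seen, frontier = {src}, deque([src])
    while frontier:
        for nxt in edges.get(frontier.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# High-level controllability: every block can reach every other block.
blocks = set(edges)
controllable = all(reachable(b, edges) == blocks for b in blocks)
```

The substance of the paper lies in guaranteeing that, for in-block controllable partitions, this discrete check is equivalent to controllability of the underlying differential equations.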

Artificial neural networks offer an attractive paradigm for the design of behavior and control systems in robots and autonomous agents for a variety of reasons, including their ability to adapt and learn, potential resistance to noise, faults, and component failures, and potential for real-time performance in dynamic environments (through massive parallelism and suitable hardware realization). However, designing a good neurocontroller for a given robotic application is an instance of a difficult multi-criterion optimization problem, requiring complicated trade-offs among different, often competing measures of the network, such as performance, cost, and complexity; this is further compounded by competing objectives in the realization of behavior (e.g., move quickly versus avoid obstacles). Evolutionary Algorithms (EAs), simulated models of natural evolution, have been shown to be effective in searching vast, complex, multi-modal, and deceptive search spaces. They are therefore viable candidates for the design of neurocontrollers (Balakrishnan & Honavar 1995). Although this synergy of approaches is not new (see (Balakrishnan & Honavar 1995) for a bibliography), the field still offers many exciting avenues of research.
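A minimal sketch of this synergy, using a hypothetical (1+1) evolution strategy over the weights of a one-hidden-unit "network" on a toy tracking objective (none of this is from the cited work):

```python
import math
import random

random.seed(3)

def controller(w, x):
    """Tiny one-hidden-unit neurocontroller: u = w2 * tanh(w0 * x + w1)."""
    return w[2] * math.tanh(w[0] * x + w[1])

def fitness(w):
    """Negative tracking error on a toy task: approximate u*(x) = -0.5 x."""
    xs = [i / 10 - 1 for i in range(21)]
    return -sum((controller(w, x) - (-0.5 * x)) ** 2 for x in xs)

# (1+1) evolution strategy: mutate the weights, keep the better individual.
best = [random.gauss(0, 1) for _ in range(3)]
init_fit = fitness(best)
for _ in range(300):
    child = [wi + random.gauss(0, 0.1) for wi in best]
    if fitness(child) > fitness(best):
        best = child
```

In a real neurocontroller design problem, the fitness would aggregate the competing behavioral objectives (speed, obstacle avoidance, network cost) discussed above.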

Learning algorithms have enjoyed numerous successes in robotic control tasks. In problems with time-varying dynamics, online learning methods have also proved to be a powerful tool for automatically tracking and/or adapting to changing circumstances. However, for safety-critical applications such as airplane flight, the adoption of these algorithms has been significantly hampered by their lack of safety guarantees, such as stability. Rather than trying to establish difficult a priori stability guarantees for specific learning methods, in this paper we propose a method for monitoring the controllers suggested by the learning algorithm online, and rejecting controllers that lead to instability. We prove that even if an arbitrary online learning method is used with our algorithm to control a linear dynamical system, the resulting system is stable.
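A minimal version of such a monitor for a known linear system (a sketch of the idea only, not the paper's algorithm, with hypothetical matrices and gains) accepts a proposed feedback gain only if the closed-loop matrix is stable:

```python
import numpy as np

def accept(A, B, K):
    """Accept a proposed gain K for u = K x only if the closed loop
    A + B K is stable (spectral radius < 1); otherwise reject it."""
    return max(abs(np.linalg.eigvals(A + B @ K))) < 1.0

A = np.array([[1.1, 0.0], [0.0, 0.5]])  # open loop unstable
B = np.array([[1.0], [0.0]])

K_bad = np.array([[0.2, 0.0]])   # closed-loop eigenvalues {1.3, 0.5}
K_good = np.array([[-0.6, 0.0]]) # closed-loop eigenvalues {0.5, 0.5}
```

The paper's setting is harder because the dynamics are uncertain and the gains arrive online from an arbitrary learner, but the monitor-and-reject structure is the same.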