
### Meta-Adaptive Nonlinear Control: Theory and Algorithms

We present an online multi-task learning approach for adaptive nonlinear control, which we call Online Meta-Adaptive Control (OMAC). The goal is to control a nonlinear system subject to adversarial disturbance and unknown *environment-dependent* nonlinear dynamics, under the assumption that the environment-dependent dynamics can be well captured with some shared representation. Our approach is motivated by robot control, where a robotic system encounters a sequence of new environmental conditions that it must quickly adapt to. A key emphasis is to integrate online representation learning with established methods from control theory, in order to arrive at a unified framework that yields both control-theoretic and learning-theoretic guarantees. We provide instantiations of our approach under varying conditions, leading to the first non-asymptotic end-to-end convergence guarantee for multi-task adaptive nonlinear control. OMAC can also be integrated with deep representation learning. Experiments show that OMAC significantly outperforms conventional adaptive control approaches which do not learn the shared representation.
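To make the "shared representation plus fast per-environment adaptation" idea concrete, here is a minimal toy sketch (not the paper's algorithm): a fixed nonlinear feature map stands in for the shared representation, and the environment-specific part is recovered by a quick ridge least-squares fit on observed dynamics residuals. The feature map, regularizer, and coefficients are all illustrative assumptions.

```python
import numpy as np

def features(x):
    """Stand-in for the shared representation: a fixed nonlinear feature map.
    In OMAC this representation would itself be learned online."""
    return np.array([x, x**2, np.sin(3 * x)])

def adapt_coeffs(xs, residuals, lam=1e-3):
    """Fast per-environment adaptation: ridge least-squares fit of
    environment-specific coefficients on observed dynamics residuals."""
    Phi = np.stack([features(x) for x in xs])          # (T, d) feature matrix
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ residuals)

# Simulate one environment whose residual dynamics have coefficients c_true.
rng = np.random.default_rng(0)
c_true = np.array([0.5, -0.2, 1.0])
xs = rng.uniform(-1, 1, size=50)
residuals = np.array([features(x) @ c_true for x in xs]) \
    + 0.01 * rng.standard_normal(50)

c_hat = adapt_coeffs(xs, residuals)   # recovered environment coefficients
```

The point of the sketch is the split: the slow outer loop would refine `features`, while each new environment only requires the cheap inner fit above.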

### Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls

We study finite-time horizon continuous-time linear-convex reinforcement learning problems in an episodic setting. In this problem, the unknown linear jump-diffusion process is controlled subject to nonsmooth convex costs. We show that the associated linear-convex control problems admit Lipschitz continuous optimal feedback controls and further prove the Lipschitz stability of the feedback controls, i.e., the performance gap between applying feedback controls for an incorrect model and for the true model depends Lipschitz-continuously on the magnitude of perturbations in the model coefficients; the proof relies on a stability analysis of the associated forward-backward stochastic differential equation. We then propose a novel least-squares algorithm which achieves a regret of the order $O(\sqrt{N\ln N})$ on linear-convex learning problems with jumps, where $N$ is the number of learning episodes; the analysis leverages the Lipschitz stability of feedback controls and concentration properties of sub-Weibull random variables.

### A Regret Minimization Approach to Iterative Learning Control

We consider the setting of iterative learning control, or model-based policy learning in the presence of uncertain, time-varying dynamics. In this setting, we propose a new performance metric, planning regret, which replaces the standard stochastic uncertainty assumptions with worst case regret. Based on recent advances in non-stochastic control, we design a new iterative algorithm for minimizing planning regret that is more robust to model mismatch and uncertainty. We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
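For readers unfamiliar with iterative learning control, the following toy sketch shows the basic loop the abstract refers to: repeatedly roll out an open-loop plan on the real plant and update it using a (deliberately mismatched) nominal model. This is a classical P-type ILC update, not the regret-minimizing algorithm proposed in the paper; all plant values and gains are illustrative.

```python
import numpy as np

a_true, b_true = 0.5, 1.0    # real plant: x_{t+1} = a x_t + b u_t
b_nom = 0.8                  # mismatched nominal input gain used for updates
T, L = 10, 0.5               # horizon and learning gain
r = np.ones(T)               # reference trajectory to track

def rollout(u):
    """Run the open-loop plan u on the real plant."""
    x = np.zeros(T + 1)
    for t in range(T):
        x[t + 1] = a_true * x[t] + b_true * u[t]
    return x

u = np.zeros(T)
for _ in range(100):                 # learning iterations (trials)
    e = rollout(u)[1:] - r           # tracking error on the real plant
    u = u - (L / b_nom) * e          # P-type update through the nominal model

final_error = np.max(np.abs(rollout(u)[1:] - r))
```

Despite the wrong `b_nom`, the error contracts across trials (here the iteration matrix is lower triangular with spectral radius below one), which illustrates the kind of model mismatch the planning-regret framework is designed to handle in the worst case.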

### Learning-based vs Model-free Adaptive Control of a MAV under Wind Gust

Navigation problems under unknown varying conditions are among the most important and well-studied problems in the control field. Classic model-based adaptive control methods can be applied only when a convenient model of the plant or environment is provided. Recent model-free adaptive control methods aim at removing this dependency by learning the physical characteristics of the plant and/or process directly from sensor feedback. Although there have been prior attempts at improving these techniques, it remains an open question as to whether it is possible to cope with real-world uncertainties in a control system that is fully based on either paradigm. We propose a conceptually simple learning-based approach composed of a full state feedback controller, tuned robustly by a deep reinforcement learning framework based on the Soft Actor-Critic algorithm. We compare it, in realistic simulations, to a model-free controller that uses the same deep reinforcement learning framework for the control of a micro aerial vehicle under wind gust. The results indicate the great potential of learning-based adaptive control methods in modern dynamical systems.

### Safe and Efficient Model-free Adaptive Control via Bayesian Optimization

Adaptive control approaches yield high-performance controllers when a precise system model or suitable parametrizations of the controller are available. Existing data-driven approaches for adaptive control mostly augment standard model-based methods with additional information about uncertainties in the dynamics or about disturbances. In this work, we propose a purely data-driven, model-free approach for adaptive control. Tuning low-level controllers based solely on system data raises concerns about the underlying algorithm's safety and computational performance. Thus, our approach builds on GoOSE, an algorithm for safe and sample-efficient Bayesian optimization. We introduce several computational and algorithmic modifications in GoOSE that enable its practical use on a rotational motion system. We numerically demonstrate for several types of disturbances that our approach is sample efficient, outperforms constrained Bayesian optimization in terms of safety, and achieves the performance optima computed by grid evaluation. We further demonstrate the proposed adaptive control approach experimentally on a rotational motion system.

### Distributed Adaptive Control: An ideal Cognitive Architecture candidate for managing a robotic recycling plant

In the past decade, society has experienced notable growth in a variety of technological areas. However, the Fourth Industrial Revolution has not been embraced yet. Industry 4.0 imposes several challenges, including the need for new architectural models to tackle the uncertainty that open environments present to cyber-physical systems (CPS). Waste Electrical and Electronic Equipment (WEEE) recycling plants are one such open environment. Here, CPSs must work harmoniously in a changing environment, interacting with similar and not so similar CPSs, and adaptively collaborating with human workers. In this paper, we support the Distributed Adaptive Control (DAC) theory as a suitable Cognitive Architecture for managing a recycling plant. Specifically, a recursive implementation of DAC (spanning both single-agent and large-scale levels) is proposed to meet the expected demands of the European Project HR-Recycler. Additionally, with the aim of providing a realistic benchmark for future implementations of the recursive DAC, a micro-recycling plant prototype is presented.

Keywords: Cognitive Architecture, Distributed Adaptive Control, Recycling Plant, Navigation, Motor Control, Human-Robot Interaction.

### Machine learning approach could improve radar in congested environments - Military Embedded Systems

Research being conducted by the U.S. Army Combat Capabilities Development Command (DEVCOM) is focused on a new machine learning approach that could improve radar performance in congested environments. Researchers from DEVCOM, Army Research Laboratory, and Virginia Tech have developed an automatic way for radars to operate in congested and limited-spectrum environments created by commercial 4G LTE and future 5G communications systems. The researchers claim they examined how future Department of Defense radar systems will share the spectrum with commercial communications systems. The team used machine learning to learn the behavior of ever-changing interference in the spectrum and find clean spectrum to maximize the radar performance. Once clean spectrum is identified, waveforms can be modified to best fit into the spectrum.

### Go with the Flow: Adaptive Control for Neural ODEs

Despite their elegant formulation and lightweight memory cost, neural ordinary differential equations (NODEs) suffer from known representational limitations. In particular, the single flow learned by NODEs cannot express all homeomorphisms from a given data space to itself, and their static weight parametrization restricts the type of functions they can learn compared to discrete architectures with layer-dependent weights. Here, we describe a new module called neurally-controlled ODE (N-CODE) designed to improve the expressivity of NODEs. The parameters of N-CODE modules are dynamic variables governed by a trainable map from initial or current activation state, resulting in forms of open-loop and closed-loop control, respectively. A single module is sufficient for learning a distribution on non-autonomous flows that adaptively drive neural representations. We provide theoretical and empirical evidence that N-CODE circumvents limitations of previous models and show how increased model expressivity manifests in several domains. In supervised learning, we demonstrate that our framework achieves better performance than NODEs as measured by both training speed and testing accuracy. In unsupervised learning, we apply this control perspective to an image autoencoder endowed with a latent transformation flow, greatly improving representational power over a vanilla model and leading to state-of-the-art image reconstruction on CIFAR-10.
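The contrast between static and dynamic weights can be sketched in a few lines. Below is a toy numpy illustration of the closed-loop idea (not the paper's implementation): instead of a fixed weight matrix $W$ in $\dot{x} = \tanh(Wx)$, the weights themselves evolve, driven by a map from the current state. The "trainable" controller is replaced here by a fixed random matrix, and Euler integration stands in for an ODE solver.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Controller map (here fixed random; in N-CODE it would be trained):
# maps the current state to a velocity for the flattened weight matrix.
A = 0.1 * rng.standard_normal((d * d, d))

def flow(x0, steps=100, dt=0.01):
    """Integrate a state whose dynamics use time-varying, state-driven weights."""
    x = x0.copy()
    W = np.zeros((d, d))                     # weights start at zero and evolve
    for _ in range(steps):
        x = x + dt * np.tanh(W @ x)          # state dynamics under current weights
        W = W + dt * (A @ x).reshape(d, d)   # closed-loop weight dynamics
    return x

x_final = flow(rng.standard_normal(d))
```

Because $W$ varies along the trajectory, the resulting vector field is non-autonomous, which is the mechanism the abstract credits for the added expressivity over a single static flow.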

### Complementary Meta-Reinforcement Learning for Fault-Adaptive Control

Faults are endemic to all systems. Adaptive fault-tolerant control maintains degraded performance when faults occur as opposed to unsafe conditions or catastrophic events. In systems with abrupt faults and strict time constraints, it is imperative for control to adapt quickly to system changes to maintain system operations. We present a meta-reinforcement learning approach that quickly adapts its control policy to changing conditions. The approach builds upon model-agnostic meta learning (MAML). The controller maintains a complement of prior policies learned under system faults. This "library" is evaluated on a system after a new fault to initialize the new policy. This contrasts with MAML, where the controller derives intermediate policies anew, sampled from a distribution of similar systems, to initialize a new policy. Our approach improves sample efficiency of the reinforcement learning process. We evaluate our approach on an aircraft fuel transfer system under abrupt faults.
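The "library of prior policies" idea can be illustrated with a deliberately tiny example (this is a hypothetical sketch, not the paper's method): after a new fault, briefly evaluate each stored policy on the faulty system and warm-start learning from the best one. The scalar plant, quadratic cost, and gain values are all invented for illustration.

```python
import numpy as np

def evaluate(policy_gain, fault_gain, episodes=5, horizon=20, seed=0):
    """Average quadratic cost of the linear policy u = -k x on a faulty
    scalar plant x' = g x + u + noise. Lower is better."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(episodes):
        x = 1.0
        for _ in range(horizon):
            u = -policy_gain * x
            x = fault_gain * x + u + 0.01 * rng.standard_normal()
            total += x * x + 0.1 * u * u
    return total / episodes

# Library of gains learned under previously seen faults (illustrative values).
library = {"fault_A": 0.3, "fault_B": 0.9, "fault_C": 1.5}
new_fault_gain = 1.4                        # the newly occurred fault

best = min(library, key=lambda name: evaluate(library[name], new_fault_gain))
init_gain = library[best]                   # warm start for further RL
```

The short evaluation phase replaces MAML's meta-learned initialization with an empirical selection among concrete prior policies, which is where the claimed sample-efficiency gain comes from.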

### Explore More and Improve Regret in Linear Quadratic Regulators

Stabilizing the unknown dynamics of a control system and minimizing regret in control of an unknown system are among the main goals in control theory and reinforcement learning. In this work, we pursue both these goals for adaptive control of linear quadratic regulators (LQR). Prior works accomplish either one of these goals at the cost of the other one. The algorithms that are guaranteed to find a stabilizing controller suffer from high regret, whereas algorithms that focus on achieving low regret assume the presence of a stabilizing controller at the early stages of agent-environment interaction. In the absence of such a stabilizing controller, at the early stages, the lack of reasonable model estimates needed for (i) strategic exploration and (ii) design of controllers that stabilize the system results in regret that scales exponentially in the problem dimensions. We propose a framework for adaptive control that exploits the characteristics of linear dynamical systems and deploys additional exploration in the early stages of agent-environment interaction to guarantee that a stabilizing controller is designed sooner. We show that for the classes of controllable and stabilizable LQRs, where the latter is a generalization of prior work, these methods achieve $\tilde{\mathcal{O}}(\sqrt{T})$ regret with polynomial dependence on the problem dimensions.
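A minimal certainty-equivalence sketch shows why early exploration helps (this is an illustrative toy, not the paper's algorithm): short exploratory rollouts with injected input noise identify the unstable plant quickly, after which a stabilizing LQR gain can be computed from the estimates. The scalar system, rollout lengths, and noise levels are assumptions for the example.

```python
import numpy as np

a_true, b_true = 1.2, 1.0     # unknown, open-loop unstable scalar plant
q, r = 1.0, 1.0               # LQR cost weights
rng = np.random.default_rng(1)

def lqr_gain(a, b, iters=500):
    """Scalar discrete-time Riccati equation solved by value iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)    # optimal gain for u = -k x

# Early phase: short exploratory rollouts with random inputs, resetting
# between rollouts so the unstable state stays bounded.
data = []
for _ in range(50):
    x = 0.0
    for _ in range(5):
        u = rng.standard_normal()
        x_next = a_true * x + b_true * u + 0.1 * rng.standard_normal()
        data.append((x, u, x_next))
        x = x_next

# Least-squares system identification from the collected transitions.
Z = np.array([(x, u) for x, u, _ in data])
y = np.array([xn for _, _, xn in data])
a_hat, b_hat = np.linalg.lstsq(Z, y, rcond=None)[0]

k = lqr_gain(a_hat, b_hat)    # certainty-equivalent controller
closed_loop = a_true - b_true * k
```

With enough exploratory data, `closed_loop` falls below one in magnitude, i.e., the estimated controller stabilizes the true plant; the paper's contribution is quantifying how much of this exploration is needed and its effect on regret.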