Stochastic processes and feedback-linearisation for online identification and Bayesian adaptive control of fully-actuated mechanical systems

Machine Learning

This work proposes a new method for simultaneous probabilistic identification and control of an observable, fully-actuated mechanical system. Identification is achieved by conditioning stochastic process priors on observations of configurations and noisy estimates of configuration derivatives. In contrast to previous work that has used stochastic processes for identification, we leverage the structural knowledge afforded by Lagrangian mechanics and learn the drift and control input matrix functions of the control-affine system separately. We utilise feedback-linearisation to reduce, in expectation, the uncertain nonlinear control problem to one that is easy to regulate in a desired manner. Thereby, our method combines the flexibility of nonparametric Bayesian learning with epistemological guarantees on the expected closed-loop trajectory. We illustrate our method in the context of torque-actuated pendula where the dynamics are learned with a combination of normal and log-normal processes.
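To make the feedback-linearisation step concrete, here is a minimal sketch for a single torque-actuated pendulum written in control-affine form, ddq = a(q, dq) + b(q) u. It is not the authors' implementation: the abstract describes learning the drift a and the input function b with stochastic processes, whereas this sketch uses known closed-form functions as stand-ins for the posterior means. The parameter values, the gains `kp` and `kd`, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical pendulum parameters (assumptions for illustration only).
MASS, LENGTH, GRAVITY, FRICTION = 1.0, 1.0, 9.81, 0.1
INERTIA = MASS * LENGTH ** 2


def drift(q, dq):
    """Drift a(q, dq): angular acceleration when no torque is applied.

    Stand-in for the posterior mean of a learned drift model.
    """
    return -(GRAVITY / LENGTH) * math.sin(q) - (FRICTION / INERTIA) * dq


def input_gain(q):
    """Input function b(q): angular acceleration per unit torque.

    Stand-in for the posterior mean of a learned (positive) input model.
    """
    return 1.0 / INERTIA


def feedback_linearising_torque(q, dq, kp=4.0, kd=4.0):
    """Pick u so that ddq = a + b * u equals the linear reference v."""
    v = -kp * q - kd * dq  # desired closed-loop acceleration
    return (v - drift(q, dq)) / input_gain(q)


def simulate(q0, dq0, steps=5000, dt=0.002):
    """Integrate the closed loop with semi-implicit Euler steps."""
    q, dq = q0, dq0
    for _ in range(steps):
        u = feedback_linearising_torque(q, dq)
        ddq = drift(q, dq) + input_gain(q) * u
        dq += dt * ddq
        q += dt * dq
    return q, dq
```

When the model functions match the true dynamics, as they do here by construction, the closed loop reduces to the linear system ddq = -kp q - kd dq, so calling `simulate(1.0, 0.0)` drives the pendulum towards the origin. With learned models the cancellation would hold only in expectation, which is the sense in which the abstract states its guarantee.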

Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems

Machine Learning

One of the core challenges in the field of control theory and reinforcement learning is adaptive control. It is the problem of controlling dynamical systems when the dynamics of the systems are unknown to the decision-making agents. In adaptive control, agents interact with given systems in order to explore and control them while the long-term objective is to minimize the overall average associated costs. The agent has to balance between exploration and exploitation, learn the dynamics, strategize for further exploration, and exploit the estimation to minimize the overall costs. The sequential nature of agent-system interaction results in challenges in system identification, estimation, and control under uncertainty, and these challenges are magnified when the systems are partially observable, i.e., contain hidden underlying dynamics. In linear systems, when the underlying dynamics are fully observable, the asymptotic optimality of estimation methods has been a topic of study in recent decades [Lai et al., 1982, Lai and Wei, 1987]. Recently, novel techniques and learning algorithms have been developed to study the finite-time behavior of adaptive control algorithms and shed light on the design of optimal methods [Peña et al., 2009, Fiechter, 1997, Abbasi-Yadkori and Szepesvári, 2011]. In particular, Abbasi-Yadkori and Szepesvári [2011] propose to use the principle of optimism in the face of uncertainty (OFU) to balance exploration and exploitation in LQR, where the state of the system is observable.

Stability Analysis of Optimal Adaptive Control using Value Iteration with Approximation Errors

Machine Learning

Adaptive optimal control using value iteration initiated from a stabilizing control policy is theoretically analyzed in terms of stability of the system during the learning stage without ignoring the effects of approximation errors. This analysis includes the system operated using any single/constant resulting control policy and also using an evolving/time-varying control policy. A feature of the presented results is that they provide estimates of the region of attraction, so that if the initial condition is within the region, the whole trajectory will remain inside it and hence the function approximation results remain valid.

When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning

Machine Learning

Emerging technologies and applications including Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent-based approaches. We analyze the convergence rate of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimental results show that our proposed approach performs close to the optimum with various machine learning models and different data distributions.
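The trade-off between local updates and global aggregation can be illustrated with a minimal sketch of distributed gradient descent in which every node runs `tau` local steps before the parameters are averaged. This is not the paper's control algorithm, which chooses that trade-off adaptively under a resource budget; here `tau` is fixed by hand. The one-parameter least-squares model, the learning rate, and all function names are assumptions made for illustration.

```python
def local_gradient(w, data):
    """Gradient of the mean squared-error loss 0.5 * (w * x - y)**2 on one node."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def distributed_gd(node_data, tau, rounds, lr=0.05, w0=0.0):
    """Run `rounds` aggregation rounds with `tau` local steps per round.

    node_data: one list of (x, y) samples per edge node; raw samples
    never leave the node, only the scalar parameter is exchanged.
    """
    total = sum(len(data) for data in node_data)
    w_global = w0
    for _ in range(rounds):
        local_ws = []
        for data in node_data:
            w = w_global  # each node starts from the shared parameter
            for _ in range(tau):
                w -= lr * local_gradient(w, data)
            local_ws.append(w)
        # Global aggregation: average weighted by local dataset size.
        w_global = sum(
            len(data) * w for data, w in zip(node_data, local_ws)
        ) / total
    return w_global


# Hypothetical data: three nodes observing the same relation y = 3 * x.
NODES = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]
```

Calling `distributed_gd(NODES, tau=5, rounds=50)` recovers a slope close to 3. A larger `tau` means fewer aggregation rounds for the same number of gradient steps, which saves communication but lets the local parameters drift apart when the nodes hold differently distributed data.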

The Distributed Adaptive Control Theory of the Mind and Brain as a candidate Standard Model of the Human Mind

AAAI Conferences

This article presents the Distributed Adaptive Control (DAC) theory of mind and brain as a candidate standard model of the human mind. DAC is defined against a reformulation of the criteria for unified theories of cognition advanced by Allen Newell, or the Unified Theories of Embodied Minds – Standard Model benchmark (UTEM-SM) that emphasizes real-world and real-time embodied action. DAC considers mind and brain as the function and implementation of a multi-layered control system and addresses the fundamental question of how the mind, as the product of embodied and situated brains, can obtain, retain and express valid knowledge of its world and transform this into policies for action. DAC provides an explanatory framework for biological minds and brains by satisfying well-defined constraints faced by theories of mind and brain and provides a route for the convergent validation of anatomy, physiology, and behavior in our explanation of biological minds. DAC is a well-validated integration and synthesis framework for artificial minds and exemplifies the role of the synthetic method in understanding mind and brain. This article describes the core components of DAC and its performance on specific benchmarks derived from the engagement with the physical and the social world (or the H4W and the H5W problems), and lastly analyzes DAC’s performance on the UTEM-SM benchmark and its relationship with contemporary developments in AI.