
A Reinforcement Learning-based Adaptive Control Model for Future Street Planning, An Algorithm and A Case Study

arXiv.org Artificial Intelligence

With the emerging technologies in Intelligent Transportation Systems (ITS), the adaptive operation of road space is likely to be realised within decades. An intelligent street can learn and improve its decision-making on the right-of-way (ROW) for road users, liberating more active pedestrian space while maintaining traffic safety and efficiency. However, effective control techniques for such adaptive street infrastructure are still lacking. To fill this gap in existing studies, we formulate the control problem as a Markov Game and develop a solution based on the multi-agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The proposed model can dynamically assign ROW for sidewalks, autonomous vehicle (AV) driving lanes and on-street parking areas in real time. Integrated with the SUMO traffic simulator, the model was evaluated on the road network of the South Kensington District under three cases of divergent traffic conditions: pedestrian flow rates, AV traffic flow rates and parking demands. Results reveal that our model can achieve average reductions of 3.87% and 6.26% in street space assigned to on-street parking and vehicular operations, respectively. Combined with the space gained by limiting the number of driving lanes, the average proportion of sidewalk width to total street width can increase significantly, by 10.13%.
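
To make the Markov Game framing concrete, here is a minimal sketch of how a per-segment ROW assignment environment might look: each street segment observes pedestrian flow, AV flow and parking demand, and an action splits the fixed street width between sidewalk, driving lanes and parking. The class name, state features, softmax action mapping and reward weights are illustrative assumptions only; the paper's actual model couples MADDPG agents to SUMO rather than this toy reward.

```python
import numpy as np

class StreetSegmentEnv:
    """Toy environment: one agent per street segment splits the street width."""

    def __init__(self, width_m=12.0, n_segments=3, seed=0):
        self.width = width_m
        self.n = n_segments
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # per-segment observation: [pedestrian flow, AV flow, parking demand]
        self.state = self.rng.uniform(0.0, 1.0, size=(self.n, 3))
        return self.state

    def step(self, actions):
        # actions: (n_segments, 3) unconstrained scores -> softmax -> width shares
        shares = np.exp(actions) / np.exp(actions).sum(axis=1, keepdims=True)
        sidewalk, lanes, parking = shares.T * self.width
        ped, av, park = self.state.T
        # reward trades pedestrian space off against lane and parking shortfalls
        reward = (1.0 * sidewalk * ped
                  - 2.0 * np.maximum(av * 6.0 - lanes, 0.0)       # lane shortfall
                  - 1.0 * np.maximum(park * 2.5 - parking, 0.0))  # parking shortfall
        self.state = self.rng.uniform(0.0, 1.0, size=(self.n, 3))
        return self.state, reward

env = StreetSegmentEnv()
obs = env.reset()
obs, r = env.step(np.zeros((3, 3)))   # equal three-way split as a baseline action
print(r)
```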


Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control

arXiv.org Artificial Intelligence

In this paper we aim to provide analysis and insights (often based on visualization) that explain the beneficial effects of on-line decision making on top of off-line training. In particular, through a unifying abstract mathematical framework, we show that the principal AlphaZero/TD-Gammon ideas of approximation in value space and rollout apply very broadly to deterministic and stochastic optimal control problems, involving both discrete and continuous search spaces. Moreover, these ideas can be effectively integrated with other important methodologies such as model predictive control, adaptive control, decentralized control, discrete and Bayesian optimization, neural network-based value and policy approximations, and heuristic algorithms for discrete optimization.
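
As a small illustration of "approximation in value space with rollout", the sketch below applies one-step lookahead to a toy scalar system x_{k+1} = x_k + u_k with stage cost x^2 + u^2: each candidate control is scored by its stage cost plus the cost of simulating a crude base policy from the resulting state. The base policy, horizon and control grid are assumptions made for this example, not constructions from the paper.

```python
import numpy as np

def stage_cost(x, u):
    return x**2 + u**2

def base_policy(x):
    # a crude stabilizing heuristic, used only to evaluate candidate controls
    return -0.5 * x

def rollout_value(x, horizon=20):
    # cost of following the base policy from state x for `horizon` steps
    total = 0.0
    for _ in range(horizon):
        u = base_policy(x)
        total += stage_cost(x, u)
        x = x + u
    return total

def lookahead_control(x, candidates=np.linspace(-2.0, 2.0, 81)):
    # pick the control minimizing stage cost + rollout estimate of cost-to-go
    q_values = [stage_cost(x, u) + rollout_value(x + u) for u in candidates]
    return candidates[int(np.argmin(q_values))]

x, total_cost = 5.0, 0.0
for _ in range(30):
    u = lookahead_control(x)
    total_cost += stage_cost(x, u)
    x = x + u
print(f"final state {x:.4f}, lookahead cost {total_cost:.2f}")
```

The on-line lookahead typically improves on the base policy it rolls out, which is the basic mechanism the paper attributes to AlphaZero-style on-line play on top of off-line training.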


Stochastic Deep Model Reference Adaptive Control

arXiv.org Artificial Intelligence

In this paper, we present a Stochastic Deep Neural Network-based Model Reference Adaptive Control. Building on our work "Deep Model Reference Adaptive Control", we extend the controller's capability by using Bayesian deep neural networks (DNNs) to represent uncertainties and model non-linearities. Stochastic Deep Model Reference Adaptive Control uses a Lyapunov-based method to adapt the output-layer weights of the DNN model in real time, while a data-driven supervised learning algorithm is used to update the inner-layer parameters. This asynchronous network update ensures boundedness and guaranteed tracking performance with a learning-based real-time feedback controller. A Bayesian approach to DNN learning helps avoid over-fitting the data and provides confidence intervals over the predictions. The controller's stochastic nature also ensures "Induced Persistency of Excitation," leading to convergence of the overall system signals.
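
A deterministic, stripped-down sketch of the fast output-layer adaptation is given below: fixed radial-basis features stand in for the DNN's inner layers, and only the last-layer weights W are adapted in real time with a Lyapunov-motivated law while a reference model is tracked. The Bayesian/stochastic components and the supervised inner-layer updates are omitted; the plant, features and gains are assumptions for illustration, not the paper's controller.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.linspace(-3, 3, 15)

def phi(x):
    # radial-basis features playing the role of the learned inner layers
    return np.exp(-(x - centers) ** 2)

def f_true(x):
    return 0.5 * np.sin(2 * x) + 0.1 * x**2   # unknown model error

dt, a_m, b = 0.01, -2.0, 1.0   # reference model: x_m' = a_m * x_m + b * r
gamma = 10.0                   # adaptation gain
W = np.zeros_like(centers)
x, x_m = 1.5, 1.5

for k in range(5000):
    r = np.sin(0.002 * k)                 # reference command
    e = x - x_m                           # tracking error
    u = a_m * x + b * r - W @ phi(x)      # cancel the estimated uncertainty
    x += dt * (u + f_true(x))             # plant: x' = u + f_true(x)
    x_m += dt * (a_m * x_m + b * r)       # reference model
    W += dt * gamma * phi(x) * e          # Lyapunov-based output-layer update

print(f"final tracking error {abs(x - x_m):.4f}")
```

With the update W' = gamma * phi(x) * e, the candidate Lyapunov function V = e^2/2 + |W* - W|^2/(2*gamma) decreases along trajectories, which is the boundedness argument the abstract refers to for the output layer.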


Distributed Adaptive Control: An ideal Cognitive Architecture candidate for managing a robotic recycling plant

arXiv.org Artificial Intelligence

In the past decade, society has experienced notable growth in a variety of technological areas. However, the Fourth Industrial Revolution has not yet been embraced. Industry 4.0 imposes several challenges, including the need for new architectural models to tackle the uncertainty that open environments present to cyber-physical systems (CPS). Waste Electrical and Electronic Equipment (WEEE) recycling plants are one such open environment. Here, CPSs must work harmoniously in a changing environment, interacting with similar and not-so-similar CPSs and collaborating adaptively with human workers. In this paper, we support the Distributed Adaptive Control (DAC) theory as a suitable Cognitive Architecture for managing a recycling plant. Specifically, a recursive implementation of DAC (spanning both single-agent and large-scale plant levels) is proposed to meet the expected demands of the European Project HR-Recycler. Additionally, with the aim of providing a realistic benchmark for future implementations of the recursive DAC, a micro-recycling plant prototype is presented.
Keywords: Cognitive Architecture, Distributed Adaptive Control, Recycling Plant, Navigation, Motor Control, Human-Robot Interaction.
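
To give a rough feel for what a "recursive" layered controller could look like, here is a highly simplified structural sketch in the spirit of DAC: each controller stacks reactive, adaptive and contextual layers, and a plant-level controller recursively delegates to agent-level controllers. All class and method names are hypothetical; this is not HR-Recycler code and the layer behaviours are stubs.

```python
class DACController:
    """Layered controller sketch: reactive/adaptive/contextual, recursively nested."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []   # recursion: agents nested under the plant
        self.memory = []                 # contextual layer's episodic memory

    def reactive(self, obs):
        # prewired sensorimotor reflexes, e.g. halt on an imminent obstacle
        return {"halt": obs.get("obstacle", False)}

    def adaptive(self, obs):
        # learned state-action associations (stubbed as a demand-driven bias)
        return {"bias": obs.get("demand", 0.0)}

    def contextual(self, obs):
        # plans over stored episodes; here it simply replays the last decision
        return self.memory[-1] if self.memory else {}

    def act(self, obs):
        # later layers in the merge override earlier ones on key conflicts
        decision = {**self.contextual(obs), **self.adaptive(obs), **self.reactive(obs)}
        self.memory.append(decision)
        for child in self.children:      # propagate the decision down one level
            child.act(obs)
        return decision

plant = DACController("plant", children=[DACController("robot_arm"), DACController("agv")])
print(plant.act({"obstacle": False, "demand": 0.7}))
```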


Multifunction Cognitive Radar Task Scheduling Using Monte Carlo Tree Search and Policy Networks

arXiv.org Artificial Intelligence

A modern radar may be designed to perform multiple functions, such as surveillance, tracking, and fire control. Each function requires the radar to execute a number of transmit-receive tasks. A radar resource management (RRM) module makes decisions on parameter selection, prioritization, and scheduling of such tasks. RRM becomes especially challenging in overload situations, where some tasks may need to be delayed or even dropped. In general, task scheduling is an NP-hard problem. In this work, we first develop a branch-and-bound (B&B) method, which obtains the optimal solution but at exponential computational complexity. Heuristic methods, on the other hand, have low complexity but provide relatively poor performance. We resort to machine learning-based techniques to address this issue; specifically, we propose an approximate algorithm based on the Monte Carlo tree search method. Along with using bound and dominance rules to eliminate nodes from the search tree, we use a policy network to help reduce the width of the search. Such a network can be trained on solutions obtained by running the B&B method offline on problems of feasible complexity. We show that the proposed method provides near-optimal performance, with computational complexity orders of magnitude smaller than that of the B&B algorithm.
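
The sketch below shows the general shape of a policy-guided Monte Carlo tree search on a toy single-timeline scheduling problem (minimize weighted tardiness). The "policy network" is stubbed with an earliest-deadline heuristic turned into a softmax prior that biases a PUCT-style selection rule; the task model, the absence of bound/dominance pruning, and all constants are assumptions for illustration, not the paper's radar formulation.

```python
import math, random

random.seed(0)
TASKS = [{"dur": random.randint(1, 5), "due": random.randint(3, 20), "w": random.uniform(1, 3)}
         for _ in range(8)]

def policy_prior(remaining, t):
    # stand-in for a trained policy network: prefer tasks with nearer deadlines
    scores = [math.exp(-(TASKS[i]["due"] - t) / 5.0) for i in remaining]
    z = sum(scores)
    return {i: s / z for i, s in zip(remaining, scores)}

def rollout_cost(remaining, t, cost):
    # finish the schedule greedily by earliest due date
    for i in sorted(remaining, key=lambda i: TASKS[i]["due"]):
        t += TASKS[i]["dur"]
        cost += TASKS[i]["w"] * max(0, t - TASKS[i]["due"])
    return cost

class Node:
    def __init__(self, remaining, t, cost):
        self.remaining, self.t, self.cost = remaining, t, cost
        self.children, self.N, self.Q = {}, 0, 0.0

def search(root, iters=400, c_puct=1.5):
    for _ in range(iters):
        node, path = root, [root]
        # selection: descend with a PUCT-style score until an unexpanded node
        while node.remaining and len(node.children) == len(node.remaining):
            prior = policy_prior(node.remaining, node.t)
            node = max(node.children.values(),
                       key=lambda ch: ch.Q / (ch.N + 1e-9)
                       + c_puct * prior[ch.last] * math.sqrt(node.N) / (1 + ch.N))
            path.append(node)
        # expansion: add one child for an untried task
        if node.remaining:
            i = next(i for i in node.remaining if i not in node.children)
            t = node.t + TASKS[i]["dur"]
            child = Node(tuple(j for j in node.remaining if j != i), t,
                         node.cost + TASKS[i]["w"] * max(0, t - TASKS[i]["due"]))
            child.last = i
            node.children[i] = child
            path.append(child)
            node = child
        value = -rollout_cost(node.remaining, node.t, node.cost)
        for n in path:                    # backpropagation
            n.N += 1
            n.Q += value
    return root

root = search(Node(tuple(range(len(TASKS))), 0, 0.0))
best = max(root.children.values(), key=lambda ch: ch.N)
print("first task:", best.last, "estimated cost:", -best.Q / best.N)
```

In the paper the prior would come from a network trained on offline B&B solutions, so that the search concentrates on a narrow set of promising task orderings instead of the full branching factor.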


Stability Analysis of Optimal Adaptive Control using Value Iteration with Approximation Errors

arXiv.org Machine Learning

Adaptive optimal control using value iteration, initiated from a stabilizing control policy, is theoretically analyzed in terms of system stability during the learning stage, without ignoring the effects of approximation errors. The analysis covers the system operated under any single/constant resulting control policy as well as under an evolving/time-varying control policy. A feature of the presented results is that they provide estimates of the region of attraction, so that if the initial condition lies within this region, the whole trajectory remains inside it and, hence, the function approximation results remain valid.
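
The toy example below mimics the setting on a scalar linear-quadratic problem: value iteration is initialized from the cost of a stabilizing gain, each Bellman backup is corrupted by a bounded approximation error, and a sublevel set of the learned value function is checked numerically for invariance. The error model, plant and constants are assumptions for illustration and carry none of the paper's general nonlinear guarantees.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, q, r = 1.2, 1.0, 1.0, 1.0         # open-loop unstable scalar plant
K0 = 0.8                                 # initial stabilizing gain: |a - b*K0| < 1
P = (q + r * K0**2) / (1 - (a - b * K0)**2)   # cost of the initial policy

for _ in range(50):
    # exact Bellman backup for the quadratic value P*x^2 ...
    P_next = q + a**2 * P * r / (r + b**2 * P)
    # ... corrupted by a bounded approximation error
    P = P_next + rng.uniform(-0.05, 0.05)
    K = a * b * P / (r + b**2 * P)       # greedy policy from the approximate value

# check that a sublevel set {x : P*x^2 <= c} is (numerically) invariant
c, x = 10.0, np.sqrt(10.0 / P)
inside = True
for _ in range(100):
    x = (a - b * K) * x
    inside &= (P * x**2 <= c + 1e-9)
print(f"P = {P:.3f}, K = {K:.3f}, closed loop stable: {abs(a - b*K) < 1}, "
      f"stayed in sublevel set: {inside}")
```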


Stochastic processes and feedback-linearisation for online identification and Bayesian adaptive control of fully-actuated mechanical systems

arXiv.org Machine Learning

This work proposes a new method for simultaneous probabilistic identification and control of an observable, fully-actuated mechanical system. Identification is achieved by conditioning stochastic process priors on observations of configurations and noisy estimates of configuration derivatives. In contrast to previous work that has used stochastic processes for identification, we leverage the structural knowledge afforded by Lagrangian mechanics and learn the drift and control input matrix functions of the control-affine system separately. We utilise feedback-linearisation to reduce, in expectation, the uncertain nonlinear control problem to one that is easy to regulate in a desired manner. Thereby, our method combines the flexibility of nonparametric Bayesian learning with epistemological guarantees on the expected closed-loop trajectory. We illustrate our method in the context of torque-actuated pendula where the dynamics are learned with a combination of normal and log-normal processes.
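
A minimal sketch of the identification-plus-feedback-linearisation idea is given below for a torque-actuated pendulum q'' = f(q, q') + g*u: the drift f is learned from noisy acceleration data via a Gaussian-process posterior mean (plain RBF-kernel regression in numpy), and the controller cancels the learned drift to impose stable linear error dynamics. For brevity the input gain g is assumed known here, whereas the paper learns both the drift and the control input matrix; the kernel, gains and pendulum parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, l, d, grav = 1.0, 1.0, 0.3, 9.81
g_input = 1.0 / (m * l**2)               # control input gain (assumed known here)

def f_true(q, qd):
    return -(grav / l) * np.sin(q) - d * qd / (m * l**2)

# --- identification: GP posterior mean of f from noisy (q, q', q'', u) samples ---
X = rng.uniform([-np.pi, -3.0], [np.pi, 3.0], size=(80, 2))
u_train = rng.uniform(-2.0, 2.0, size=80)
qdd_train = f_true(X[:, 0], X[:, 1]) + g_input * u_train + rng.normal(0, 0.05, 80)
y = qdd_train - g_input * u_train        # isolate the drift given the known gain

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

K = rbf(X, X) + 0.05**2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)

def f_hat(q, qd):
    return rbf(np.array([[q, qd]]), X) @ alpha   # GP posterior mean of the drift

# --- control: cancel the expected drift, impose stable linear error dynamics ---
kp, kd, dt = 9.0, 6.0, 0.01
q, qd, q_ref = 2.0, 0.0, 0.5
for _ in range(1500):
    u = (-f_hat(q, qd)[0] - kp * (q - q_ref) - kd * qd) / g_input
    qdd = f_true(q, qd) + g_input * u
    qd += dt * qdd
    q += dt * qd
print(f"final angle {q:.3f} (target {q_ref})")
```

In expectation, the cancellation reduces the closed loop to roughly e'' + kd*e' + kp*e = 0, which is the "easy to regulate in a desired manner" system the abstract refers to.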