We propose a model-free networked controller for a nonlinear plant whose mathematical model is unknown. In a networked control system, the controller and the plant are physically separated and exchange data over a network, which introduces delays that may fluctuate randomly because of network routing. We therefore assume that the current network delay is unknown but that the maximum value of the fluctuating delays is known beforehand. We also assume that the sensor cannot observe all state variables of the plant. Under these assumptions, we apply continuous deep Q-learning to the design of the networked controller and introduce an extended state, consisting of a sequence of past control inputs and outputs, as the input to the deep neural network. Simulations show that, with the extended state, the controller learns a control policy that is robust to fluctuating network delays under partial observation.
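The extended state described above can be pictured as a rolling window over recent outputs and control inputs. The following is a minimal sketch, not the authors' implementation: the class `ExtendedState`, the helper `history_length`, and the rule of sizing the window as the known maximum delay divided by the sampling period are all assumptions made for illustration.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def history_length(max_delay: float, sample_period: float) -> int:
    """Number of past samples to keep (assumed rule: cover the maximum delay)."""
    return max(1, math.ceil(max_delay / sample_period))


class ExtendedState:
    """Rolling window of past plant outputs and control inputs (hypothetical helper)."""

    def __init__(self, history_len: int, output_dim: int, input_dim: int) -> None:
        # Start with zero-filled history so the vector has a fixed size from step 0.
        self.outputs: Deque[List[float]] = deque(
            ([0.0] * output_dim for _ in range(history_len)), maxlen=history_len
        )
        self.inputs: Deque[List[float]] = deque(
            ([0.0] * input_dim for _ in range(history_len)), maxlen=history_len
        )

    def push(self, y: Sequence[float], u: Sequence[float]) -> None:
        """Record the newest output y and control input u; the oldest pair drops out."""
        self.outputs.append(list(y))
        self.inputs.append(list(u))

    def vector(self) -> List[float]:
        """Flatten the window, oldest first: all outputs, then all inputs."""
        flat: List[float] = []
        for y in self.outputs:
            flat.extend(y)
        for u in self.inputs:
            flat.extend(u)
        return flat


if __name__ == "__main__":
    n = history_length(max_delay=0.25, sample_period=0.1)
    state = ExtendedState(history_len=n, output_dim=1, input_dim=1)
    state.push([1.0], [0.5])
    state.push([2.0], [0.6])
    print(state.vector())  # this list would be fed to the Q-network
```

Keeping the vector at a fixed length, whatever the current delay happens to be, is what lets a single network consume it at every step.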
We investigate resource allocation in nonlinear networked control systems where, unlike the well-studied case of triggering for stability, the objective is optimal triggering. An approximate dynamic programming approach is first developed for fixed-final-time problems and then extended to infinite-horizon problems. Several cases are investigated, including Zero-Order-Hold, Generalized Zero-Order-Hold, and stochastic networks. The developments are then extended to problems with unknown dynamics, and a model-free scheme is presented for learning the (approximate) optimal solution. After detailed analyses of convergence, optimality, and stability, the performance of the method is demonstrated through numerical examples.
We present a principled and efficient planning algorithm for collaborative multiagent dynamical systems. All computation, during both the planning and the execution phases, is distributed among the agents; each agent only needs to model and plan for a small part of the system. Each of these local subsystems is small, but once combined they can represent an exponentially larger problem. The subsystems are connected through a subsystem hierarchy. Coordination and communication between the agents are not imposed, but derived directly from the structure of this hierarchy. A globally consistent plan is achieved by a message-passing algorithm, where messages correspond to natural local reward functions and are computed by local linear programs; another message-passing algorithm allows us to execute the resulting policy. When two portions of the hierarchy share the same structure, our algorithm can reuse plans and messages to speed up computation.
Wireless Internet access has brought legions of heterogeneous applications that all share the same resources. However, current wireless edge networks, which cater to worst-case or average-case performance, lack the agility to best serve these diverse sessions. At the same time, software-reconfigurable infrastructure has become increasingly mainstream, to the point that dynamic per-packet and per-flow decisions are possible at multiple layers of the communications stack. Exploiting such reconfigurability requires a system that can enable a configuration, measure its impact on application performance (Quality of Experience, QoE), and adaptively select a new configuration. Effectively, this feedback loop is a Markov Decision Process whose parameters are unknown. The goal of this work is to design, develop, and demonstrate QFlow, which instantiates this feedback loop as an application of reinforcement learning (RL). Our context is reconfigurable (priority) queueing, and we use the popular application of video streaming as our use case. We develop both model-free and model-based RL approaches tailored to the problem of determining which clients should be assigned to which queue at each decision period. Through experimental validation, we show that the RL-based control policies on QFlow schedule the right clients for prioritization in a high-load scenario, outperforming both the status quo and the best known solutions with over 25% improvement in QoE and a perfect QoE score of 5 over 85% of the time.
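The configure, measure, and reselect loop described above can be illustrated with plain tabular Q-learning. This is a sketch under stated assumptions and not the QFlow implementation: it assumes a single high-priority queue with a fixed number of slots, a discretized state label, and the measured QoE as the reward. The names `enumerate_assignments` and `QueueAssignmentAgent` are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Action = Tuple[int, ...]  # indices of the clients placed in the high-priority queue


def enumerate_assignments(num_clients: int, priority_slots: int) -> List[Action]:
    """All ways to pick which clients get the (assumed) high-priority slots."""
    return list(combinations(range(num_clients), priority_slots))


class QueueAssignmentAgent:
    """Tabular Q-learning over client-to-queue assignments (illustrative stand-in)."""

    def __init__(self, actions: Sequence[Action], alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q: Dict[Tuple[Hashable, Action], float] = {}
        self.rng = random.Random(seed)

    def value(self, state: Hashable, action: Action) -> float:
        return self.q.get((state, action), 0.0)

    def select(self, state: Hashable) -> Action:
        """Epsilon-greedy choice of the next configuration to enable."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state: Hashable, action: Action,
               reward: float, next_state: Hashable) -> None:
        """One Q-learning step; reward is the QoE measured after applying action."""
        best_next = max(self.value(next_state, a) for a in self.actions)
        target = reward + self.gamma * best_next
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)


if __name__ == "__main__":
    agent = QueueAssignmentAgent(enumerate_assignments(4, 2))
    action = agent.select("high_load")            # enable a configuration
    agent.update("high_load", action, 4.0, "high_load")  # 4.0: placeholder QoE
    print(action, agent.value("high_load", action))
```

The action set grows combinatorially with the number of clients, which is one reason a table like this is only suitable for small illustrations.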
An important problem for the Internet is how to provide a guaranteed quality of service to users, in contrast to the current "best-effort" service. A key aspect of this problem is how routers should share network capacity between different classes of traffic. This decision needs to be made for each incoming packet, and is known as the packet scheduling problem. A major challenge in packet scheduling is that the behaviour of each traffic class may not be known in advance, and can vary dynamically. In this paper, we describe how we have modelled the packet scheduling problem as an application for reinforcement learning (RL). We demonstrate how our RL approach can learn scheduling policies that satisfy the quality of service requirements of multiple traffic classes under a variety of conditions. We also present an insight into the effectiveness of two different RL algorithms in this context. A major benefit of this approach is that we can help network providers deliver a guaranteed quality of service to customers without manual fine-tuning of the network routers.