Continuous Deep Q-Learning


Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis

Neural Information Processing Systems

One of the most effective continuous deep reinforcement learning algorithms is normalized advantage functions (NAF). The main idea of NAF is to approximate the Q-function by functions that are quadratic with respect to the action variable. This idea makes it possible to apply the algorithm to continuous reinforcement learning problems, but it also raises the question of which classes of problems admit such an approximation. This paper describes one such class. We consider reinforcement learning problems obtained by discretizing certain optimal control problems. Based on the idea of NAF, we present a new family of quadratic functions and prove its suitable approximation properties. Using these properties, we provide several ways to improve NAF. The experimental results confirm the efficiency of our improvements.
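
As a rough illustration of the quadratic structure that NAF assumes, the following PyTorch sketch (our own, with illustrative names and layer sizes, not the authors' architecture) decomposes Q(s, a) = V(s) - 0.5 (a - mu(s))^T P(s) (a - mu(s)) with P(s) = L(s) L(s)^T positive semidefinite, so the Q-function is maximized exactly at a = mu(s):

```python
# Minimal NAF-style Q-network sketch (illustrative, not the paper's exact model).
import torch
import torch.nn as nn


class NAFHead(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.action_dim = action_dim
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.V = nn.Linear(hidden, 1)            # state value V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy action mu(s)
        # Entries of a lower-triangular matrix L(s).
        self.L = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)

    def forward(self, state, action):
        h = self.body(state)
        V, mu = self.V(h), self.mu(h)
        # Build L(s) so that P(s) = L L^T is positive semidefinite.
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim,
                        device=state.device)
        rows, cols = torch.tril_indices(self.action_dim, self.action_dim)
        L[:, rows, cols] = self.L(h)
        P = L @ L.transpose(1, 2)
        # Advantage is quadratic in the action and zero at a = mu(s).
        d = (action - mu).unsqueeze(-1)
        A = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return V + A, mu, V
```

Because the advantage is quadratic, the greedy action is available in closed form as mu(s), which is what makes Q-learning tractable in continuous action spaces.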


Networked Control of Nonlinear Systems under Partial Observation Using Continuous Deep Q-Learning

Ikemoto, Junya, Ushio, Toshimitsu

arXiv.org Machine Learning

In this paper, we propose a design of a model-free networked controller for a nonlinear plant whose mathematical model is unknown. In a networked control system, the controller and the plant are located away from each other and exchange data over a network, which causes network delays that may fluctuate randomly due to network routing. We therefore assume that the current network delay is not known but that the maximum value of the fluctuating network delays is known beforehand. Moreover, we assume that the sensor cannot observe all state variables of the plant. Under these assumptions, we apply continuous deep Q-learning to the design of the networked controller, and we introduce an extended state, consisting of a sequence of past control inputs and outputs, as the input to the deep neural network. Simulations show that, using the extended state, the controller can learn a control policy that is robust to the fluctuation of the network delays under partial observation.
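
A minimal sketch of how such an extended state could be assembled (the class and names below are our own illustration, not the authors' implementation): the network input concatenates buffers of the last d_max control inputs and plant outputs, where d_max is the known bound on the fluctuating network delay.

```python
# Illustrative extended-state buffer for delay-robust networked control.
from collections import deque

import numpy as np


class ExtendedState:
    """Keeps the last d_max control inputs and plant outputs (zero-initialized)."""

    def __init__(self, output_dim, action_dim, d_max):
        self.past_inputs = deque([np.zeros(action_dim)] * d_max, maxlen=d_max)
        self.past_outputs = deque([np.zeros(output_dim)] * d_max, maxlen=d_max)

    def record(self, control_input, output):
        """Store the input sent over the network and the output received back."""
        self.past_inputs.append(np.asarray(control_input))
        self.past_outputs.append(np.asarray(output))

    def vector(self):
        """Extended state fed to the DNN: all buffered outputs and inputs."""
        return np.concatenate(list(self.past_outputs) + list(self.past_inputs))
```

The history of inputs and outputs compensates for what the controller cannot see directly: neither the plant's full state nor which past inputs have actually arrived yet.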


Model-free Control of Chaos with Continuous Deep Q-learning

Ikemoto, Junya, Ushio, Toshimitsu

arXiv.org Machine Learning

The OGY method is one of the standard methods for controlling a chaotic system. It requires computing the periodic orbit, embedded in the chaotic attractor, that is to be stabilized, so it cannot be used when a precise mathematical model of the chaotic system cannot be identified. In that case, the delayed feedback control proposed by Pyragas is useful. However, even delayed feedback control needs the mathematical model to determine a feedback gain that stabilizes the periodic orbit. To overcome this problem, we apply a model-free reinforcement learning algorithm to the design of a controller for the chaotic system. In recent years, model-free reinforcement learning algorithms with deep neural networks have attracted much attention because they make it possible to control complex systems. However, such algorithms are known to be inefficient, since learners must explore their control policies over the entire state space, and they take a long time to learn optimal control policies. Thus, we propose a data-based control method consisting of two steps: we first determine a region containing the periodic orbit to be stabilized, and then make the controller learn an optimal control policy for its stabilization. In the proposed method, the controller efficiently explores its control policy only within that region.
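
The two-step idea could look like the following sketch (our illustration, under the assumption that the region is a simple box estimated from observed data; all names are hypothetical): step 1 bounds a region containing the periodic orbit to be stabilized, and step 2 applies the learned policy, with exploration noise, only inside that region, letting the chaotic dynamics evolve freely elsewhere.

```python
# Illustrative two-step region-restricted exploration (not the authors' code).
import numpy as np


def find_region(trajectory, margin=0.1):
    """Step 1: bound a box around the recurrent part of an observed chaotic
    trajectory; the periodic orbit to be stabilized lies inside it."""
    lo, hi = trajectory.min(axis=0), trajectory.max(axis=0)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad


def act(state, lo, hi, policy, noise_scale=0.1):
    """Step 2: explore (policy plus Gaussian noise) only inside the region;
    outside it, apply no control and let the chaos run uncontrolled."""
    if np.any(state < lo) or np.any(state > hi):
        return None  # no control input outside the region
    a = np.asarray(policy(state))
    return a + noise_scale * np.random.randn(*a.shape)
```

Because a chaotic trajectory revisits the neighborhood of the embedded periodic orbit, waiting until the state enters the region and exploring only there concentrates the learning effort where stabilization is actually possible.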