Reinforcement Learning Based on On-Line EM Algorithm

Sato, Masa-aki, Ishii, Shin

Neural Information Processing Systems 

The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the on-line EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging-up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position. The experimental results show that our RL method can be applied to optimal control problems having continuous state/action spaces and that the method achieves good control with a small number of trial-and-errors.

1 INTRODUCTION

Reinforcement learning (RL) methods (Barto et al., 1990) have been successfully applied to various Markov decision problems having finite state/action spaces, such as the backgammon game (Tesauro, 1992) and a complex task in a dynamic environment (Lin, 1992). On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications.
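
As a rough illustration of what "networks of local linear regression units" means for the NGnet named above, the following minimal sketch computes an NGnet output as a sum of local linear models weighted by normalized Gaussian activations. It is not taken from the paper: the function name `ngnet_predict`, the parameter names, and the restriction to diagonal covariances are assumptions made for brevity, and the on-line EM training step is not shown.

```python
import math


def ngnet_predict(x, centers, variances, weights, biases):
    """Forward pass of a normalized Gaussian network (illustrative sketch).

    x         : input vector, list of n floats
    centers   : per-unit Gaussian centers, each a list of n floats
    variances : per-unit diagonal variances (assumption: diagonal covariance)
    weights   : per-unit linear maps, each a list of d rows with n entries
    biases    : per-unit bias vectors, each a list of d floats
    Returns (output vector of length d, normalized unit activations).
    """
    # Log-density of each unit's Gaussian at x.
    log_g = []
    for mu, var in zip(centers, variances):
        lg = 0.0
        for xk, mk, vk in zip(x, mu, var):
            lg += -0.5 * (xk - mk) ** 2 / vk - 0.5 * math.log(2.0 * math.pi * vk)
        log_g.append(lg)

    # Normalize the activations so they sum to one (shifted for stability).
    shift = max(log_g)
    g = [math.exp(v - shift) for v in log_g]
    total = sum(g)
    activations = [gi / total for gi in g]

    # Blend the local linear models with the normalized activations.
    d = len(biases[0])
    y = [0.0] * d
    for a, W, b in zip(activations, weights, biases):
        for j in range(d):
            local = b[j] + sum(w * xk for w, xk in zip(W[j], x))
            y[j] += a * local
    return y, activations


# Two units placed symmetrically around the query point x = 0.
y, act = ngnet_predict(
    x=[0.0],
    centers=[[-1.0], [1.0]],
    variances=[[1.0], [1.0]],
    weights=[[[1.0]], [[-1.0]]],
    biases=[[0.0], [2.0]],
)
```

In this toy call both units are equally distant from the input, so each local model contributes half of the output. In an actor-critic setting of the kind the abstract describes, one such network would map the state to an action and another would map the state to a value estimate.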
