Collaborating Authors

 Ishii, Shin


Reinforcement Learning Based on On-Line EM Algorithm

Neural Information Processing Systems

On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications. In this article, we propose a new RL method that has the above-mentioned two features. This method is based on an actor-critic architecture (Barto et al., 1983), although the detailed implementations of the actor and the critic are quite different from those in the original actor-critic model. The actor and the critic in our method estimate a policy and a Q-function, respectively, and are approximated by Normalized Gaussian Networks (NGnet) (Moody & Darken, 1989).
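
For readers unfamiliar with the NGnet, the sketch below illustrates the forward computation of such a network: each local linear regression unit's output is gated by a normalized Gaussian activation. This is a minimal illustration assuming diagonal covariances; the function name, array shapes, and example values are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def ngnet_forward(x, centers, inv_vars, W, b):
    """y(x) = sum_i g_i(x) * (W_i @ x + b_i), where the g_i are
    normalized Gaussian activations that softly partition input space."""
    diff = x - centers                                       # (M, D)
    act = np.exp(-0.5 * np.sum(diff**2 * inv_vars, axis=1))  # (M,) Gaussians
    g = act / np.sum(act)                                    # normalized gates
    local = W @ x + b                                        # (M, K) local linear models
    return g @ local                                         # (K,) gated combination

# Example: 3 units, 2-D input, 1-D output (all values illustrative).
rng = np.random.default_rng(0)
M, D, K = 3, 2, 1
centers = rng.normal(size=(M, D))
inv_vars = np.ones((M, D))
W = rng.normal(size=(M, K, D))
b = np.zeros((M, K))
print(ngnet_forward(np.array([0.5, -0.2]), centers, inv_vars, W, b))
```

Because the Gaussian gates are normalized, the network interpolates smoothly between the local linear models, which is what makes it a convenient approximator for both the policy (actor) and the Q-function (critic).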


The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the on-line EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position.
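
The on-line EM algorithm referred to here maintains discounted sufficient statistics for each local unit, so the network can be re-estimated incrementally as data arrive. The following is a minimal sketch of that style of update, assuming diagonal covariances and updating only the Gaussian centers and widths; the step size `eta`, the statistics dictionary, and the initialization are illustrative assumptions, and the full algorithm of the paper (which also re-estimates the linear regression weights from accumulated statistics and manipulates units) is not reproduced here.

```python
import numpy as np

def online_em_step(x, centers, inv_vars, stats, eta=0.05):
    # E-step: posterior responsibility of each unit for the new input x.
    diff = x - centers
    act = np.exp(-0.5 * np.sum(diff**2 * inv_vars, axis=1))
    resp = act / np.sum(act)                          # (M,)
    # M-step: discounted running averages of <1>, <x>, <x*x> per unit;
    # eta plays the role of the forgetting/discount factor.
    stats['n']  += eta * (resp - stats['n'])
    stats['x']  += eta * (resp[:, None] * x - stats['x'])
    stats['xx'] += eta * (resp[:, None] * x**2 - stats['xx'])
    # Re-estimate centers and diagonal variances from the statistics.
    new_centers = stats['x'] / stats['n'][:, None]
    var = stats['xx'] / stats['n'][:, None] - new_centers**2
    new_inv_vars = 1.0 / np.maximum(var, 1e-6)        # guard against collapse
    return new_centers, new_inv_vars

# Illustrative usage: statistics initialized consistently with the
# starting parameters, then updated one sample at a time.
M, D = 3, 2
rng = np.random.default_rng(1)
centers, inv_vars = rng.normal(size=(M, D)), np.ones((M, D))
stats = {'n': np.full(M, 0.1),
         'x': 0.1 * centers.copy(),
         'xx': 0.1 * (centers**2 + 1.0)}
for _ in range(100):
    x = rng.normal(size=D)
    centers, inv_vars = online_em_step(x, centers, inv_vars, stats)
```

The discounted averages let recent samples dominate, which is what makes the EM re-estimation usable on-line during RL trials such as the pendulum tasks above.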