A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Suematsu, Nobuo, Hayashi, Akira

Neural Information Processing Systems 

Since BLHT learns a stochastic model based on Bayesian Learning, the overfitting problemis reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges toone which provides the most accurate predictions of percepts and rewards, given short-term memory. 1 INTRODUCTION Research on Reinforcement Learning (RL) problem forpartially observable environments is gaining more attention recently. This is mainly because the assumption that perfect and complete perception of the state of the environment is available for the learning agent, which many previous RL algorithms require, is not valid for many realistic environments.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found