A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
–Neural Information Processing Systems
We describe a Reinforcement Learning algorithm for partially observ(cid:173) able environments using short-term memory, which we call BLHT. Since BLHT learns a stochastic model based on Bayesian Learning, the over(cid:173) fitting problem is reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT con(cid:173) verges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
Neural Information Processing Systems
Apr-6-2023, 17:29:07 GMT
- Technology: