A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Suematsu, Nobuo, Hayashi, Akira

Neural Information Processing Systems 

We have proved that the model learned by BLHT converges to the optimal model in given hypothesis space, 1{, which provides the most accurate predictions of percepts and rewards, given short-term memory. We believe this fact provides a solid basis for BLHT, and BLHT can be compared favorably with other methods using short-term memory.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found