A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
Suematsu, Nobuo, Hayashi, Akira
–Neural Information Processing Systems
We have proved that the model learned by BLHT converges to the optimal model in given hypothesis space, 1{, which provides the most accurate predictions of percepts and rewards, given short-term memory. We believe this fact provides a solid basis for BLHT, and BLHT can be compared favorably with other methods using short-term memory.
Neural Information Processing Systems
Dec-31-1999
- Country:
- Asia
- Middle East > Jordan (0.05)
- Japan > Honshū
- Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
- Asia