Periodic agent-state based Q-learning for POMDPs
Neural Information Processing Systems
The standard approach to Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is an agent state: a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Because the agent state is model-free, it can be used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms such as Q-learning learn a stationary policy.
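To make the notion of a recursively updateable agent state concrete, here is a minimal sketch of frame stacking, one of the examples the abstract mentions. The function names (`init_state`, `update`, `as_array`) and the stack depth `k` are illustrative choices, not anything specified by the paper; the key property shown is that the next agent state is computed from the previous agent state and the new observation alone, with no system model.

```python
from collections import deque
import numpy as np

def make_frame_stacker(k):
    """Agent state via frame stacking: the state is the last k observations.

    The update is recursive: new_state = f(old_state, new_observation),
    so no knowledge of the POMDP's transition or observation model is needed.
    """
    def init_state(first_obs):
        # Bootstrap by repeating the first observation k times.
        return deque([first_obs] * k, maxlen=k)

    def update(state, obs):
        # Copy so the update is functional (the old state is untouched).
        new_state = deque(state, maxlen=k)
        new_state.append(obs)  # oldest frame is dropped automatically
        return new_state

    def as_array(state):
        # Stack into shape (k, *obs_shape) for use as input to a Q-network.
        return np.stack(state)

    return init_state, update, as_array
```

A stationary policy would then map this stacked array directly to an action, e.g. by feeding `as_array(state)` into a Q-function approximator.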