Periodic agent-state based Q-learning for POMDPs

Oct-10-2025, 05:54:34 GMT–Neural Information Processing Systems

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings, where the model is unknown. A widely used alternative is an agent state: a model-free, recursively updatable function of the observation history. Examples include frame stacking and recurrent neural networks. Because the agent state is model-free, it can be used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.
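To make the notion of an agent state concrete, the following is a minimal sketch of frame stacking used as a recursively updated agent state, paired with an ordinary tabular Q-learning update keyed on that state. It illustrates only the stationary baseline that the abstract describes, not the paper's periodic method. The names `make_frame_stack_update` and `q_learning_step`, the window length `k`, and the default values of `alpha` and `gamma` are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict, deque


def make_frame_stack_update(k):
    """Build a k-frame stacking agent state (hypothetical helper).

    The agent state is a tuple of the last k (action, observation) pairs.
    It is model-free: no transition or observation probabilities are used.
    """

    def initial_state():
        # Empty history before the first observation arrives.
        return tuple()

    def update(z, action, obs):
        # Recursive update: the new agent state depends only on the
        # previous agent state and the newest (action, observation) pair.
        window = deque(z, maxlen=k)
        window.append((action, obs))
        return tuple(window)

    return initial_state, update


def q_learning_step(Q, z, a, r, z_next, actions, alpha=0.1, gamma=0.9):
    """One standard Q-learning update, with the agent state in place of the state."""
    target = r + gamma * max(Q[(z_next, b)] for b in actions)
    Q[(z, a)] += alpha * (target - Q[(z, a)])
    return Q[(z, a)]


# Usage: roll the agent state forward over a short history, then update Q.
initial_state, update = make_frame_stack_update(k=2)
z = initial_state()
for action, obs in [(0, "a"), (1, "b"), (0, "c")]:
    z = update(z, action, obs)

Q = defaultdict(float)  # unseen (agent state, action) pairs default to 0.0
q_learning_step(Q, z, a=0, r=1.0, z_next=z, actions=[0, 1])
```

A greedy policy read off such a table is stationary in the agent state: it always picks the same action for the same stacked window, which is the limitation the last sentence of the abstract points to.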

agent state, asql, markov chain, (15 more...)


Country:
- North America
  - United States
    - Wisconsin > Dane County
      - Madison (0.04)
    - Massachusetts > Hampshire County
      - Amherst (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada
    - Quebec > Montreal (0.04)
    - Alberta > Census Division No. 15
      - Improvement District No. 9 > Banff (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (1.00)
