Reviews: PAC Reinforcement Learning with Rich Observations
Neural Information Processing Systems
Contextual MDPs are a restricted class of POMDPs in which the optimal Q-function depends only on the most recent observation rather than on the belief state. The authors show that contextual MDPs are not PAC-learnable in a polynomial number of episodes even when either memoryless policies are considered or value-function approximation is used. However, when both memoryless policies and value-function approximation are used and the transitions are deterministic, the model is PAC-learnable in a polynomial number of episodes, with sample complexity independent of the number of observations. The paper is well written overall, and the proofs are clear and thorough. I am not sure, however, that the 16 pages of technical proofs in the appendix are suitable for a conference; the paper may be a better fit for a journal format.
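To make the setting concrete, here is a minimal sketch (my own illustration, not from the paper) of a deterministic-transition contextual MDP: hidden states evolve deterministically, the learner only ever sees rich observations drawn from a huge observation space, and a memoryless policy that maps the current observation to an action suffices to act optimally. All names (`GOOD`, `run_episode`, the combination-lock reward structure) are hypothetical.

```python
import random

HORIZON = 3
GOOD = [1, 0, 1]  # the single rewarding action sequence (illustrative)

def step(state, action):
    """Deterministic hidden-state transition; state = (level, on_path)."""
    level, on_path = state
    return (level + 1, on_path and action == GOOD[level])

def observe(state, rng):
    """Emit a rich observation: many possible observations per hidden state,
    but each observation is consistent with exactly one hidden state."""
    return (state, rng.randrange(10**6))

def run_episode(policy, seed=0):
    """Run one episode; the policy sees only observations, never the state."""
    rng = random.Random(seed)
    state = (0, True)
    for _ in range(HORIZON):
        obs = observe(state, rng)
        state = step(state, policy(obs))
    return 1.0 if state[1] else 0.0  # reward only at the end of the episode

def optimal_policy(obs):
    """Memoryless: the action depends only on the current observation."""
    (level, _), _noise = obs
    return GOOD[level]
```

Because the observation identifies the hidden state, the optimal Q-function is a function of the current observation alone, which is exactly the restriction that distinguishes contextual MDPs from general POMDPs.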