Reviews: On Oracle-Efficient PAC RL with Rich Observations

Neural Information Processing Systems 

Moreover, the reward depends only on x_t and the action, not on the hidden state S_t. The authors then correctly state (again, lines 99-100) that this makes the problem an MDP over X. They argue that "The hidden states serve to introduce structure to the MDP and enable tractable learning," but I do not understand why this is the case.