Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

Aug-16-2025, 10:42:27 GMT–Neural Information Processing Systems

There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies Π that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank d of the underlying MDP.
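
For orientation, the two notions named in the abstract can be written in their standard textbook form. This is a sketch under common conventions, not necessarily the paper's exact definitions: the feature maps \phi and \mu, the accuracy \varepsilon, the transition kernel P, and the value notation V^{\pi} are assumed symbols that do not appear in the abstract.

```latex
% Low-rank MDP of rank d (standard definition; \phi and \mu are assumed notation):
% the transition kernel factorizes through d-dimensional features.
P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle,
\qquad \phi(s, a) \in \mathbb{R}^d, \quad \mu(s') \in \mathbb{R}^d .

% Agnostic objective (assumed form): compete with the best policy in \Pi,
% without assuming that \Pi contains a near-optimal policy.
V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} - \varepsilon .
```

Under this reading, the benchmark is the best policy in Π rather than the optimal policy of the MDP, which is what distinguishes the agnostic setting from one that assumes realizability.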

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - Israel > Tel Aviv District
    - Tel Aviv (0.04)

Genre:
- Research Report (0.45)

Industry:
- Leisure & Entertainment > Games (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.93)