RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Oct-10-2025, 10:28:43 GMT–Neural Information Processing Systems

We introduce the first sample-efficient algorithm for latent Markov decision processes (LMDPs) without any additional distributional assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments.

algorithm, international conference, lmdp

Country:
- North America > United States
  - Wisconsin > Dane County
    - Madison (0.04)
  - Texas > Travis County
    - Austin (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Israel (0.04)

Genre:
- Research Report
  - New Finding (0.66)
  - Experimental Study (0.46)

Industry:
- Health & Medicine (0.67)
- Energy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty (0.67)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (1.00)
