RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
Neural Information Processing Systems
We introduce the first sample-efficient algorithm for latent Markov decision processes (LMDPs) that requires no additional distributional assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments.
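To make the two ingredients named above concrete, the sketch below shows a generic tabular LMDP and a textbook off-policy evaluation estimator. It is a minimal, hypothetical illustration and not the paper's algorithm. The dictionary layout, the function names (`sample_episode`, `is_estimate`, `coverage_bound`), and the coverage proxy are all assumptions made for illustration. In an LMDP a latent context is drawn once per episode and is never observed. Trajectory-level importance sampling still works because that context and the transition probabilities cancel in the likelihood ratio. For brevity the policies here are memoryless, although LMDP policies generally need to depend on the history.

```python
import random

# Hypothetical tabular LMDP: a latent context m ~ weights is drawn once per
# episode and never revealed to the agent; it selects which MDP
# (trans[m], reward[m]) governs the whole episode.
def sample_episode(lmdp, policy, rng):
    """Roll out one episode and return a list of (state, action, reward)."""
    m = rng.choices(range(len(lmdp["weights"])), weights=lmdp["weights"])[0]
    s = lmdp["init"]
    traj = []
    for _ in range(lmdp["horizon"]):
        probs = policy[s]  # simplification: memoryless policy pi(a | s)
        a = rng.choices(range(len(probs)), weights=probs)[0]
        traj.append((s, a, lmdp["reward"][m][s][a]))
        nxt = lmdp["trans"][m][s][a]  # next-state distribution under context m
        s = rng.choices(range(len(nxt)), weights=nxt)[0]
    return traj


def is_estimate(trajs, target, behavior):
    """Trajectory-level importance-sampling estimate of the target's value.

    The latent context and transitions appear identically in the numerator and
    denominator of the likelihood ratio, so they cancel: only the policies'
    action probabilities are needed, even though m is never observed.
    """
    total = 0.0
    for traj in trajs:
        ratio, ret = 1.0, 0.0
        for s, a, r in traj:
            ratio *= target[s][a] / behavior[s][a]
            ret += r
        total += ratio * ret
    return total / len(trajs)


def coverage_bound(target, behavior, horizon):
    """Crude, hypothetical coverage proxy: (max_{s,a} pi(a|s) / mu(a|s)) ** H,
    an upper bound on the trajectory-level density ratio.
    Assumes behavior > 0 wherever target > 0."""
    worst = max(
        t / b
        for t_row, b_row in zip(target, behavior)
        for t, b in zip(t_row, b_row)
        if t > 0
    )
    return worst ** horizon


# Usage: two latent contexts, one state, two actions, horizon 2.
lmdp = {
    "weights": [0.8, 0.2],
    "init": 0,
    "horizon": 2,
    "trans": [[[[1.0], [1.0]]], [[[1.0], [1.0]]]],  # trans[m][s][a]
    "reward": [[[1.0, 0.0]], [[0.0, 1.0]]],          # reward[m][s][a]
}
behavior = [[0.5, 0.5]]  # data-collection policy mu
target = [[0.9, 0.1]]    # policy pi we want to evaluate
rng = random.Random(0)
data = [sample_episode(lmdp, behavior, rng) for _ in range(20000)]
print(is_estimate(data, target, behavior))  # should be near 2 * (0.8*0.9 + 0.2*0.1) = 1.48
print(coverage_bound(target, behavior, lmdp["horizon"]))  # (0.9 / 0.5) ** 2, roughly 3.24
```

The coverage proxy grows exponentially in the horizon when the target and behavior policies disagree. That growth is one reason off-policy guarantees in partially observed settings are delicate. The paper's own coverage coefficients may be defined differently.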
Oct-10-2025, 10:28:43 GMT