RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Neural Information Processing Systems

In many real-world decision problems there is partially observed, hidden, or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent initially. In the last decade, there has been significant progress in designing learning algorithms for solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound. We effectively resolve this open question, introducing the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. We then show how these can be used to derive near-optimal guarantees for an optimistic exploration algorithm. We believe these results can be valuable for a wide range of interactive learning problems beyond the LMDP class, and especially for partially observed environments.
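For intuition only (this is the standard single-policy concentrability/coverage notion from fully observed MDPs, not the paper's LMDP-specific definition), a coverage coefficient compares the state-action occupancy $d^{\pi}$ of a target policy $\pi$ against the data-collection distribution $\mu$:
$$C^{\pi} \;=\; \max_{s,a}\, \frac{d^{\pi}(s,a)}{\mu(s,a)},$$
and evaluation error incurred under $\mu$ transfers to the target policy $\pi$ at a cost that scales with $C^{\pi}$. The paper's contribution is an analogous coefficient and off-policy evaluation lemma adapted to the latent structure of LMDPs.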


RL for Latent MDPs: Regret Guarantees and a Lower Bound

Neural Information Processing Systems

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. We first show that a general instance of LMDPs requires at least $\Omega((SA)^M)$ episodes to even approximate the optimal policy. Then, we consider sufficient assumptions under which learning good policies requires a polynomial number of episodes. We show that the key link is a notion of separation between the MDP system dynamics. With sufficient separation, we provide an efficient algorithm with a local guarantee, {\it i.e.,} one that achieves sublinear regret when given a good initialization. Finally, if we are given standard statistical sufficiency assumptions common in the Predictive State Representation (PSR) literature (e.g., \cite{boots2011online}) and a reachability assumption, we show that the need for initialization can be removed.
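As an illustrative sketch (notation ours, not the paper's), write the latent MDPs as $\mathcal{M}_1,\dots,\mathcal{M}_M$ with mixing weights $w_1,\dots,w_M$; the value of a policy $\pi$ in the LMDP is the mixture
$$V^{\pi} \;=\; \sum_{m=1}^{M} w_m\, V^{\pi}_{m},$$
so the agent must perform well on average over latent contexts it never observes. The $\Omega((SA)^M)$ lower bound shows that, absent further structure such as the separation condition above, disentangling this mixture can require a number of episodes exponential in $M$.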

