AITopics | pomdp

Reference-Based POMDPs

Neural Information Processing Systems, Apr-28-2026, 19:29:36 GMT

Making good decisions in partially observable and non-deterministic scenarios is a crucial capability for robots. A Partially Observable Markov Decision Process (POMDP) is a general framework for this problem. Despite advances in POMDP solving, problems with long planning horizons and evolving environments remain difficult to solve even for the best approximate solvers today. To alleviate this difficulty, we propose a slightly modified POMDP problem, called a Reference-Based POMDP, where the objective is to balance between maximizing the expected total reward and being close to a given reference (stochastic) policy. The optimal policy of a Reference-Based POMDP can be computed via iterative expectations using the given reference policy, thereby avoiding exhaustive enumeration of actions at each belief node of the search tree. We demonstrate theoretically that the standard POMDP under stochastic policies is related to the Reference-Based POMDP. To demonstrate the feasibility of exploiting the formulation, we present a basic algorithm, REFSOLVER. Results from experiments on long-horizon navigation problems indicate that this basic algorithm substantially outperforms POMCP.
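
The trade-off described in the abstract can be illustrated with a minimal one-step sketch. It is not REFSOLVER and not the paper's formulation: it assumes that "being close to the reference policy" is measured by a KL-divergence penalty with a weight called `temperature`, and that action values `q_values` are already known. The function name and all parameters are hypothetical. Under those assumptions the optimal policy and its value both follow from an expectation under the reference policy, which is the general flavor of "iterative expectations" mentioned above.

```python
import math


def kl_regularized_step(q_values, ref_probs, temperature=1.0):
    """Solve max_pi E_pi[Q] - temperature * KL(pi || ref) for one decision.

    Assumption (not from the source): closeness to the reference policy is
    a KL penalty. The closed-form solution is pi(a) proportional to
    ref(a) * exp(Q(a) / temperature), with optimal value
    temperature * log(sum_a ref(a) * exp(Q(a) / temperature)).
    """
    scaled = [q / temperature for q in q_values]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(scaled)
    weights = [p * math.exp(s - shift) for p, s in zip(ref_probs, scaled)]
    normalizer = sum(weights)
    policy = [w / normalizer for w in weights]
    value = temperature * (shift + math.log(normalizer))
    return policy, value


if __name__ == "__main__":
    # Hypothetical numbers: two actions, uniform reference policy.
    policy, value = kl_regularized_step([1.0, 0.0], [0.5, 0.5])
    print(policy, value)
```

A larger `temperature` keeps the result closer to the reference policy, while a smaller one moves it toward the greedy action.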

artificial intelligence, machine learning, pomdp, (16 more...)

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

240ac9371ec2671ae99847c3ae2e6384-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 03:29:10 GMT

artificial intelligence, data mining, machine learning, (19 more...)

Country: North America (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Data Science > Data Mining (0.96)

218344619d8fb95d504ccfa11804073f-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 02:07:54 GMT

agent, artificial intelligence, machine learning, (16 more...)

Country: North America > United States > California (0.14)

Industry: Transportation (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning in Observable POMDPs, without Computationally Intractable Oracles

Neural Information Processing Systems, Apr-24-2026, 11:30:06 GMT

Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g.

artificial intelligence, machine learning, pomdp, (16 more...)

Country: North America > United States (0.67)

Genre: Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Learning in Observable POMDPs, without Computationally Intractable Oracles

Neural Information Processing Systems, Apr-24-2026, 11:30:03 GMT

Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g.

algorithm, artificial intelligence, machine learning, (14 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models

Neural Information Processing Systems, Apr-24-2026, 08:35:47 GMT

We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently proposed proximal causal inference framework, we develop a non-parametric identification result for estimating the policy value via a sequence of so-called V-bridge functions with the help of time-dependent proxy variables. We then develop a fitted-Q-evaluation-type algorithm to estimate V-bridge functions recursively, where a non-parametric instrumental variable (NPIV) problem is solved at each step. By analyzing this challenging sequential NPIV problem, we establish the finite-sample error bounds for estimating the V-bridge functions and accordingly that for evaluating the policy value, in terms of the sample size, length of horizon and so-called (local) measure of ill-posedness at each step. To the best of our knowledge, this is the first finite-sample error bound for OPE in POMDPs under non-parametric models.

artificial intelligence, estimation, machine learning, (15 more...)

Genre: Workflow (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

03d7e13f0092405804f3a381ade8f3f0-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 08:35:34 GMT

artificial intelligence, machine learning, pomdp, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.31)

03d7e13f0092405804f3a381ade8f3f0-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 08:35:30 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

Neural Information Processing Systems, Mar-22-2026, 17:34:57 GMT

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon. While such estimators exist for MDPs, and POMDPs can be converted to history-based MDPs, their estimation errors depend on the state-density ratio for MDPs, which becomes a history ratio after conversion, an exponential object. Recently, Uehara et al. [2022a] proposed future-dependent value functions as a promising framework to address this issue, where the guarantee for memoryless policies depends on the density ratio over the latent state space. However, it also depends on the boundedness of the future-dependent value function and other related quantities, which we show could be exponential in length, thus erasing the advantage of the method. In this paper, we discover novel coverage assumptions tailored to the structure of POMDPs, such as outcome coverage and belief coverage, which enable polynomial bounds on the aforementioned quantities. As a side product, our analyses also lead to the discovery of new algorithms with complementary properties.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Periodic agent-state based Q-learning for POMDPs

Neural Information Processing Systems, Mar-21-2026, 03:12:45 GMT

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.
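
Frame stacking, named in the abstract as one example of an agent state, can be sketched in a few lines. This is an illustration only, not code from the paper: the class name `FrameStackAgentState` and the parameters `num_frames` and `pad` are hypothetical. The point is that the state is updated recursively from the previous state and the newest observation, without any access to the transition or observation model.

```python
from collections import deque


class FrameStackAgentState:
    """Agent state made of the most recent `num_frames` observations.

    Hypothetical illustration: the update uses only the previous agent
    state and the new observation, so it is model-free and recursive.
    """

    def __init__(self, num_frames, pad):
        # Fill with a padding observation until real ones arrive.
        self._frames = deque([pad] * num_frames, maxlen=num_frames)

    def update(self, observation):
        # Appending to a full bounded deque drops the oldest frame.
        self._frames.append(observation)
        return tuple(self._frames)


if __name__ == "__main__":
    state = FrameStackAgentState(num_frames=3, pad=0)
    for obs in [1, 2, 3, 4]:
        print(state.update(obs))
```

The returned tuple is hashable, so it could serve directly as a key in a tabular Q-function. A recurrent neural network plays the same role with a learned update in place of the fixed window.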

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
