Breaking the Order Barrier: Off-Policy Evaluation for Confounded POMDPs

Jun-12-2026, 00:31:28 GMT–Neural Information Processing Systems

We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs) with unobserved confounding. Recent advances have introduced bridge-function to circumvent unmeasured confounding and develop estimators for the policy value, yet the statistical error bounds of them related to the length of horizon $T$ and the size of the state-action space $|\mathcal{O}||\mathcal{A}|$ remain largely unexplored. In this paper, we systematically investigate the finite-sample error bounds of OPE estimators in finite-horizon tabular confounded POMDPs. Specifically, we show that under certain rank conditions, the estimation error for policy value can achieve a rate of $\mathcal{O}(T^{1.5}/\sqrt{n})$,

artificial intelligence, machine learning, mathcal, (8 more...)

Neural Information Processing Systems

Jun-12-2026, 00:31:28 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)