Regret Analysis of the Posterior Sampling-based Learning Algorithm for Episodic POMDPs

Dengwang Tang, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

arXiv.org Machine Learning 

Compared to Markov Decision Processes (MDPs), learning in Partially Observable Markov Decision Processes (POMDPs) can be significantly harder due to the difficulty of interpreting observations. In this paper, we consider episodic learning problems in POMDPs with unknown transition and observation models. We study the Posterior Sampling-based Reinforcement Learning (PSRL) algorithm for POMDPs and show that its Bayesian regret scales as the square root of the number of episodes. In general, the regret scales exponentially with the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, under the condition that the POMDP is undercomplete and weakly revealing, we establish a polynomial Bayesian regret bound that improves on the recent result of arXiv:2204.08967 by a factor of $\Omega(H^2\sqrt{SA})$.
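
The abstract describes PSRL's episode loop: sample a POMDP from the current posterior, plan for the sampled model, act with the resulting policy, and update the posterior with the observed history. The minimal Python sketch below illustrates that loop over a finite candidate model class with known rewards; all sizes, names, and modeling choices are illustrative assumptions, not the paper's algorithm. In particular, PSRL's analysis uses an optimal policy for the sampled model, while exact POMDP planning is intractable in general, so a myopic placeholder planner stands in here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
S, A, O, H = 3, 2, 4, 5        # states, actions, observations, horizon H
K = 20                         # size of a finite candidate model class
NUM_EPISODES = 300

def random_pomdp():
    """A random tabular POMDP: T[s, a] is a distribution over next states
    and Z[s] a distribution over observations."""
    T = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
    Z = rng.dirichlet(np.ones(O), size=S)        # shape (S, O)
    return T, Z

models = [random_pomdp() for _ in range(K)]
true_T, true_Z = models[0]          # the unknown true model lies in the class
R = rng.random((S, A))              # rewards assumed known, for simplicity
log_post = np.zeros(K)              # log posterior over models (uniform prior)

def myopic_policy(T, Z):
    """Placeholder planner: act greedily on the current belief (a stand-in
    for the optimal policy of the sampled model assumed by PSRL's analysis)."""
    def act(belief):
        return int(np.argmax(belief @ R))
    return act

def run_episode(T, Z, act):
    """Roll out one episode in the true POMDP while the agent filters its
    belief with the *sampled* model (T, Z); return the action/obs history."""
    s = rng.integers(S)             # hidden true state, uniform initial dist
    b = np.full(S, 1.0 / S)         # agent's belief under the sampled model
    history = []
    for _ in range(H):
        a = act(b)
        s = rng.choice(S, p=true_T[s, a])
        o = rng.choice(O, p=true_Z[s])
        history.append((a, o))
        joint = (b @ T[:, a, :]) * Z[:, o]   # Bayes filter step
        b = joint / (joint.sum() + 1e-300)
    return history

def log_likelihood(T, Z, history):
    """Forward algorithm: log P(o_{1:H} | a_{1:H}) under model (T, Z)."""
    b = np.full(S, 1.0 / S)
    ll = 0.0
    for a, o in history:
        joint = (b @ T[:, a, :]) * Z[:, o]
        p_o = joint.sum()
        ll += np.log(p_o + 1e-300)
        b = joint / (p_o + 1e-300)
    return ll

for ep in range(NUM_EPISODES):
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    k = rng.choice(K, p=post)       # posterior sampling: draw one model
    T, Z = models[k]
    history = run_episode(T, Z, myopic_policy(T, Z))
    for i, (Ti, Zi) in enumerate(models):    # exact Bayes posterior update
        log_post[i] += log_likelihood(Ti, Zi, history)
```

With a finite class, the posterior update on the observable action-observation history is exact, and it concentrates on candidates consistent with the data, which is the mechanism that posterior sampling analyses of Bayesian regret exploit.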
