Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions

May-13-2023–arXiv.org Artificial Intelligence

We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. A defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs under which Thompson sampling guarantees an asymptotically optimal expected regret bound of $O(T^{-1})$. These assumptions are easily verified for many classes of problems, such as queuing, inventory control, and dynamic pricing.
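
To illustrate what an "uninformative" action means for posterior updating, here is a minimal toy sketch. It is not the paper's algorithm or model. It assumes a single unknown Bernoulli parameter with a Beta prior and two hypothetical actions: `probe`, whose outcome is observed and updates the posterior, and `skip`, which pays a known fixed reward and reveals nothing. The names `thompson_step`, `run`, `safe_reward`, and `true_theta` are all invented for this illustration.

```python
import random


def thompson_step(alpha, beta, safe_reward, rng):
    """Draw theta from the Beta(alpha, beta) posterior and act greedily on the draw."""
    theta_sample = rng.betavariate(alpha, beta)
    # "probe" pays Bernoulli(theta); "skip" pays safe_reward and is uninformative.
    return "probe" if theta_sample > safe_reward else "skip"


def run(true_theta=0.7, safe_reward=0.5, horizon=2000, seed=0):
    """Toy loop (illustrative assumption, not the paper's setting)."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0  # uniform Beta(1, 1) prior over theta
    counts = {"probe": 0, "skip": 0}
    for _ in range(horizon):
        action = thompson_step(alpha, beta, safe_reward, rng)
        counts[action] += 1
        if action == "probe":
            # Informative action: observe a Bernoulli outcome and update the posterior.
            outcome = 1 if rng.random() < true_theta else 0
            alpha += outcome
            beta += 1 - outcome
        # Uninformative action: the posterior is left unchanged.
    return alpha, beta, counts


if __name__ == "__main__":
    alpha, beta, counts = run()
    print("posterior mean:", alpha / (alpha + beta), "action counts:", counts)
```

In this sketch the posterior only moves on rounds where `probe` is chosen, so learning continues only as long as posterior sampling keeps selecting the informative action.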

artificial intelligence, machine learning, thompson

Country:
- North America > Canada > Ontario (0.29)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (0.66)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.48)
