Export Reviews, Discussions, Author Feedback and Meta-Reviews
Neural Information Processing Systems
First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance.

Summary: This paper provides a Bayesian expected regret bound for the Posterior Sampling for Reinforcement Learning (PSRL) algorithm. PSRL was introduced by [Strens2000] and can be seen as the application of Thompson sampling to RL problems: a model is sampled from the (posterior) distribution over models, the optimal policy for the sampled model is computed, that policy is followed until the end of the episode, and the distribution over models is then updated. PSRL for finite MDPs was analyzed by [OVRR2013]; the main contribution of this paper is to analyze PSRL for MDPs with general state and action spaces. In the analysis, the authors use the concept of eluder dimension introduced by [RVR2013]. The eluder dimension was previously used in the analysis of bandit problems (for both Thompson Sampling and the Optimism in the Face of Uncertainty (OFU) approaches).
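The sample-plan-act-update loop described in the summary can be sketched for finite MDPs as follows. This is an illustrative sketch, not the paper's algorithm for general state spaces: the Dirichlet prior over transitions and the sample-mean reward estimate are assumptions made here for concreteness.

```python
import numpy as np


def sample_mdp(trans_counts, rew_sums, rew_counts, rng):
    """Sample an MDP from the posterior.

    Transitions: Dirichlet posterior per (state, action) pair (assumed prior).
    Rewards: posterior summarized by the empirical mean (a simplification).
    """
    S, A, _ = trans_counts.shape
    P = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(trans_counts[s, a])
    R = rew_sums / np.maximum(rew_counts, 1)
    return P, R


def value_iteration(P, R, horizon):
    """Finite-horizon value iteration; returns the greedy policy per step."""
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for t in reversed(range(horizon)):
        Q = R + P @ V            # Q has shape (S, A)
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy


def psrl(true_P, true_R, horizon, episodes, seed=0):
    """Run PSRL against a known true MDP and return the total reward collected."""
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    trans_counts = np.ones((S, A, S))   # Dirichlet(1, ..., 1) prior (assumed)
    rew_sums = np.zeros((S, A))
    rew_counts = np.zeros((S, A))
    total_reward = 0.0
    for _ in range(episodes):
        # 1. Sample a model from the posterior.
        P, R = sample_mdp(trans_counts, rew_sums, rew_counts, rng)
        # 2. Compute the optimal policy for the sampled model.
        policy = value_iteration(P, R, horizon)
        # 3. Follow that policy until the end of the episode.
        s = 0
        for t in range(horizon):
            a = policy[t, s]
            r = true_R[s, a]
            s_next = rng.choice(S, p=true_P[s, a])
            # 4. Update the posterior with the observed transition and reward.
            trans_counts[s, a, s_next] += 1
            rew_sums[s, a] += r
            rew_counts[s, a] += 1
            total_reward += r
            s = s_next
    return total_reward
```

Note that, unlike OFU-style algorithms, no confidence sets are constructed: the randomness of the posterior sample alone drives exploration.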