Monte-Carlo utility estimates for Bayesian reinforcement learning

Dimitrakakis, Christos

arXiv.org Machine Learning 

Bayesian reinforcement learning [1], [2] is the decision-theoretic approach [3] to solving the reinforcement learning problem. Unfortunately, calculating posterior distributions can be computationally expensive. Moreover, the Bayes-optimal decision can be intractable [4], [5], [1], and even calculating an optimal solution in a restricted class can be difficult [6]. This paper proposes a set of algorithms that select actions by estimating bounds on the Bayes-optimal utility through sampling. They include a direct Monte-Carlo approach as well as gradient-based approaches. We demonstrate the effectiveness of the proposed algorithms experimentally.

A. Setting

In the reinforcement learning problem, an agent acts in some unknown Markovian environment µ ∈ M, according to some policy π ∈ Π. The agent's policy is a procedure for selecting actions, with the action at time t being a
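The idea of bounding the Bayes-optimal utility by sampling can be illustrated with a small sketch. It rests on a standard inequality: for any fixed policy π and posterior ξ over environments µ ∈ M, E_ξ[V^π_µ] ≤ V*_ξ ≤ E_ξ[V*_µ]. An agent told the true µ can do no worse than the Bayes-optimal agent, and the Bayes-optimal agent does no worse than any fixed policy, so averaging over sampled environments gives Monte-Carlo estimates of a lower and an upper bound. This is not the paper's algorithm. It assumes a finite MDP with known rewards R[s][a], independent Dirichlet posteriors over transitions with parameters counts[s][a], a discounted infinite-horizon utility, and, as the fixed policy, the greedy policy of the posterior-mean MDP. All function names are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mdp(counts, rng):
    """Sample transitions P[s][a][s'] from independent Dirichlet posteriors (assumed model)."""
    return [[sample_dirichlet(alphas, rng) for alphas in per_state] for per_state in counts]


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))


def value_iteration(P, R, gamma, iters):
    """Approximate optimal values V*_mu of a known MDP mu = (P, R)."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(q_value(P, R, V, s, a, gamma) for a in range(len(R[s])))
             for s in range(len(P))]
    return V


def policy_evaluation(P, R, gamma, policy, iters):
    """Approximate values V^pi_mu of a fixed stationary policy in MDP mu = (P, R)."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [q_value(P, R, V, s, policy[s], gamma) for s in range(len(P))]
    return V


def mc_utility_bounds(counts, R, gamma, s0, n_samples=100, iters=200, seed=0):
    """Monte-Carlo estimates of (lower, upper) bounds on the Bayes-optimal utility at s0."""
    rng = random.Random(seed)
    # Hypothetical choice of fixed policy: greedy w.r.t. the posterior-mean MDP.
    mean_P = [[[c / sum(alphas) for c in alphas] for alphas in per_state]
              for per_state in counts]
    V_mean = value_iteration(mean_P, R, gamma, iters)
    policy = [max(range(len(R[s])), key=lambda a: q_value(mean_P, R, V_mean, s, a, gamma))
              for s in range(len(counts))]
    lower = upper = 0.0
    for _ in range(n_samples):
        P = sample_mdp(counts, rng)                                  # mu ~ posterior
        upper += value_iteration(P, R, gamma, iters)[s0]             # V*_mu(s0)
        lower += policy_evaluation(P, R, gamma, policy, iters)[s0]   # V^pi_mu(s0)
    return lower / n_samples, upper / n_samples


if __name__ == "__main__":
    # 2 states, 2 actions; counts[s][a] are Dirichlet parameters over next states.
    counts = [[[3, 1], [1, 3]], [[2, 2], [1, 5]]]
    R = [[0.0, 0.1], [1.0, 0.5]]
    print(mc_utility_bounds(counts, R, gamma=0.9, s0=0))
```

Drawing more samples reduces the Monte-Carlo error of both estimates; the remaining gap between them reflects posterior uncertainty about µ and how far the chosen fixed policy is from Bayes-optimal.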
