Reviews: Exploration in Structured Reinforcement Learning

Oct-8-2024, 05:42:09 GMT–Neural Information Processing Systems

It provides problem-related (asymptotic) lower and upper bounds on the regret, the latter for an algorithm presented in the paper that builds on Burnetas and Katehakis (1997) and a recent bandit paper by Combes et al (NIPS 2017). The setting assumes that an "MDP structure" \Phi (i.e. a set of possible MDP models) is given. The regret bounds (after T steps) are shown to be of the form K_Phi*log T, where the parameter K_\Phi is the solution to a particular optimization problem. It is shown that if \Phi is the set of all MDPs ("the unstructured case") then K_\Phi is bounded by HSA/\delta, where H is the bias span and \delta the minimal action sub-optimality gap. The second particular class that is considered is the Lipschitz structure that considers embeddings of finite MDPs in Euclidian space such that transition probabilities and rewards are Lipschitz. In this case, the regret bounds are shown to not to depend on the size of state and action space anymore.

algorithm, exploration, structured reinforcement learning, (9 more...)

Neural Information Processing Systems

Oct-8-2024, 05:42:09 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.42)
  - Representation & Reasoning > Optimization (0.39)