On Online Learning in Kernelized Markov Decision Processes

Nov-4-2019–arXiv.org Machine Learning

Abstract-- We develop low-regret algorithms for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms follow both the Upper Confidence Bound (UCB) and the Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS).

I. INTRODUCTION

The goal of reinforcement learning (RL) is to learn optimal behavior by repeated interaction with an unknown environment, usually modeled as a Markov Decision Process (MDP). Performance is typically measured by the amount of interaction, in terms of episodes or rounds, needed to arrive at an optimal (or near-optimal) policy; this is also known as the sample complexity of RL [1]. The sample complexity objective encourages efficient exploration across states and actions but, at the same time, is indifferent to the reward earned during the learning phase.



