Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
