Goto

Collaborating Authors

 Reinforcement Learning







Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model

Neural Information Processing Systems

Inparticular,well knownbounds foronline learningscale as a function of the gap between the expected reward of a particular action and the optimalaction [ABF02] and also on the variance ofthe rewards [AMS09].