Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model

Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill

Neural Information Processing Systems 

Inparticular,well knownbounds foronline learningscale as a function of the gap between the expected reward of a particular action and the optimalaction [ABF02] and also on the variance ofthe rewards [AMS09].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found