Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model
Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill
–Neural Information Processing Systems
Inparticular,well knownbounds foronline learningscale as a function of the gap between the expected reward of a particular action and the optimalaction [ABF02] and also on the variance ofthe rewards [AMS09].
Neural Information Processing Systems
Feb-13-2026, 10:57:54 GMT
- Country:
- Technology: