The high-level intuition behind MDP-GapE is that unlike a purely optimistic policy, 2

Neural Information Processing Systems 

UCRL, and obtain the same guarantees (our current analysis uses Pinsker's inequality and does not fully exploit the KL

Similar Docs  Excel Report  more

TitleSimilaritySource
None found