Near-OptimalRandomizedExplorationforTabular MarkovDecisionProcesses

Neural Information Processing Systems 

These algorithms inject (carefully tuned) random noise to value function to encourage exploration. UCB-type algorithms enjoy well-established theoretical guarantees but suffer from difficult implementation since an upper confidence bound isusually infeasible for manypractical models like neural networks. Instead, practitioners prefer randomized exploration such as noisy networks in [19], and algorithms with randomized exploration have been widely used in practice [37,13,11,35].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found