Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
Neural Information Processing Systems
These algorithms inject (carefully tuned) random noise into the value function to encourage exploration. UCB-type algorithms enjoy well-established theoretical guarantees but are difficult to implement, since an upper confidence bound is usually infeasible to compute for many practical models such as neural networks. Instead, practitioners prefer randomized exploration, such as the noisy networks of [19], and algorithms with randomized exploration have been widely used in practice [37, 13, 11, 35].
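To make the idea of injecting noise into the value function concrete, here is a minimal sketch for a tabular, finite-horizon MDP. It is not the paper's algorithm: the function name `noisy_value_iteration`, the Gaussian noise, and the `sigma / sqrt(count)` scaling are all assumptions chosen for illustration, and the estimated model (`P_hat`, `R_hat`) and visit counts are assumed to be given.

```python
import math
import random


def noisy_value_iteration(P_hat, R_hat, counts, horizon, sigma, rng):
    """Backward induction with random noise added to every Q-value.

    Hypothetical helper for illustration only.
    P_hat[s][a] -- estimated next-state distribution (list of probabilities)
    R_hat[s][a] -- estimated mean reward
    counts[s][a] -- number of visits to (s, a)
    sigma       -- noise level (assumed tuning parameter)
    Returns a list Q with Q[h][s][a] for h = 0, ..., horizon - 1.
    """
    n_states = len(R_hat)
    n_actions = len(R_hat[0])
    v_next = [0.0] * n_states  # value after the final step is zero
    q_all = []
    for _ in range(horizon):  # iterate from the last step back to the first
        q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                expected_next = sum(
                    p * v for p, v in zip(P_hat[s][a], v_next)
                )
                # Assumed scaling: less-visited pairs receive larger noise,
                # so the greedy policy is more likely to try them.
                scale = sigma / math.sqrt(max(counts[s][a], 1))
                q[s][a] = R_hat[s][a] + expected_next + rng.gauss(0.0, scale)
        v_next = [max(row) for row in q]
        q_all.append(q)
    q_all.reverse()  # reorder so that index 0 is the first time step
    return q_all
```

An agent would then act greedily with respect to the perturbed values, for example `max(range(n_actions), key=lambda a: Q[h][s][a])`, and redraw the noise at the start of each episode. Unlike a UCB-type bonus, no explicit confidence bound is computed; the exploration comes entirely from the random perturbation.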
Feb-8-2026, 01:07:08 GMT