The goal of our work was easy reproducibility and clearly showing the benefits of learning to explore over the state
–Neural Information Processing Systems
Thank you for the reviews of our paper. We will revise the paper accordingly. We discuss a contextual extension in Section 8. The policies for longer horizons also perform well and outperform TS. This can be seen in the proof in Appendix C, which only requires that γ = 1 /θ [1 /8, 1).
Neural Information Processing Systems
Oct-9-2025, 13:22:34 GMT
- Technology: