Reviews: Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Jan-23-2025, 07:48:17 GMT–Neural Information Processing Systems

Post author response: I thank the author(s) for their response and for commenting on my discussion points. As addressing those would need additional work, I am keeping my original score for now: this is a solid paper. While the proofs of Lemmas 4 and 5 are described very well in the main text, it would be helpful to have a short explanation of how they are used to obtain Lemma 6. If necessary, I suggest dropping the proof of Lemma 3 from the main text, as this result is standard. Quality: I have verified the proof in the main text and the individual lemmas in the appendix.

contribution, randomized value function, worst-case regret bound

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.44)