1. (Modified Algorithm of Definition 5.1) The update rules of Definition 5.1 (leading to Algorithm 2 in the suppl
–Neural Information Processing Systems
We thank the reviewers for their insightful comments and suggestions, and we hope that this rebuttal clarifies the issues raised. There are two reasons why Algorithm 1 should be preferred in practice. Like other RL algorithms (e.g., MBIE or Delayed-QL), Algorithm 2 is extremely conservative, leading to very slow convergence. Algorithm 1 can be seen as a "practical" version of Algorithm 2. We will clarify this point in the final version.
Aug-20-2025, 09:55:59 GMT