1. (Modified Algorithm of Definition 5.1) The update rules of Definition 5.1 (leading to Algorithm 2 in the suppl

Neural Information Processing Systems 

We thank the reviewers for their insightful comments and suggestions, and we hope that this rebuttal clarifies the issues raised. There are two reasons why Algorithm 1 should be preferred in practice. As with RL algorithms (e.g., MBIE or Delayed-QL), Algorithm 2 is extremely conservative, leading to very slow convergence. Algorithm 1 can be seen as a "practical" version of Algorithm 2. We will clarify this point in the final version.
