We would like to thank the reviewers for their thoughtful feedback and comments, which will undoubtedly make the

Neural Information Processing Systems 

We will update our paper to reflect your comments, fix typos, and include missing references. We will update the paper to make this more overt. Eq. 4 is therefore chosen ... Both Eq. 3 and Eq. 4 are motivated by the policy improvement theorem. Whereas Eq. 3 seeks to improve the policy by choosing a better action to copy, Eq. 4 does this in a soft manner.

R2 - reproducibility: We have open-sourced the code for CRR on GitHub, and the link will be made available.
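To make the hard versus soft distinction between Eq. 3 and Eq. 4 concrete, the following is a minimal sketch rather than the released CRR code. It assumes Eq. 3 is an indicator weight on the estimated advantage and Eq. 4 is an exponential weight with a temperature; the function names, the temperature `beta`, and the clipping value `max_weight` are illustrative assumptions.

```python
import math


def binary_weight(advantage: float) -> float:
    """Hard filter (assumed form of Eq. 3): copy the action only if its
    estimated advantage is positive."""
    return 1.0 if advantage > 0.0 else 0.0


def exp_weight(advantage: float, beta: float = 1.0, max_weight: float = 20.0) -> float:
    """Soft filter (assumed form of Eq. 4): the weight grows smoothly with the
    advantage. beta and max_weight are hypothetical values for illustration."""
    scaled = advantage / beta
    # Clip before exponentiating so very large advantages cannot overflow.
    if scaled >= math.log(max_weight):
        return max_weight
    return math.exp(scaled)


def weighted_bc_loss(log_probs, advantages, weight_fn):
    """Mean negative log-likelihood of dataset actions, each term scaled by
    weight_fn applied to that action's estimated advantage."""
    weights = [weight_fn(a) for a in advantages]
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)
```

Under these assumptions, the hard filter discards every action with a non-positive advantage, whereas the soft filter still assigns such actions a small positive weight, so they contribute to the regression in proportion to how good they are estimated to be.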
