Response to Reviewer

Neural Information Processing Systems 

C2: Rademacher complexity in this paper refers to its empirical version; we will clarify this in a future version.

We use policy evaluation errors to evaluate the quality of model learning. The results in Section 6.2 indicate that [...]

The shading on the plots shows the standard deviation over 3 random seeds. We will clarify this in a future version.

The word "generation" means that the policy could perform [...]

Regarding your choice of "No" on the reproducibility evaluation, we would like to point out that the proof, source [...]

We thank you for your insightful suggestion.
