Reviewer #1: Thanks for the comments!

Neural Information Processing Systems 

We should clarify that the theoretical results already consider out-of-sample generalization. There are connections between this work and entropy-regularized RL, but there are also distinctions. This allows us to prove new generalization bounds in the form of Theorem 8. We are also able to [...]

We also have the same suite of results prepared for MNIST, as well as standard deviations for Table 1.

"It is unclear to me if the reward estimation algorithm is actually evaluated in the experiments." Yes, Section 3.6 used [...]

"Can you comment on the increased variance demonstrated by Composite on Table 2?" To produce Table 2, [...]

"I find curious that [...] all the experiments consists of classification tasks "reworked" [...]." The Criteo dataset is a benchmark in this area; it was extracted from a real online advertising challenge.
