Reviewer # 1 Thanks for the comments!
–Neural Information Processing Systems
We should clarify that the theoretical results already consider out-of-sample generalization. There are connections between this work and entropy regularized RL, but there are also distinctions. This allows us to prove new generalization bounds in the form of Theorem 8. We are also able to We also have the same suite of results prepared for MNIST, and standard deviations for Table 1. "It is unclear to me if the reward estimation algorithm is actually evaluated in the experiments." Y es, Section 3.6 used "Can you comment on the increased variance demonstrated by Composite on T able 2?" To produce Table 2, "I find curious that [...] all the experiments consists of classification tasks "reworked" [...]." Criteo dataset is a benchmark in this area, which has been extracted from a real online advertising challenge.
Neural Information Processing Systems
Oct-3-2025, 03:19:13 GMT
- Technology: