typos, add the discussion about the parameters, and broaden the related works section) in the next version of the paper. We now respond to the major comments as follows.

Neural Information Processing Systems

We now respond to the major comments as follows. Take RL with the linear model as an example: more formally, we believe that the key is to prove an analogue of Lemma 5 for the linear model. We will also discuss the work on policy certificates (Dann et al., 2019) in the related work section. We will add this discussion to the next version of the paper.





Neural Information Processing Systems

Q2: the active manifold M is dependent on x [...]
A: The manifold M in Theorem 3.1 is the manifold associated with x^*; to clarify this, we will denote it M_{x^*}.
Q3: why apply the Baillon-Haddad theorem [...]
A: The subdifferential \partial \Phi is a set-valued operator and *cannot* be non-expansive in the classical sense. It is indeed the non-expansiveness of G_k that we need, and this is a consequence of the Baillon-Haddad theorem, which states that \nabla F is firmly non-expansive, and thus that G_k is \alpha_k-averaged, hence non-expansive, for the prescribed range of \gamma_k.
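The snippet does not define G_k, but in this proximal setting it is presumably the standard forward-backward (proximal-gradient) map G_k(x) = prox_{\gamma_k \Phi}(x - \gamma_k \nabla F(x)). Under that assumption, a minimal numerical sketch of the claimed non-expansiveness (with F a least-squares term, \Phi the l1 norm, and \gamma_k in (0, 2/L)) is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Smooth part F(x) = 0.5 * ||A x - b||^2; L = ||A^T A||_2 is the
# Lipschitz constant of grad F.
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
L = np.linalg.norm(A.T @ A, 2)

def grad_F(x):
    return A.T @ (A @ x - b)

def prox_l1(x, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def G(x, gamma):
    # Forward-backward map: gradient step on F, then prox of Phi.
    return prox_l1(x - gamma * grad_F(x), gamma)

gamma = 1.9 / L  # inside the prescribed range (0, 2/L)
for _ in range(1000):
    x, y = rng.standard_normal(10), rng.standard_normal(10)
    lhs = np.linalg.norm(G(x, gamma) - G(y, gamma))
    rhs = np.linalg.norm(x - y)
    assert lhs <= rhs + 1e-12  # non-expansiveness: ||Gx - Gy|| <= ||x - y||
print("non-expansiveness verified on 1000 random pairs")
```

This is only an illustration of the averaged-operator argument, not the paper's actual G_k; the names `grad_F`, `prox_l1`, and `G` are placeholders for this sketch.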


incorporate all other feedback in the reviews into the paper's final version

Neural Information Processing Systems

Thank you for your constructive feedback. We address the reviewers' main points below; however, we will also [...] The motivation is unclear (R3, R4), in particular, as differentiable programming is not new (R4). The problem of searching in spaces of program architectures is also well-studied in classical program synthesis. Our paper's new observation, as noted by [...] Prior efforts in machine learning research have used these datasets -- see "Generating Multi-Agent [...]" We will add further clarifying details about the DSL in the final version.



self-distillation (SD) and label-smoothing (LS) as MAP [...] insightful ([R2], [R3], [R4]), that relating accuracy to confidence

Neural Information Processing Systems

We thank all reviewers for their constructive feedback! We address the reviewers' comments below, and will incorporate all feedback. [...] This explains why SD outperforms LS. Please refer to our response to [R3] for the discussion on CD. One can alternatively compute the variance of prediction confidence.
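The snippet does not say how the variance of prediction confidence is computed; a minimal sketch of one natural reading (assumed, not taken from the paper): take each example's confidence to be its maximum softmax probability, then compute the variance of that quantity across the dataset.

```python
import numpy as np

def softmax(logits):
    # Shift by the row max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence_variance(logits):
    # Confidence = max class probability per example;
    # return its variance over the set of examples.
    conf = softmax(logits).max(axis=1)
    return conf.var()

# Usage on dummy logits (100 examples, 10 classes).
rng = np.random.default_rng(0)
logits = rng.standard_normal((100, 10))
print(confidence_variance(logits))
```

`confidence_variance` is a hypothetical helper name for this illustration; the actual statistic in the paper may differ (e.g. variance of the true-class probability instead of the max).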