926ffc0ca56636b9e73c565cf994ea5a-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their valuable comments. We are glad that reviewers noted our paper as novel (R1: "idea is ..."). "Decouple the effect of capacity increase and curriculum learning": We will also move the related-works section as suggested. We agree that this issue is important in the field of curriculum learning. "It could be interesting to show results on the large WebVision benchmark": In addition to ImageNet, we conducted new experiments on the WebVision dataset (2.3 million training images) and obtained significant improvements; please see the first table above. "Would the proposed curriculum change robustness to adversarial attacks": On average, our method requires 20% fewer epochs.
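For context, the curriculum-learning idea discussed in this response can be sketched generically. This is a minimal illustration, not the paper's algorithm: the linear pacing schedule, the 20% starting pool, and the use of per-sample loss as a difficulty score are all assumptions.

```python
import numpy as np

def curriculum_batches(losses, n_epochs, batch_size):
    """Generic curriculum pacing: order samples easy-to-hard by a
    difficulty score (here, a per-sample loss) and linearly grow the
    usable pool from 20% to 100% of the data over training."""
    order = np.argsort(losses)  # low loss = "easy", presented first
    n = len(losses)
    for epoch in range(n_epochs):
        frac = 0.2 + 0.8 * epoch / max(n_epochs - 1, 1)
        pool = order[: max(batch_size, int(frac * n))]
        yield np.random.choice(pool, size=batch_size, replace=False)
```

Early batches draw only from the easiest samples; by the final epoch the pool covers the whole dataset.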



704cddc91e28d1a5517518b2f12bc321-AuthorFeedback.pdf


We thank the reviewers for their feedback. We will first respond to shared comments and then to individual ones. Reviewers 2 and 3 requested clarification regarding the advantages of DCA over other methods: for instance, one could attempt to correlate each neuron's contribution to the DCA subspace with single-neuron properties. Studying the behavior of kernel DCA is a direction for future studies. Additionally, we found and corrected a minor bug in Figure 1A: the SFA and DCA lines are now blue and red, respectively.


Reviewer 1: Unclear about the evaluation for outer iterations; does the number of aggregated tasks affect the total complexity?


Yes, the total complexity is proportional to the number of aggregated tasks. "Add experiments to compare ANIL and MAML w.r.t. the size B of samples"; "Why is the inner-loop sample size not taken into the analysis, as Fallah et al. [4] does?": This setting has also been considered in Rajeswaran et al. [24] and Ji et al. [13]. Reviewer 2: Dependence on κ; iMAML depends on κ, in contrast to the poly(κ) dependence of this work. "Add an experiment to verify the tightness": Great point! We will definitely add such an experiment in the revision. We will clarify it in the revision.
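The claim that total complexity is proportional to the number of aggregated tasks can be seen in a minimal first-order MAML-style outer step. This is a generic sketch, not the paper's exact algorithm: `grad_fn`, the single inner gradient step, and the learning rates are all placeholder assumptions.

```python
import numpy as np

def maml_outer_step(theta, tasks, grad_fn, inner_lr=0.1, outer_lr=0.01):
    """One first-order MAML-style outer step: for each sampled task,
    take one inner gradient step, then average the gradients at the
    adapted parameters. The loop makes two grad_fn calls per task,
    so cost per outer iteration is linear in len(tasks)."""
    meta_grad = np.zeros_like(theta)
    for task in tasks:                       # complexity ∝ number of tasks
        g = grad_fn(theta, task)             # inner-loop gradient
        adapted = theta - inner_lr * g       # one-step adaptation
        meta_grad += grad_fn(adapted, task)  # first-order outer gradient
    return theta - outer_lr * meta_grad / len(tasks)
```

With a quadratic per-task loss, grad_fn(theta, t) = theta - t, each outer step pulls theta toward the average task optimum.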


We thank the reviewers for their appreciative and thoughtful feedback


We thank the reviewers for their appreciative and thoughtful feedback. Reviewer 1: "However, the authors fail to bring the result to bear on the current state of OT, or any novel stochastic optimization algorithm designed to compute it faster." We will further emphasize these aspects. A measure with zero mean. Reviewer 2: "If the paper could show the formula for that case [TV], that would be ..." Figure 1: Large dimensions need more samples to approximate the moments of the unbalanced optimal transport plan. Reviewer 3: ""Figure 1 illustrates the convergence" ... the convergence of what?" "Figure 2 is also difficult to understand."



error is simply the


Figure (b) above shows that the performance is robust to different GCN embedding sizes. "EA ... degree to help": Figure (a) shows an ablation study on NAS-Bench-201, which varies each component (surrogate ...). The other experimental settings are the same as in Section 4.2. As can be seen, more accurate architectures are close to each other. "BO typically works better in low-dimensional ...": Here, in Figure (d) above, we use subnets that are sampled in the same search iteration. "For example, it is common to see pooling": Yes. Thus, the GCN propagation part is more important than how to add the global node.


0b8aff0438617c055eb55f0ba5d226fa-AuthorFeedback.pdf


"Why does it make sense to deblur the extracted features?" Violations can be successfully compensated for by the feature refinement. We will discuss these in detail and add corresponding results in the revised paper. The reason the improvement in Tab. 3 is not so large: our PSNR results are 25.57 ... Tabs. 3 and 5 are evaluated on [19] and [16], respectively, with ... The contributions are summarized in L50-61.


Reviewer 1 - Use of mini-batches: in our experiments, we indeed use mini-batches of size B, by sampling B points


We would like to thank all reviewers for their valuable feedback and comments. Please find our responses below. This is because it predicts an almost uniform distribution. AdaCVaR also has a lower CVaR than ERM (standard SGD). Thank you for observing that.
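For readers unfamiliar with the metric being compared in this response, the empirical CVaR at level α is the mean of the worst α-fraction of per-sample losses. A minimal sketch, with the tail-selection convention (ceiling, largest-first) as an assumption:

```python
import numpy as np

def empirical_cvar(losses, alpha=0.1):
    """Empirical Conditional Value at Risk: the mean of the worst
    ceil(alpha * n) per-sample losses. At alpha = 1 this reduces to
    the ordinary ERM objective (the mean loss)."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # largest first
    k = max(1, int(np.ceil(alpha * len(losses))))
    return losses[:k].mean()
```

A method with lower CVaR than ERM, as reported above for AdaCVaR, is doing better on the hardest tail of the data, not just on average.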