
Neural Information Processing Systems 

We thank the reviewers for their detailed comments. Please see our responses below.

"... common implementation of weight decay [1] will usually multiply the amount of weight decay by the learning ..." The same holds in our setup: we have an ...

"How do different learning rate schedules affect the conclusion?" We address LR schedule questions below.

"It would be great if the authors can provide more experiments on ... AUTOL2" We ran additional experiments ...

"(1) If I could have access to the test set ..." We reject the claim that our submission "violates the ethics of ...

"(2) I have concerns on comparing AutoL2 ..." Experiments with learning rate decay and AutoL2 are presented in the SM.

"(3) The practicality of the proposed work ..." ...

"... more insights on the relation between learning rate scheduler and AutoL2 ..." We address this point in the ...

"... the lambda update refractory period is not detailed ..." The refractory period lasts for ...

"It would be interesting to see on the same graph, training with learning rate scheduler ..." In the SM we have the ...

"In Figure 1a and 1b, how is the best test accuracy determined? ..." In Figs. ...
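The first quoted comment notes that common weight decay implementations multiply the decay amount by the learning rate, so the effective shrinkage is coupled to the LR schedule. As a minimal sketch of this coupling (illustrative only; the function names are ours, not the submission's implementation):

```python
import numpy as np

def sgd_step_coupled(w, grad, lr, weight_decay):
    """One SGD step with coupled weight decay: the decay term is folded
    into the gradient, so the effective shrinkage per step is
    lr * weight_decay and therefore scales with the learning rate."""
    return w - lr * (grad + weight_decay * w)

# With zero gradient, a step shrinks the weight by a factor (1 - lr * wd).
w = np.array([1.0])
w_after = sgd_step_coupled(w, np.zeros(1), lr=0.1, weight_decay=0.5)
# shrinkage factor: 1 - 0.1 * 0.5 = 0.95
```

Halving the learning rate in this formulation also halves the per-step decay, which is why conclusions about an L2 coefficient can depend on the LR schedule.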