Review for NeurIPS paper: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Clarity:
1. Clarify whether \varepsilon and \delta are learnable parameters trained jointly with the model parameters, or learnable only during the architecture descent. In Line 116, when \varepsilon and \delta are optimized, are the neural network weights also updated? (See the first sketch after this list for one possible reading.)
2. "Measured by the gradient magnitude": is the magnitude computed over the full batch or over a few mini-batches? If the latter, how small a number?
5. Make sure the legend labels "Random (split)" and "RandSearch" in Figure 1(a) exactly match those that appear in the text ("RandSearch (split)" and "RandSearch (split new)"). Figure 1(a) should also include a simple baseline: add one neuron and randomly initialize its weights (a sketch of this baseline follows the list). In Figure 3(b), if splitting and growing happen at the same time, the number of neurons (markers along the x-axis) should increase in steps larger than 1.
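
To make point 1 concrete, below is a minimal PyTorch sketch of one possible reading of Line 116: the trained weights are frozen while only \varepsilon and \delta receive gradients during architecture descent. All names (base, eps, delta) and the loss are hypothetical illustrations, not claimed to be the paper's exact formulation.

```python
import torch
import torch.nn as nn

# One reading of Line 116: freeze the trained weights and optimize only the
# split directions delta and per-neuron step sizes eps.
base = nn.Linear(8, 16)
for p in base.parameters():
    p.requires_grad_(False)  # frozen under this reading; the paper may also update them

delta = nn.Parameter(0.01 * torch.randn_like(base.weight))  # candidate split directions
eps = nn.Parameter(torch.zeros(base.out_features))          # per-neuron step sizes

opt = torch.optim.Adam([eps, delta], lr=1e-2)  # optimizer over architecture variables only
x, y = torch.randn(32, 8), torch.randn(32, 16)
for _ in range(100):
    out = torch.relu(x @ (base.weight + eps[:, None] * delta).T + base.bias)
    loss = nn.functional.mse_loss(out, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
# Neurons with large |eps| would then be candidates for splitting or growing.
```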
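
For the baseline suggested under Figure 1(a), here is a minimal sketch (layer sizes and names are illustrative, not from the paper): grow one hidden unit whose new incoming and outgoing weights simply keep PyTorch's default random initialization.

```python
import torch
import torch.nn as nn

def grow_hidden_layer(fc_in, fc_out):
    """Return copies of (fc_in, fc_out) with one extra hidden unit.

    The new unit's incoming row and outgoing column keep the default
    random initialization, matching the suggested baseline.
    """
    new_in = nn.Linear(fc_in.in_features, fc_in.out_features + 1)
    new_out = nn.Linear(fc_out.in_features + 1, fc_out.out_features)
    with torch.no_grad():
        new_in.weight[:-1].copy_(fc_in.weight)       # copy old incoming weights
        new_in.bias[:-1].copy_(fc_in.bias)
        new_out.weight[:, :-1].copy_(fc_out.weight)  # copy old outgoing weights
        new_out.bias.copy_(fc_out.bias)
    return new_in, new_out

# Usage: grow a 2-layer MLP from 16 to 17 hidden units.
fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 10)
fc1, fc2 = grow_hidden_layer(fc1, fc2)
x = torch.randn(4, 8)
print(fc2(torch.relu(fc1(x))).shape)  # torch.Size([4, 10])
```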