Review for NeurIPS paper: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Clarity:
1. Clarify whether \varepsilon and \delta are learnable parameters trained jointly with the model parameters, or learnable only during the architecture descent. In Line 116, when \varepsilon and \delta are optimized, are the neural network weights also updated? (See the first sketch after this list for one possible reading.)
2. "Measured by the gradient magnitude": is the magnitude computed over the full batch or over a few mini-batches? If the latter, how small a number?
5. Make sure the legend labels "Random (split)" and "RandSearch" in Figure 1(a) exactly match those that appear in the text ("RandSearch (split)" and "RandSearch (split new)"). Figure 1(a) should also include a simple baseline: add one neuron and randomly initialize its weights (a sketch of this baseline follows the list). In Figure 3(b), if splitting and growing happen at the same time, the number of neurons (markers along the x-axis) should increase in steps larger than 1.
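
To make point 1 concrete, below is a minimal PyTorch sketch of one possible reading of Line 116: the trained weights are frozen while only \varepsilon and \delta receive gradients during architecture descent. All names (base, eps, delta) and the loss are hypothetical illustrations, not claimed to be the paper's exact formulation.

```python
import torch
import torch.nn as nn

# One reading of Line 116: freeze the trained weights and optimize only the
# split directions delta and per-neuron step sizes eps.
base = nn.Linear(8, 16)
for p in base.parameters():
    p.requires_grad_(False)  # frozen under this reading; the paper may also update them

delta = nn.Parameter(0.01 * torch.randn_like(base.weight))  # candidate split directions
eps = nn.Parameter(torch.zeros(base.out_features))          # per-neuron step sizes

opt = torch.optim.Adam([eps, delta], lr=1e-2)  # optimizer over architecture variables only
x, y = torch.randn(32, 8), torch.randn(32, 16)
for _ in range(100):
    out = torch.relu(x @ (base.weight + eps[:, None] * delta).T + base.bias)
    loss = nn.functional.mse_loss(out, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
# Neurons with large |eps| would then be candidates for splitting or growing.
```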
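
For the baseline suggested under Figure 1(a), here is a minimal sketch (layer sizes and names are illustrative, not from the paper): grow one hidden unit whose new incoming and outgoing weights simply keep PyTorch's default random initialization.

```python
import torch
import torch.nn as nn

def grow_hidden_layer(fc_in, fc_out):
    """Return copies of (fc_in, fc_out) with one extra hidden unit.

    The new unit's incoming row and outgoing column keep the default
    random initialization, matching the suggested baseline.
    """
    new_in = nn.Linear(fc_in.in_features, fc_in.out_features + 1)
    new_out = nn.Linear(fc_out.in_features + 1, fc_out.out_features)
    with torch.no_grad():
        new_in.weight[:-1].copy_(fc_in.weight)       # copy old incoming weights
        new_in.bias[:-1].copy_(fc_in.bias)
        new_out.weight[:, :-1].copy_(fc_out.weight)  # copy old outgoing weights
        new_out.bias.copy_(fc_out.bias)
    return new_in, new_out

# Usage: grow a 2-layer MLP from 16 to 17 hidden units.
fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 10)
fc1, fc2 = grow_hidden_layer(fc1, fc2)
x = torch.randn(4, 8)
print(fc2(torch.relu(fc1(x))).shape)  # torch.Size([4, 10])
```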