Reviews: Splitting Steepest Descent for Growing Neural Architectures
–Neural Information Processing Systems
The paper introduces an algorithm for splitting neurons when training neural networks using gradient descent, in order to progressively grow a neural network architecture for a better fit to the data. Some theoretical connections to wasserstein-infinity gradient steps are presented, as well as multiple experiments for learning interpretable or lightweight neural networks. The proposed method is original and interesting, and the experiments suggest that it is promising for various applications. However, the current proposed method seems far from practical in large scale settings, and the presented theoretical results are quite limited: * computationally, the algorithm as presented in Algorithm 1 requires (i) to check if a local optima is reached on the full training loss (and it seems that gradient updates are performed on the full training loss, which is costly), (ii) eigenvector computation on splitting matrices, (iii) dealing with dynamically growing weight matrices. These aspects seem to make the algorithm impractical compared to simply running SGD on an over-parameterized network.
Neural Information Processing Systems
Jan-22-2025, 23:45:33 GMT
- Technology: