AITopics | splitting steepest descent

Splitting Steepest Descent for Growing Neural Architectures

Neural Information Processing SystemsDec-25-2025, 06:12:15 GMT

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a \emph{splitting gradient} for optimally updating the off-springs. Theoretically, our splitting strategy is a second order functional steepest descent for escaping saddle points in an $\Linfty$-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.

name change, neural architecture, splitting steepest descent, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

Add feedback

fdbe012e2e11314b96402b32c0df26b7-Paper.pdf

Neural Information Processing SystemsAug-17-2025, 10:41:08 GMT

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.15)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reviews: Splitting Steepest Descent for Growing Neural Architectures

Neural Information Processing SystemsJan-22-2025, 23:45:33 GMT

The paper introduces an algorithm for splitting neurons when training neural networks using gradient descent, in order to progressively grow a neural network architecture for a better fit to the data. Some theoretical connections to wasserstein-infinity gradient steps are presented, as well as multiple experiments for learning interpretable or lightweight neural networks. The proposed method is original and interesting, and the experiments suggest that it is promising for various applications. However, the current proposed method seems far from practical in large scale settings, and the presented theoretical results are quite limited: * computationally, the algorithm as presented in Algorithm 1 requires (i) to check if a local optima is reached on the full training loss (and it seems that gradient updates are performed on the full training loss, which is costly), (ii) eigenvector computation on splitting matrices, (iii) dealing with dynamically growing weight matrices. These aspects seem to make the algorithm impractical compared to simply running SGD on an over-parameterized network.

algorithm, neural architecture, splitting steepest descent, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Reviews: Splitting Steepest Descent for Growing Neural Architectures

Neural Information Processing SystemsJan-22-2025, 23:45:22 GMT

The paper proposes an interesting technique to train neural networks while learning as well part of the architecture by splitting neurons in a principled way. More precisely, the algorithm is defining a notion of steepest descent in the space of distributions over neuron weights, which is a steepest descent in the space of these distributions equipped with the L-infty Wasserstein distance. The corresponding steepest descent algorithm retrieves the usual steepest descent direction as long as a "usual" descent direction exists, but if a local minimum is reached and some further progress could be made locally by duplicating (or make a number of other copies of) neurons and decoupling them, then the algorithm finds a locally optimal split. The paper is well written, novel, with clear simple theory. The idea is simple and elegant.

algorithm, neural architecture, splitting steepest descent, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Splitting Steepest Descent for Growing Neural Architectures

Neural Information Processing SystemsOct-9-2024, 20:33:33 GMT

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a \emph{splitting gradient} for optimally updating the off-springs. Theoretically, our splitting strategy is a second order functional steepest descent for escaping saddle points in an \Linfty -Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.

network structure, neural architecture, splitting steepest descent, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting

Wu, Lemeng, Ye, Mao, Lei, Qi, Lee, Jason D., Liu, Qiang

arXiv.org Machine LearningMar-23-2020

We propose signed splitting steepest descent (S3D), which progressively grows neural architectures by splitting critical neurons into multiple copies, following a theoretically-derived optimal scheme. Our algorithm is a generalization of the splitting steepest descent (S2D) of Liu et al. (2019b), but significantly improves over it by incorporating a rich set of new splitting schemes that allow negative output weights. By doing so, we can escape local optima that the original S2D can not escape. Theoretically, we show that our method provably learns neural networks with much smaller sizes than these needed for standard gradient descent in overparameterized regimes. Empirically, our method outperforms S2D and prior arts on various challenging benchmarks, including CIFAR-100, ImageNet and ModelNet40.

descent, neuron, splitting, (11 more...)

arXiv.org Machine Learning

2003.10392

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Splitting Steepest Descent for Growing Neural Architectures

Wu, Lemeng, Wang, Dilin, Liu, Qiang

Neural Information Processing SystemsMar-19-2020, 01:01:30 GMT

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a \emph{splitting gradient} for optimally updating the off-springs. Theoretically, our splitting strategy is a second order functional steepest descent for escaping saddle points in an $\Linfty$-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings. Papers published at the Neural Information Processing Systems Conference.

network structure, neural architecture, splitting steepest descent, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback