AITopics | path-sgd

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, rnn, (18 more...)

Country: North America (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Behnam Neyshabur, Russ R. Salakhutdinov, Nati Srebro

Neural Information Processing Systems, Oct-2-2025, 15:37:03 GMT

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.
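To make the "path-wise regularizer" concrete, here is a minimal NumPy sketch. It assumes the squared l2 form of the regularizer (the sum, over all input-to-output paths, of the product of squared weights along the path), which the abstract itself does not spell out. The function names and the toy layer sizes are illustrative only. The sketch computes the quantity with one pass of squared-weight matrix products, checks it against brute-force path enumeration, and checks that it is unchanged when a hidden unit's incoming weights are scaled up and its outgoing weights are scaled down by the same factor.

```python
import itertools
import numpy as np


def path_norm_sq(weights):
    """Sum over all input-to-output paths of the product of squared weights.

    `weights` is a list of layer matrices, each of shape (fan_out, fan_in).
    Assumed form of the path-wise regularizer (squared l2 variant).
    """
    v = np.ones(weights[0].shape[1])      # one unit of "mass" per input node
    for W in weights:
        v = (W ** 2) @ v                  # propagate through squared weights
    return float(v.sum())


def path_norm_sq_bruteforce(weights):
    """Same quantity by explicit enumeration of every path (tiny nets only)."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    total = 0.0
    for path in itertools.product(*[range(n) for n in sizes]):
        prod = 1.0
        for layer, W in enumerate(weights):
            prod *= W[path[layer + 1], path[layer]] ** 2
        total += prod
    return total


rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))              # input (3) -> hidden (4)
W2 = rng.normal(size=(2, 4))              # hidden (4) -> output (2)

# Node-wise rescaling: multiply incoming weights of hidden unit j by c[j]
# and divide its outgoing weights by c[j].
c = rng.uniform(0.5, 2.0, size=4)
W1_s = c[:, None] * W1
W2_s = W2 / c[None, :]

print(path_norm_sq([W1, W2]), path_norm_sq([W1_s, W2_s]))
```

The two printed values agree up to floating-point error, whereas the ordinary sum of squared weights generally changes under the same rescaling.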

artificial intelligence, machine learning, regularization, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neural Information Processing Systems, Aug-12-2025, 23:43:54 GMT

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.

name change, path-normalized optimization, path-sgd, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Feb-8-2025, 09:04:52 GMT

Deep rectified neural networks are over-parameterized in the sense that scaling the weights in one layer can be compensated for exactly in the subsequent layer. This paper introduces Path-SGD, a simple modification to the SGD update rule, whose update is invariant to such rescaling. The method is derived from the proximal form of gradient descent, whereby a constraint term is added which preserves the norm of the "product weight" formed along each path in the network (from input to output node). Path-SGD is thus principled and is shown to yield faster convergence for a standard 2-layer rectifier network across a variety of datasets (MNIST, CIFAR-10, CIFAR-100, SVHN). As an algorithm, Path-SGD appears effective and simple to implement, and it addresses an obvious flaw in first-order updates to ReLU networks.
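The rescaling argument in this review can be checked numerically. The sketch below is not the authors' code: it assumes a bias-free 2-layer ReLU network with a squared-error loss, and it assumes the per-edge Path-SGD scaling takes the l2 form gamma(u->v) = gamma_in(u) * gamma_out(v), where gamma_in and gamma_out are sums of squared-weight products over paths into u and out of v. All function names are hypothetical. It shows that scaling layer one by c and layer two by 1/c leaves the output unchanged, that a plain SGD step from the two equivalent parameterizations yields different functions, and that the assumed path-normalized step yields the same function.

```python
import numpy as np


def forward(W1, W2, x):
    """Bias-free 2-layer ReLU network; returns output and pre-activations."""
    pre = W1 @ x
    return W2 @ np.maximum(pre, 0.0), pre


def grads(W1, W2, x, y):
    """Gradients of 0.5 * ||f(x) - y||^2 with respect to W1 and W2."""
    out, pre = forward(W1, W2, x)
    h = np.maximum(pre, 0.0)
    d_out = out - y
    g2 = np.outer(d_out, h)
    d_h = (W2.T @ d_out) * (pre > 0)
    g1 = np.outer(d_h, x)
    return g1, g2


def sgd_step(W1, W2, x, y, lr):
    g1, g2 = grads(W1, W2, x, y)
    return W1 - lr * g1, W2 - lr * g2


def path_sgd_step(W1, W2, x, y, lr):
    """Assumed l2 Path-SGD step: divide each gradient entry by gamma_in * gamma_out."""
    g1, g2 = grads(W1, W2, x, y)
    gamma_in_hidden = (W1 ** 2).sum(axis=1)    # paths from inputs into each hidden unit
    gamma_out_hidden = (W2 ** 2).sum(axis=0)   # paths from each hidden unit to outputs
    # Edge input->hidden: gamma_in(input) = 1, so the scale is gamma_out(hidden).
    # Edge hidden->output: gamma_out(output) = 1, so the scale is gamma_in(hidden).
    # No epsilon is added, so a hidden unit with all-zero weights would divide by zero.
    return (W1 - lr * g1 / gamma_out_hidden[:, None],
            W2 - lr * g2 / gamma_in_hidden[None, :])


rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
x_test = rng.normal(size=3)
W1 = rng.normal(size=(16, 3))
W2 = rng.normal(size=(2, 16))

c, lr = 10.0, 0.01
W1_s, W2_s = c * W1, W2 / c                    # equivalent parameterization

same_function = np.allclose(forward(W1, W2, x)[0], forward(W1_s, W2_s, x)[0])

sgd_a = forward(*sgd_step(W1, W2, x, y, lr), x_test)[0]
sgd_b = forward(*sgd_step(W1_s, W2_s, x, y, lr), x_test)[0]
path_a = forward(*path_sgd_step(W1, W2, x, y, lr), x_test)[0]
path_b = forward(*path_sgd_step(W1_s, W2_s, x, y, lr), x_test)[0]

print("same function before the step:", same_function)
print("SGD outputs after one step:     ", sgd_a, sgd_b)
print("Path-SGD outputs after one step:", path_a, path_b)
```

Under these assumptions the two Path-SGD outputs coincide up to floating-point error, while the two SGD outputs do not, which is the flaw in first-order updates that the review refers to.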

artificial intelligence, machine learning, path-sgd, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)

Reviews: Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neural Information Processing Systems, Jan-20-2025, 13:24:53 GMT

This seems to be a worthwhile goal (since plain RNNs are computationally cheaper and easier to analyze theoretically) and their experiments show some promising results in improving performance over plain RNNs trained with existing optimization methods. However, it is not clear to me how the method that the authors use in practice differs significantly from regular Path-SGD introduced in previous work. The authors do present an adaptation of Path-SGD to networks with shared weights, and show that the new rescaling term applied to the gradients can be divided into two terms k1 and k2. But then, they note that the second term, which accounts for interactions between shared weights along the same path, is expensive to calculate for RNNs and show some empirical evidence that including it does not help performance. In the rest of the experiments, they ignore the second term, which to my understanding is essentially what makes the method introduced here different from regular Path-SGD.

experiment, path-normalized optimization, second term, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neural Information Processing Systems, Mar-13-2024, 04:45:02 GMT

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.

descent, path-sgd, regularization, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neural Information Processing Systems, Mar-12-2024, 12:46:03 GMT

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
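One way to see the parameter-space geometry at stake is that a plain ReLU RNN admits the same kind of node-wise rescaling as a feedforward ReLU network, except that the recurrent matrix is shared across time steps and so is scaled on both sides. The following NumPy sketch assumes a bias-free RNN with zero initial state and a linear readout from the last hidden state; these modeling details and all names are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def relu_rnn(W_in, W_rec, W_out, xs):
    """Plain ReLU RNN: h_t = relu(W_in x_t + W_rec h_{t-1}), output = W_out h_T."""
    h = np.zeros(W_rec.shape[0])
    for x in xs:
        h = np.maximum(W_in @ x + W_rec @ h, 0.0)
    return W_out @ h


rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 6, 2, 20
W_in = rng.normal(size=(n_hid, n_in))
W_rec = rng.normal(size=(n_hid, n_hid)) / np.sqrt(n_hid)
W_out = rng.normal(size=(n_out, n_hid))
xs = rng.normal(size=(T, n_in))

# Positive per-unit scales d. With D = diag(d):
#   W_in -> D W_in,  W_rec -> D W_rec D^{-1},  W_out -> W_out D^{-1}
d = rng.uniform(0.1, 10.0, size=n_hid)
W_in_s = d[:, None] * W_in
W_rec_s = d[:, None] * W_rec / d[None, :]
W_out_s = W_out / d[None, :]

y = relu_rnn(W_in, W_rec, W_out, xs)
y_s = relu_rnn(W_in_s, W_rec_s, W_out_s, xs)
print(y, y_s)
```

Because ReLU commutes with multiplication by a positive scalar, every hidden state of the rescaled network is D times the original, and the two outputs agree up to floating-point error. An optimizer whose step depends on the particular choice of d, as plain SGD does, therefore treats equivalent networks differently.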

neural network, path-sgd, rnn, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neyshabur, Behnam, Salakhutdinov, Russ R., Srebro, Nati

Neural Information Processing Systems, Feb-14-2020, 11:42:42 GMT

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad. Papers published at the Neural Information Processing Systems Conference.

deep neural network, path-normalized optimization, path-sgd

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neyshabur, Behnam, Wu, Yuhuai, Salakhutdinov, Ruslan R., Srebro, Nati

Neural Information Processing Systems, Dec-31-2016

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.

artificial intelligence, machine learning, rnn, (18 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neyshabur, Behnam, Salakhutdinov, Ruslan R., Srebro, Nati

Neural Information Processing Systems, Dec-31-2015

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.

artificial intelligence, machine learning, regularization, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
