Export Reviews, Discussions, Author Feedback and Meta-Reviews

Feb-8-2025, 09:04:52 GMT–Neural Information Processing Systems

Deep rectified neural networks are over-parameterized in the sense that scaling the weights in one layer can be compensated for exactly in the subsequent layer. This paper introduces Path-SGD, a simple modification to the SGD update rule whose update is invariant to such rescaling. The method is derived from the proximal form of gradient descent, whereby a constraint term is added which preserves the norm of the "product weight" formed along each path in the network (from input to output node). Path-SGD is thus principled and is shown to yield faster convergence for a standard two-layer rectifier network across a variety of datasets (MNIST, CIFAR-10, CIFAR-100, SVHN). As an algorithm, Path-SGD appears effective and simple to implement, and it addresses an obvious flaw in first-order updates to ReLU networks.
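The rescaling argument can be made concrete with a small sketch. The code below is an illustration written for this summary, not the authors' implementation: it uses a tiny ReLU network with one hidden layer and a scalar output, and all function names (`forward`, `rescale_unit`, `path_norm_sq`, `sgd_step`, `path_scaled_step`) are hypothetical. The per-weight scaling in `path_scaled_step` (dividing each gradient entry by the squared norm of the other weights on the paths through that edge) is an assumed form of a path-normalized step, chosen because it is rescaling-invariant in this small case; the exact update in the paper may differ.

```python
def forward(W1, w2, x):
    """Return (output, pre-activations, hidden activations) of a ReLU net."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [max(0.0, p) for p in pre]
    return sum(v * h for v, h in zip(w2, hidden)), pre, hidden


def rescale_unit(W1, w2, j, c):
    """Scale incoming weights of hidden unit j by c > 0 and outgoing by 1/c."""
    W1s = [[w * c for w in row] if k == j else list(row)
           for k, row in enumerate(W1)]
    w2s = [v / c if k == j else v for k, v in enumerate(w2)]
    return W1s, w2s


def path_norm_sq(W1, w2):
    """Sum over input-to-output paths of the squared product of weights."""
    return sum((w * v) ** 2 for row, v in zip(W1, w2) for w in row)


def gradients(W1, w2, x, target):
    """Gradients of the squared loss 0.5 * (y - target)^2."""
    y, pre, hidden = forward(W1, w2, x)
    delta = y - target
    gW1 = [[delta * v * xi if p > 0 else 0.0 for xi in x]
           for v, p in zip(w2, pre)]
    gw2 = [delta * h for h in hidden]
    return gW1, gw2


def sgd_step(W1, w2, x, target, lr):
    """Plain SGD step: not invariant to rescaling."""
    gW1, gw2 = gradients(W1, w2, x, target)
    W1n = [[w - lr * g for w, g in zip(row, grow)]
           for row, grow in zip(W1, gW1)]
    w2n = [v - lr * g for v, g in zip(w2, gw2)]
    return W1n, w2n


def path_scaled_step(W1, w2, x, target, lr):
    """Assumed path-normalized step (illustrative, requires nonzero weights)."""
    gW1, gw2 = gradients(W1, w2, x, target)
    k_in = [v * v for v in w2]                      # scale for edges into unit j
    k_out = [sum(w * w for w in row) for row in W1]  # scale for edge out of unit j
    W1n = [[w - lr * g / k for w, g in zip(row, grow)]
           for row, grow, k in zip(W1, gW1, k_in)]
    w2n = [v - lr * g / k for v, g, k in zip(w2, gw2, k_out)]
    return W1n, w2n


if __name__ == "__main__":
    W1, w2 = [[0.5, 1.0], [1.5, 0.25]], [2.0, -0.5]
    x, target = [1.0, 2.0], 1.0
    W1s, w2s = rescale_unit(W1, w2, j=0, c=4.0)
    print("same function:", forward(W1, w2, x)[0], forward(W1s, w2s, x)[0])
    print("same path norm:", path_norm_sq(W1, w2), path_norm_sq(W1s, w2s))
    for step in (sgd_step, path_scaled_step):
        a = forward(*step(W1, w2, x, target, 0.01), x)[0]
        b = forward(*step(W1s, w2s, x, target, 0.01), x)[0]
        print(step.__name__, "outputs after one step:", a, b)
```

In this sketch the two parameterizations compute the same function and have the same path norm before the update. After one plain SGD step they no longer agree, whereas after the path-scaled step they still do, which is the kind of invariance the review describes.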

artificial intelligence, machine learning, path-sgd
