Scalable Optimization in the Modular Norm
Neural Information Processing Systems
To improve performance in contemporary deep learning, one scales up the neural network in terms of both the number and the size of its layers. When ramping up the width of a single layer, graceful scaling of training has been linked to normalizing the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. The modular norm is defined recursively, in tandem with the network architecture itself. We show that the modular norm has several promising applications, for instance normalizing the updates of a base optimizer so that learning rates transfer across network width and depth.