Scalable Optimization in the Modular Norm
Neural Information Processing Systems
To improve performance in contemporary deep learning, one scales up the neural network in terms of both the number and the size of its layers. When ramping up the width of a single layer, graceful scaling of training has been linked to normalizing the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. The modular norm is defined recursively, in tandem with the network architecture itself. We show that the modular norm has several promising applications, for instance normalizing the updates of a base optimizer so that learning rates transfer across network width and depth.