AITopics | groupnorm

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Neural Information Processing Systems, Apr-25-2026, 04:01:45 GMT

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative normalization layers, these properties need to be generalized so that any given layer's success/failure can be accurately predicted. In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to several recently proposed normalization layers. Our primary findings follow: (i) similar to BatchNorm, activations-based normalization layers can prevent exponential growth of activations in ResNets, but parametric techniques require explicit remedies; (ii) use of GroupNorm can ensure an informative forward propagation, with different samples being assigned dissimilar activations, but increasing group size results in increasingly indistinguishable activations for different samples, explaining slow convergence speed in models with LayerNorm; and (iii) small group sizes result in large gradient norm in earlier layers, hence explaining training instability issues in Instance Normalization and illustrating a speed-stability tradeoff in GroupNorm. Overall, our analysis reveals a unified set of mechanisms that underpin the success of normalization methods in deep learning, providing us with a compass to systematically explore the vast design space of DNN normalization layers.
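The abstract's findings (ii) and (iii) turn on group size, so it may help to see how one normalization routine spans the range from Instance Normalization to LayerNorm. The following is a minimal NumPy sketch, not the authors' code: the function name `group_norm`, the `(N, C, H, W)` layout, and the omission of the learnable scale and shift are assumptions made for illustration.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x of shape (N, C, H, W) per sample, over groups of channels.

    Illustrative helper (hypothetical name); the learnable affine
    parameters used in practice are omitted for brevity.
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    # Split the channel axis into (num_groups, channels_per_group).
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics are computed per sample and per group, never across the batch.
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 5, 5))  # batch of 4 samples, 8 channels, 5x5 maps

# Group size = C (one group): LayerNorm-style statistics over all channels.
layer_like = group_norm(x, num_groups=1)
# Group size = 1 (one group per channel): InstanceNorm-style statistics.
instance_like = group_norm(x, num_groups=8)
# Intermediate group size of 2 channels per group.
middle = group_norm(x, num_groups=4)
```

Under these assumptions, varying `num_groups` is the only change needed to move between the two extremes the abstract contrasts, which is why group size can be treated as a single design knob.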

artificial intelligence, batchnorm, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Neural Information Processing Systems, Apr-25-2026, 04:01:41 GMT

artificial intelligence, batchnorm, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

!011Im2Col0 1

Neural Information Processing Systems, Apr-25-2026, 01:01:57 GMT

We adopt a residual network (ResNet) [23] based feature extractor, with ELU as the activation function. Following [15], we adopt group normalization and instance normalization for better stability of the networks. We adopt the "leave-one-out" training strategy for obtaining the results on each of the categories of MVTec-AD. All experiments are performed with the same settings and hyperparameters. We resize all images to 128 × 128 and do not perform any data augmentation.

artificial intelligence, groupnorm, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Ignorance is Bliss: Robust Control via Information Gating (Manan Tomar)

Neural Information Processing Systems, Feb-15-2026, 03:49:03 GMT

We propose information gating as a way to learn parsimonious representations that identify the minimal information required for a task.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
North America > Canada > Alberta (0.14)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

on ResNet-50 and by 7.3% on MobileNetV2

Neural Information Processing Systems, Feb-9-2026, 13:15:40 GMT

Our gains are indeed large. EvoNorm-S0 is the state of the art in the small batch size regime (Table 4), outperforming BN-ReLU by 7.8% on ResNet-50 and by 7.3% on MobileNetV2. We achieve clear gains over other influential works such as GroupNorm (GN). We'd also like to emphasize that EvoNorms beat BN-ReLU on 12 (out of 14) different classification models/training [...]. These are significant considering the predominance of BN-ReLU in ML models. R3: "the overall search algorithm lacks some novelty." [...] "yet another AutoML paper" (with the expectation that some fancy search algorithms must be proposed), but rather under [...] R2, R4: Can EvoNorms generalize to deeper variants (e.g., ResNet-101) and architecture families not included [...] MnasNet, EfficientNet-B5, Mask R-CNN + FPN/SpineNet and BigGAN; none of them was used during search.

artificial intelligence, bn-relu, machine learning, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.60)

3812f9a59b634c2a9c574610eaba5bed-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 02:36:37 GMT

broyden, convergence, mdeq, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.35)

2578eb9cdf020730f77793e8b58e165a-Supplemental.pdf

Neural Information Processing Systems, Feb-7-2026, 22:14:55 GMT

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning.

artificial intelligence, inproc, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

2578eb9cdf020730f77793e8b58e165a-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 22:14:52 GMT

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning.

artificial intelligence, inproc, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

1fe6f635fe265292aba3987b5123ae3d-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 20:34:19 GMT

conv, groupnorm, stride, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Neural Information Processing Systems, Dec-23-2025, 21:51:32 GMT

batchnorm, name change, normalization layer, (8 more...)

Neural Information Processing Systems

Country: Europe > Latvia > Lubāna Municipality > Lubāna (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
