robust initialization


How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Arpit, Devansh, Campos, Víctor, Bengio, Yoshua

Neural Information Processing Systems

Residual networks (ResNets) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not previously been studied for weight-normalized networks; in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, ResNet initialization has been studied for un-normalized networks, often under simplified settings that ignore the shortcut connection. To address these issues, we propose a novel parameter initialization strategy that avoids explosion or vanishing of information across layers for weight-normalized networks with and without residual connections. The proposed strategy is based on a theoretical analysis using a mean field approximation. We run over 2,500 experiments and evaluate our proposal on image datasets, showing that the proposed initialization outperforms existing initialization methods in terms of generalization performance, robustness to hyper-parameter values, and variance between seeds, especially as networks get deeper, where existing methods fail to even start training. Finally, we show that using our initialization in conjunction with learning rate warmup reduces the gap between the performance of weight-normalized and batch-normalized networks.
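
To make the scale-preservation idea concrete, here is a minimal sketch of a weight-normalized linear layer where the direction v is drawn from a standard normal and the gain g is set to sqrt(2), the He-style choice that keeps the ReLU pre-activation second moment roughly stable. This is an illustrative heuristic, not the exact scheme the paper derives via mean field analysis.

```python
# Illustrative sketch only: weight-normalized layer with a fan-in-agnostic
# gain. Under weight norm, each output row of the weight satisfies
# ||w_row|| = g, so g = sqrt(2) gives ReLU pre-activations variance ~2,
# which the ReLU halves back to ~1. NOT the paper's derived constants.
import math
import torch
import torch.nn as nn

def weightnorm_linear(fan_in: int, fan_out: int) -> nn.Module:
    layer = nn.utils.weight_norm(nn.Linear(fan_in, fan_out))
    with torch.no_grad():
        layer.weight_v.normal_(0.0, 1.0)      # direction: standard normal
        layer.weight_g.fill_(math.sqrt(2.0))  # He-style row norm for ReLU
        layer.bias.zero_()
    return layer

# Quick check that activations keep a sensible scale at init.
x = torch.randn(256, 512)
h = torch.relu(weightnorm_linear(512, 512)(x))
print(h.pow(2).mean().sqrt())  # roughly O(1), neither exploding nor vanishing
```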


Reviews: How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Neural Information Processing Systems

All reviewers are positive about the paper. The paper introduces a new initialization scheme for ResNets. The experimental results the authors present are extensive. The proposed initialization scheme appears to be quite effective empirically. The discussion of the results is particularly careful and nuanced.


A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization

Civitelli, Enrico, Sortino, Alessio, Lapucci, Matteo, Bagattini, Francesco, Galvan, Giulio

arXiv.org Artificial Intelligence

Batch Normalization is an essential component of all state-of-the-art neural network architectures. However, since it introduces many practical issues, much recent research has been devoted to designing normalization-free architectures. In this paper, we show that weight initialization is key to training ResNet-like normalization-free networks. In particular, we propose a slight modification to the summation operation that adds a block's output to the skip-connection branch, so that the whole network is correctly initialized. We show that this modified architecture achieves competitive results on CIFAR-10, CIFAR-100 and ImageNet without further regularization or algorithmic modifications.
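
As one concrete way to realize such a modified summation, the sketch below scales the residual branch by a learnable factor initialized to zero, in the spirit of SkipInit/ReZero, so every block is an identity map at initialization. The paper's exact modification may differ; treat this as an illustration of the general idea.

```python
# Illustrative sketch: normalization-free residual block whose branch output
# is scaled before being added to the skip connection. With the scale at
# zero, the block is exactly the identity at init, so signal propagates
# cleanly through arbitrarily deep stacks.
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Learnable branch scale, zero at init: block(x) == x.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.branch(x)

x = torch.randn(8, 64, 32, 32)
print(torch.allclose(ScaledResidualBlock(64)(x), x))  # True at init
```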


Efficient Batch Black-box Optimization with Deterministic Regret Bounds

Lyu, Yueming, Yuan, Yuan, Tsang, Ivor W.

arXiv.org Machine Learning

In this work, we investigate black-box optimization from the perspective of frequentist kernel methods. We propose a novel batch optimization algorithm that jointly maximizes the acquisition function and selects points from a whole batch in a holistic way. Theoretically, we derive regret bounds for both the noise-free and perturbation settings. Moreover, we analyze the property of the adversarial regret required for robust initialization of Bayesian Optimization (BO), and prove that the adversarial regret bounds decrease as the covering radius decreases, which provides a criterion for generating an initialization point set that minimizes the bound. We then propose fast search algorithms to generate a point set with a small covering radius for robust initialization. Experimental results on both synthetic benchmark problems and real-world problems show the effectiveness of the proposed algorithms.
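
For illustration, the sketch below generates an initialization point set with the classic greedy farthest-point (k-center) heuristic, which is known to keep the covering radius small (it is a 2-approximation for the k-center problem). The paper proposes its own fast search algorithms, so treat this as a textbook baseline rather than the authors' method.

```python
# Illustrative sketch: greedy farthest-point selection over a candidate
# pool. Each step adds the candidate farthest from the chosen set, which
# directly shrinks the covering radius of the selected points.
import numpy as np

def farthest_point_init(candidates: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Pick k points from `candidates` (n x d) with a small covering radius."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(candidates)))]
    # Distance from every candidate to its nearest chosen point so far.
    dist = np.linalg.norm(candidates - candidates[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())  # farthest (worst-covered) candidate
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(candidates - candidates[nxt], axis=1))
    return candidates[chosen]

pool = np.random.default_rng(1).uniform(0.0, 1.0, size=(10_000, 5))
init_set = farthest_point_init(pool, k=20)
print(init_set.shape)  # (20, 5)
```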