initialisation
Critical initialisation for deep signal propagation in noisy rectifier neural networks
Stochastic regularisation is an important weapon in the arsenal of a deep learning practitioner. However, despite recent theoretical advances, our understanding of how noise influences signal propagation in deep neural networks remains limited. By extending recent work based on mean field theory, we develop a new framework for signal propagation in stochastic regularised neural networks. Our \textit{noisy signal propagation} theory can incorporate several common noise distributions, including additive and multiplicative Gaussian noise as well as dropout. We use this framework to investigate initialisation strategies for noisy ReLU networks. We show that no critical initialisation strategy exists using additive noise, with signal propagation exploding regardless of the selected noise distribution. For multiplicative noise (e.g.\ dropout), we identify alternative critical initialisation strategies that depend on the second moment of the noise distribution. Simulations and experiments on real-world data confirm that our proposed initialisation is able to stably propagate signals in deep networks, while using an initialisation disregarding noise fails to do so.
- Asia > Middle East > Jordan (0.04)
- North America > Dominican Republic (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Energy (0.46)
- Education > Educational Setting (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
S)GD over Diagonal Linear Networks Implicit Bias Large and Edge of Stability
Currently, most theoretical works on implicit regularisation have primarily focused on continuous time approximations of (S)GD where the impact of crucial hyperparameters such as the stepsize and the minibatch size are ignored. One such common simplification is to analyse gradient flow, which is a continuous time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, this analysis does not capture the effect of stepsize or stochasticity.
- North America > United States > Washington (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)
- North America > United States > Washington (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)
- North America > United States (0.14)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)