initialisation
(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
Currently, most theoretical works on implicit regularisation have primarily focused on continuous-time approximations of (S)GD, in which the impact of crucial hyperparameters such as the stepsize and the minibatch size is ignored. One such common simplification is to analyse gradient flow, the continuous-time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, this analysis does not capture the effect of the stepsize or of stochasticity.
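For concreteness, gradient descent is the forward-Euler discretisation of the gradient-flow ODE dw/dt = -∇L(w), so its iterates approach the flow as the stepsize vanishes. A minimal sketch of this limit on an assumed toy diagonal quadratic loss (our choice, not the diagonal-network parametrisation of the paper):

```python
import numpy as np

# Gradient flow dw/dt = -grad L(w) vs. GD w_{k+1} = w_k - gamma * grad L(w_k).
# Toy loss (our assumption): L(w) = 0.5 * sum_i a_i * w_i^2, whose flow is explicit.
a = np.array([1.0, 10.0])
grad = lambda w: a * w

w0, T = np.ones(2), 1.0
flow = w0 * np.exp(-a * T)                    # exact gradient-flow solution at time T
for gamma in [1e-1, 1e-2, 1e-3]:
    w = w0.copy()
    for _ in range(int(round(T / gamma))):
        w = w - gamma * grad(w)               # one GD step = one forward-Euler step
    print(f"gamma={gamma:g}  |GD - flow| = {np.abs(w - flow).max():.2e}")
```

The printed gap shrinks roughly linearly with the stepsize, which is exactly the regime where a gradient-flow analysis says nothing about stepsize or minibatch effects.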
An Empirical Study on Ensemble-Based Transfer Learning Bayesian Optimisation with Mixed Variable Types
Trinkle, Natasha; Ha, Huong; Chan, Jeffrey
Bayesian optimisation is a sample-efficient method for finding a global optimum of expensive black-box objective functions. Historic datasets from related problems can be exploited to improve the performance of Bayesian optimisation by adapting transfer learning methods to various components of the Bayesian optimisation pipeline. In this study we perform an empirical analysis of various ensemble-based transfer learning Bayesian optimisation methods and pipeline components. We expand on previous work in the literature by contributing specific pipeline components and three new real-world transfer learning Bayesian optimisation benchmarks. In particular, we propose a weighting strategy for ensemble surrogate model predictions based on regularised regression with weights constrained to be positive, and a related component for handling the case where transfer learning does not improve Bayesian optimisation performance. We find that, in general, two components that improve transfer learning Bayesian optimisation performance are warm-start initialisation and constraining the weights of the ensemble surrogate model to be positive.
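The abstract only names the weighting strategy; below is a minimal sketch of one way to realise "regularised regression with weights constrained to be positive", assuming a ridge penalty solved as a non-negative least-squares problem. The normalisation step and all names are ours, not from the paper.

```python
import numpy as np
from scipy.optimize import nnls

def ensemble_weights(preds, y, alpha=1.0):
    """Non-negative ridge weights for combining base-surrogate predictions.

    preds : (n_obs, n_models) predictions of each source surrogate at the
            target task's observed points
    y     : (n_obs,) observed target objective values
    alpha : ridge penalty, implemented by augmenting the design matrix
    """
    n_models = preds.shape[1]
    A = np.vstack([preds, np.sqrt(alpha) * np.eye(n_models)])
    b = np.concatenate([y, np.zeros(n_models)])
    w, _ = nnls(A, b)                  # least squares subject to w >= 0
    return w / max(w.sum(), 1e-12)     # normalise so weights sum to one (our choice)

# The combined surrogate mean at a candidate x would then be
#   mu(x) = sum_k w[k] * surrogate_k.predict(x).
```

Positivity means a source surrogate that anti-correlates with the target data is dropped (weight zero) rather than inverted, which is one plausible reading of why the constraint helps.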
Critical initialisation for deep signal propagation in noisy rectifier neural networks
Stochastic regularisation is an important weapon in the arsenal of a deep learning practitioner. However, despite recent theoretical advances, our understanding of how noise influences signal propagation in deep neural networks remains limited. By extending recent work based on mean field theory, we develop a new framework for signal propagation in stochastic regularised neural networks. Our \textit{noisy signal propagation} theory can incorporate several common noise distributions, including additive and multiplicative Gaussian noise as well as dropout. We use this framework to investigate initialisation strategies for noisy ReLU networks. We show that no critical initialisation strategy exists using additive noise, with signal propagation exploding regardless of the selected noise distribution. For multiplicative noise (e.g.\ dropout), we identify alternative critical initialisation strategies that depend on the second moment of the noise distribution. Simulations and experiments on real-world data confirm that our proposed initialisation is able to stably propagate signals in deep networks, while using an initialisation disregarding noise fails to do so.
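The abstract states the multiplicative-noise result only qualitatively. A minimal sketch of such an initialiser follows, assuming the mean-field criticality condition for noisy ReLU takes the form sigma_w^2 = 2 / mu2 (variance scale), with mu2 = E[eps^2] the second moment of the multiplicative noise; the exact condition should be taken from the paper.

```python
import numpy as np

def noisy_relu_init(fan_in, fan_out, mu2, rng=np.random.default_rng(0)):
    """Weights at the assumed critical point: per-weight variance
    2 / (mu2 * fan_in); biases at zero."""
    std = np.sqrt(2.0 / (mu2 * fan_in))
    W = rng.normal(0.0, std, size=(fan_out, fan_in))
    return W, np.zeros(fan_out)

# Dropout with keep probability p, applied as eps = Bernoulli(p) / p,
# has E[eps] = 1 and mu2 = E[eps^2] = 1/p, so the weight std becomes
# sqrt(2 * p / fan_in) -- standard He initialisation is recovered as p -> 1.
p = 0.8
W, b = noisy_relu_init(fan_in=512, fan_out=512, mu2=1.0 / p)
```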
A new initialisation to control gradients in sinusoidal neural networks
Combette, Andrea; Venaille, Antoine; Pustelnik, Nelly
A proper initialisation strategy is of primary importance to mitigate gradient explosion or vanishing when training neural networks. Yet, the impact of initialisation parameters still lacks a precise theoretical understanding for several well-established architectures. Here, we propose a new initialisation for networks with sinusoidal activation functions such as \texttt{SIREN}, focusing on the control of gradients, their scaling with network depth, and their impact on training and generalisation. To achieve this, we identify a closed-form expression for the initialisation of the parameters, differing from the original \texttt{SIREN} scheme. This expression is derived from fixed points obtained through the convergence of the pre-activation distribution and of the variance of Jacobian sequences. Controlling gradients and targeting vanishing pre-activations helps prevent the emergence of inappropriate frequencies during estimation, thereby improving generalisation. We further show that this initialisation strongly influences training dynamics through the Neural Tangent Kernel (NTK) framework. Finally, we benchmark \texttt{SIREN} with the proposed initialisation against the original scheme and other baselines on function fitting and image reconstruction. The new initialisation consistently outperforms state-of-the-art methods across a wide range of reconstruction tasks, including those involving physics-informed neural networks.
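The paper's closed-form expression is not reproduced in the abstract. For reference, here is a minimal sketch of the original \texttt{SIREN} scheme it departs from (Sitzmann et al., 2020), which is the baseline the proposed initialisation replaces; the bias bound reusing the weight bound is a simplification on our part.

```python
import numpy as np

def siren_init(fan_in, fan_out, omega0=30.0, first_layer=False,
               rng=np.random.default_rng(0)):
    """Original SIREN initialisation; activations are sin(omega0 * (W x + b))."""
    if first_layer:
        bound = 1.0 / fan_in                    # first layer sees raw coordinates
    else:
        bound = np.sqrt(6.0 / fan_in) / omega0  # keeps pre-activation variance stable
    W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
    b = rng.uniform(-bound, bound, size=fan_out)
    return W, b
```

The 1/omega0 factor in the hidden-layer bound compensates for the frequency scaling inside the sine, which is exactly the kind of pre-activation control the paper's fixed-point analysis refines.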