initialisation
Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks
Huang, Jie, Loureiro, Bruno, Mannelli, Stefano Sarao
We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe > France (0.04)
- Asia > Singapore (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data
Ibenegbu, Amuche, de Micheaux, Pierre Lafaye, Chandra, Rohitash
Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Oceania > Australia > New South Wales (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Critical initialisation for deep signal propagation in noisy rectifier neural networks
Stochastic regularisation is an important weapon in the arsenal of a deep learning practitioner. However, despite recent theoretical advances, our understanding of how noise influences signal propagation in deep neural networks remains limited. By extending recent work based on mean field theory, we develop a new framework for signal propagation in stochastic regularised neural networks. Our \textit{noisy signal propagation} theory can incorporate several common noise distributions, including additive and multiplicative Gaussian noise as well as dropout. We use this framework to investigate initialisation strategies for noisy ReLU networks. We show that no critical initialisation strategy exists using additive noise, with signal propagation exploding regardless of the selected noise distribution. For multiplicative noise (e.g.\ dropout), we identify alternative critical initialisation strategies that depend on the second moment of the noise distribution. Simulations and experiments on real-world data confirm that our proposed initialisation is able to stably propagate signals in deep networks, while using an initialisation disregarding noise fails to do so.
- Asia > Middle East > Jordan (0.04)
- North America > Dominican Republic (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Energy (0.46)
- Education > Educational Setting (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (4 more...)
S)GD over Diagonal Linear Networks Implicit Bias Large and Edge of Stability
Currently, most theoretical works on implicit regularisation have primarily focused on continuous time approximations of (S)GD where the impact of crucial hyperparameters such as the stepsize and the minibatch size are ignored. One such common simplification is to analyse gradient flow, which is a continuous time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, this analysis does not capture the effect of stepsize or stochasticity.
- North America > United States > Washington (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)
- North America > United States > Washington (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)