AITopics | initialisation

Collaborating Authors

initialisation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Canonical Regularisation of Wide Feature-Learning Neural Networks

Whittle, George, Vaidhyanathan, Pranav, Ziomek, Juliusz, Ares, Natalia, Osborne, Maike A.

arXiv.org Machine LearningMay-19-2026

Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well-studied in kernel regime networks -- of all the infinite global minima, gradient flow selects exactly the vanishing ridge solution -- and underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with a particular damage done to pretrained networks where the implicit prior is informative. We resolve this by axiomatising the canonical regulariser as a regime-agnostic function-space energy and lift, which uniquely identifies ridge in the kernel regime, and crucially generalises to the feature-learning regime. By studying the Riemannian geometry of feature-learning networks, we derive geodesic ridge from our framework, generalising ridge to the feature-learning regime. Correspondingly, we prove the canonical function-space prior is a Riemannian Gibbs Process, generalising the more familiar Gaussian Process. As a practical contribution, we propose arc ridge as a minimax-robust, scalable surrogate to geodesic ridge, revealing a deep relationship between early stopping and canonical regularisation across learning regimes. Finally, we demonstrate the consequences of our theory empirically on both image processing and NLP transfer-learning problems.

artificial intelligence, machine learning, mflow, (19 more...)

arXiv.org Machine Learning

2605.1818

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

17a9ab4190289f0e1504bbb98d1d111a-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 08:29:03 GMT

artificial intelligence, iterate, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

2b3bb2c95195130977a51b3bb251c40a-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:15:05 GMT

artificial intelligence, machine learning, representation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
North America > Canada (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.46)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

5 Supplementary Material

Neural Information Processing SystemsApr-24-2026, 08:14:54 GMT

Dendritic updates Complete versions of the dendritic update rules (summarised in Eqns (2) & (3)) are given below. This is valid in our regime where the environmental latent updates slowly compared to neural timescales. The notation we're using admits the possible presence of biases as well as the weights (though biases typically aren't used) by assuming a row of constant 1's could be added to the synaptic inputs effectively absorbing a bias into the weight matrix without loss of generality, for example wgB p(t) wgB p(t)+ bgB . Somatic updates Somatic updates rules (Eqns (4) & (5)) and are repeated here for completeness: p(t)= (t)pB(t)+(1 (t))pA(t) g(t)= (t)gB(t)+(1 (t))gA(t). Update ordering For this hierarchical network of multicompartmental neurons we must specify the order in which we perform these discrete updates to the different layers and the different compartments within these layers.

artificial intelligence, machine learning, neuron, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

02dd0db10c40092de3d9ec2508d12f60-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 06:00:14 GMT

artificial intelligence, machine learning, singular value, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec (0.29)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Communications > Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Cognitive Science (0.69)

Add feedback

Learning better with Dale's Law: ASpectral Perspective

Neural Information Processing SystemsApr-24-2026, 06:00:10 GMT

Most recurrent neural networks (RNNs) do not include a fundamental constraint of real neural circuits: Dale's Law, which implies that neurons must be excitatory (E) or inhibitory (I). Dale's Law is generally absent from RNNs because simply partitioning a standard network's units into E and I populations impairs learning. However, here we extend a recent feedforward bio-inspired EI network architecture, named Dale's ANNs, to recurrent networks, and demonstrate that good performance is possible while respecting Dale's Law. This begs the question: What makes some forms of EI network learn poorly and others learn well? And, why does the simple approach of incorporating Dale's Law impair learning? Historically the answer was thought to be the sign constraints on EI network parameters, and this was a motivation behind Dale's ANNs. However, here we show the spectral properties of the recurrent weight matrix at initialisation are more impactful on network performance than sign constraints. We find that simple EI partitioning results in a singular value distribution that is multimodal and dispersed, whereas standard RNNs have an unimodal, more clustered singular value distribution, as do recurrent Dale's ANNs. We also show that the spectral properties and performance of partitioned EI networks are worse for small networks with fewer I units, and we present normalised SVD entropy as a measure of spectrum pathology that correlates with performance.

artificial intelligence, machine learning, spectral property, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptive Neural Compilation

Rudy R. Bunel, Alban Desmaison, Pawan K. Mudigonda, Pushmeet Kohli, Philip Torr

Neural Information Processing SystemsApr-22-2026, 12:15:18 GMT

This paper proposes an adaptive neural-compilation framework to address the problem of learning efficient programs. Traditional code optimisation strategies used in compilers are based on applying pre-specified set of transformations that make the code faster to execute without changing its semantics. In contrast, our work involves adapting programs to make them more efficient while considering correctness only on a target input distribution. Our approach is inspired by the recent works on differentiable representations of programs. We show that it is possible to compile programs written in a low-level language to a differentiable representation. We also show how programs in this representation can be optimised to make them efficient on a target input distribution. Experimental results demonstrate that our approach enables learning specifically-tuned algorithms for given data distributions with a high success rate.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

Huang, Jie, Loureiro, Bruno, Mannelli, Stefano Sarao

arXiv.org Machine LearningApr-13-2026

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.

artificial intelligence, co 1, machine learning, (18 more...)

arXiv.org Machine Learning

2604.09412

Country:

North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > France (0.04)
Asia > Singapore (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data

Ibenegbu, Amuche, de Micheaux, Pierre Lafaye, Chandra, Rohitash

arXiv.org Machine LearningApr-10-2026

Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.

artificial intelligence, imputation, machine learning, (19 more...)

arXiv.org Machine Learning

2603.27142

Country: