AITopics | regulariser

Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well-studied in kernel regime networks -- of all the infinite global minima, gradient flow selects exactly the vanishing ridge solution -- and underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with a particular damage done to pretrained networks where the implicit prior is informative. We resolve this by axiomatising the canonical regulariser as a regime-agnostic function-space energy and lift, which uniquely identifies ridge in the kernel regime, and crucially generalises to the feature-learning regime. By studying the Riemannian geometry of feature-learning networks, we derive geodesic ridge from our framework, generalising ridge to the feature-learning regime. Correspondingly, we prove the canonical function-space prior is a Riemannian Gibbs Process, generalising the more familiar Gaussian Process. As a practical contribution, we propose arc ridge as a minimax-robust, scalable surrogate to geodesic ridge, revealing a deep relationship between early stopping and canonical regularisation across learning regimes. Finally, we demonstrate the consequences of our theory empirically on both image processing and NLP transfer-learning problems.

artificial intelligence, machine learning, mflow, (19 more...)

arXiv.org Machine Learning

2605.1818

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Learning to Learn Graph Topologies

Neural Information Processing SystemsApr-25-2026, 02:26:31 GMT

Learning a graph topology to reveal the underlying relationship between data entities plays an important role in various machine learning and data analysis tasks. Under the assumption that structured data vary smoothly over a graph, the problem can be formulated as a regularised convex optimisation over a positive semidefinite cone and solved by iterative algorithms. Classic methods require an explicit convex function to reflect generic topological priors, e.g. the `1 penalty for enforcing sparsity, which limits the flexibility and expressiveness in learning rich topological structures. We propose to learn a mapping from node data to the graph structure based on the idea of learning to optimise (L2O). Specifically, our model first unrolls an iterative primal-dual splitting algorithm into a neural network. The key structural proximal projection is replaced with a variational autoencoder that refines the estimated graph with enhanced topological properties. The model is trained in an end-to-end fashion with pairs of node data and graph samples. Experiments on both synthetic and real-world data demonstrate that our model is more efficient than classic iterative algorithms in learning a graph with specific topological properties.

artificial intelligence, graph, machine learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

d6428eecbe0f7dff83fc607c5044b2b9-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 09:07:30 GMT

completion, matrix completion, side information, (12 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
North America > United States (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Communications (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Add feedback

d15426b9c324676610fbb01360473ed8-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 12:02:25 GMT

information, mutual information, transformation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Switzerland > Basel-City > Basel (0.04)
North America > Canada (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

67b0579a7298d9cf39c59404d867bdd7-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 13:16:17 GMT

gradient, neural network, regularisation, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

2f3bbb9730639e9ea48f309d9a79ff01-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 23:42:54 GMT

approximation, learning, memorable example, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(5 more...)

Industry:

Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Learn Graph Topologies

Neural Information Processing SystemsFeb-7-2026, 20:25:36 GMT

We propose to learn a mapping from node data to the graph structure based on the idea of learning to optimise (L2O).

artificial intelligence, graph, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Modular connectivity in neural networks emerges from Poisson noise-motivated regularisation, and promotes robustness and compositional generalisation

Qian, Daoyuan, Liang, Qiyao, Fiete, Ila

arXiv.org Machine LearningDec-17-2025

Circuits in the brain commonly exhibit modular architectures that factorise complex tasks, resulting in the ability to compositionally generalise and reduce catastrophic forgetting. In contrast, artificial neural networks (ANNs) appear to mix all processing, because modular solutions are difficult to find as they are vanishing subspaces in the space of possible solutions. Here, we draw inspiration from fault-tolerant computation and the Poisson-like firing of real neurons to show that activity-dependent neural noise, combined with nonlinear neural responses, drives the emergence of solutions that reflect an accurate understanding of modular tasks, corresponding to acquisition of a correct world model. We find that noise-driven modularisation can be recapitulated by a deterministic regulariser that multiplicatively combines weights and activations, revealing rich phenomenology not captured in linear networks or by standard regularisation methods. Though the emergence of modular structure requires sufficiently many training samples (exponential in the number of modular task dimensions), we show that pre-modularised ANNs exhibit superior noise-robustness and the ability to generalise and extrapolate well beyond training data, compared to ANNs without such inductive biases. Together, our work demonstrates a regulariser and architectures that could encourage modularity emergence to yield functional benefits.

connectivity, noise, regulariser, (16 more...)

arXiv.org Machine Learning

2512.13707

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

On the necessity of adaptive regularisation:Optimal anytime online learning on $\boldsymbol{\ell_p}$-balls

Johnson, Emmeran, Martínez-Rubio, David, Pike-Burke, Ciara, Rebeschini, Patrick

arXiv.org Artificial IntelligenceDec-1-2025

We study online convex optimization on $\ell_p$-balls in $\mathbb{R}^d$ for $p > 2$. While always sub-linear, the optimal regret exhibits a shift between the high-dimensional setting ($d > T$), when the dimension $d$ is greater than the time horizon $T$ and the low-dimensional setting ($d \leq T$). We show that Follow-the-Regularised-Leader (FTRL) with time-varying regularisation which is adaptive to the dimension regime is anytime optimal for all dimension regimes. Motivated by this, we ask whether it is possible to obtain anytime optimality of FTRL with fixed non-adaptive regularisation. Our main result establishes that for separable regularisers, adaptivity in the regulariser is necessary, and that any fixed regulariser will be sub-optimal in one of the two dimension regimes. Finally, we provide lower bounds which rule out sub-linear regret bounds for the linear bandit problem in sufficiently high-dimension for all $\ell_p$-balls with $p \geq 1$.

data mining, machine learning, regulariser, (20 more...)

arXiv.org Artificial Intelligence

2506.19752

Country: Europe (0.27)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Online (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)

Add feedback

AT Proofs

Neural Information Processing SystemsNov-15-2025, 06:17:28 GMT

A.1 Proof of Proposition 1 Proof of Proposition 1. Recall that h denotes the vanilla activations of the network, those we obtain with no noise injection. Let us not inject noise in the final, predictive, layer of our network such that the noise on this layer is accumulated from the noising of previous layers. Let us first consider the Taylor series expansion of the loss function with the accumulated noise defined in Proposition 1. Denoting =[ This can be deduced from the slightly opaque Fa ` a di Bruno's formula, which states that for multivariate derivatives of a composition of functions f: R The final equality comes from the moments of a mean 0 Gaussian, where j takes the values of the multi-index. Though these equalities can already offer insight into the regularising mechanisms of GNIs, they are not easy to work with and will often be computationally intractable. We will include these terms in our remainder term C .

baseline, noise, regulariser, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Filters

Collaborating Authors

regulariser

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Canonical Regularisation of Wide Feature-Learning Neural Networks

Learning to Learn Graph Topologies

d6428eecbe0f7dff83fc607c5044b2b9-Paper.pdf

d15426b9c324676610fbb01360473ed8-Paper.pdf

67b0579a7298d9cf39c59404d867bdd7-Paper-Conference.pdf

2f3bbb9730639e9ea48f309d9a79ff01-Paper.pdf

Learning to Learn Graph Topologies

Modular connectivity in neural networks emerges from Poisson noise-motivated regularisation, and promotes robustness and compositional generalisation

On the necessity of adaptive regularisation:Optimal anytime online learning on $\boldsymbol{\ell_p}$-balls

AT Proofs