AITopics

2604.07267

Country:

Europe > United Kingdom (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Papagiannouli, Katerina, Trevisan, Dario, Zitto, Giuseppe Pio

Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

arXiv.org Machine Learning, Feb-27-2026

We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emerging notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.

artificial intelligence, machine learning, rate function, (16 more...)

2602.22925

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Neural Information Processing Systems, Feb-7-2026, 10:35:48 GMT

Supplementary Material: Appendix. Bayesian Deep Ensembles via the Neural Tangent Kernel. A. Recap of standard and NTK parameterisations

We see that the different parameterisations yield the same distribution for the functional output $f(\cdot, \theta)$ at initialisation, but give different scalings to the parameter gradients in the backward pass. GP(0, Θ L) and is independent of $f_0(\cdot)$ in the infinite width limit. Let X0 be an arbitrary test set. In fact, even with a heteroscedastic prior $\theta \sim \mathcal{N}(0, \Lambda)$ with a diagonal matrix $\Lambda \in \mathbb{R}_+^{p \times p}$ and diagonal entries $\{\lambda_j\}_{j=1}^{p}$, it is straightforward to show that the correct setting of regularisation is $\|\theta\|_\Lambda^2 = \theta^\top \Lambda^{-1} \theta$ in order to obtain a posterior sample of $\theta$. For an NN in the linearised regime [23], this is related to the fact that the NTK and standard parameterisations initialise parameters differently, yet yield the same functional distribution for a randomly initialised NN.
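The claim about the two parameterisations can be illustrated with a minimal sketch for a single linear unit. This is not code from the paper: the width `n_in`, the input `x`, and all variable names are assumptions chosen for illustration. Under these assumptions, the same random draw gives identical outputs in both parameterisations, while the gradients with respect to the weights differ by a factor of sqrt(n_in).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 512                      # layer width (illustrative assumption)
x = rng.standard_normal(n_in)   # a fixed input vector

# NTK parameterisation: weights ~ N(0, 1), output rescaled by 1/sqrt(n_in).
w_ntk = rng.standard_normal(n_in)
f_ntk = w_ntk @ x / np.sqrt(n_in)
grad_ntk = x / np.sqrt(n_in)    # d f_ntk / d w_ntk

# Standard parameterisation: weights ~ N(0, 1/n_in), no output rescaling.
w_std = w_ntk / np.sqrt(n_in)   # same draw, rescaled to variance 1/n_in
f_std = w_std @ x
grad_std = x                    # d f_std / d w_std

# Same function value at initialisation, different gradient scale.
grad_ratio = np.linalg.norm(grad_std) / np.linalg.norm(grad_ntk)
```

Here `grad_ratio` equals sqrt(n_in), which is the kind of backward-pass scaling difference the excerpt refers to.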

artificial intelligence, machine learning, parameterisation, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Caporali, Francesco, Favaro, Stefano, Trevisan, Dario

Student-t processes as infinite-width limits of posterior Bayesian neural networks

arXiv.org Machine Learning, Feb-6-2025

The asymptotic properties of Bayesian Neural Networks (BNNs) have been extensively studied, particularly regarding their approximations by Gaussian processes in the infinite-width limit. We extend these results by showing that posterior BNNs can be approximated by Student-t processes, which offer greater flexibility in modeling uncertainty. Specifically, we show that, if the parameters of a BNN follow a Gaussian prior distribution, and the variance of both the last hidden layer and the Gaussian likelihood function follows an Inverse-Gamma prior distribution, then the resulting posterior BNN converges to a Student-t process in the infinite-width limit. Our proof leverages the Wasserstein metric to establish control over the convergence rate of the Student-t process approximation.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2502.04247

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Feng, Brandon R., Majumder, Reetam, Reich, Brian J., Abba, Mohamed A.

Amortized Bayesian Local Interpolation NetworK: Fast covariance parameter estimation for Gaussian Processes

arXiv.org Machine Learning, Nov-9-2024

Gaussian processes (GPs) are a ubiquitous tool for geostatistical modeling with high levels of flexibility and interpretability, and the ability to make predictions at unseen spatial locations through a process called Kriging. Estimation of Kriging weights relies on the inversion of the process' covariance matrix, creating a computational bottleneck for large spatial datasets. In this paper, we propose an Amortized Bayesian Local Interpolation NetworK (A-BLINK) for fast covariance parameter estimation, which uses two pre-trained deep neural networks to learn a mapping from spatial location coordinates and covariance function parameters to Kriging weights and the spatial variance, respectively. The fast prediction time of these networks allows us to bypass the matrix inversion step, creating large computational speedups over competing methods in both frequentist and Bayesian settings, and also provides full posterior inference and predictions using Markov chain Monte Carlo sampling methods. We show significant increases in computational efficiency over comparable scalable GP methodology in an extensive simulation study with lower parameter estimation error. The efficacy of our approach is also demonstrated using a temperature dataset of US climate normals for 1991-2020 based on over 7,000 weather stations.
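To make the bottleneck mentioned in the abstract concrete, the following is a minimal sketch of zero-mean simple Kriging with an exponential covariance. It is not the A-BLINK method: the covariance choice, the function names `exp_cov` and `kriging_weights`, and the toy locations are all assumptions for illustration. The linear solve against the n-by-n covariance matrix is the step whose cost grows quickly with the number of locations.

```python
import numpy as np

def exp_cov(a, b, variance=1.0, length_scale=1.0):
    """Exponential covariance between two sets of locations (rows are points)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / length_scale)

def kriging_weights(train_x, test_x, **cov_params):
    """Simple-Kriging weights: solve K w = k_star for each test location."""
    K = exp_cov(train_x, train_x, **cov_params)      # (n, n) covariance matrix
    k_star = exp_cov(train_x, test_x, **cov_params)  # (n, m) cross-covariance
    return np.linalg.solve(K, k_star)                # (n, m) weights

# Toy data (hypothetical): four observed locations and their values.
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
obs = np.array([1.0, 2.0, 3.0, 4.0])

# Predict at one new location and at an already observed one.
targets = np.array([[0.5, 0.5], [0.0, 0.0]])
w = kriging_weights(locs, targets, variance=1.0, length_scale=2.0)
pred = w.T @ obs
```

Without a nugget term, the prediction at an observed location reproduces the observed value, which is a quick sanity check on the weights.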

artificial intelligence, bayesian inference, machine learning, (20 more...)

2411.06324

Country: North America > United States (0.14)

Genre: Research Report (0.65)

Industry: Energy > Oil & Gas > Upstream (0.91)

arXiv.org Artificial Intelligence, Mar-26-2024

A Unified Kernel for Neural Network Learning

Zhang, Shao-Qun, Chen, Zong-Yi, Tian, Yong-Ming, Lu, Xun

Past decades have witnessed great interest in the distinction and connection between neural network learning and kernel learning. Recent advancements have made theoretical progress in connecting infinitely wide neural networks and Gaussian processes. Two predominant approaches have emerged: the Neural Network Gaussian Process (NNGP) and the Neural Tangent Kernel (NTK). The former, rooted in Bayesian inference, represents a zero-order kernel, while the latter, grounded in the tangent space of gradient descent, is a first-order kernel. In this paper, we present the Unified Neural Kernel (UNK), which characterizes the learning dynamics of neural networks with gradient descent and parameter initialization. The proposed UNK kernel maintains the limiting properties of both NNGP and NTK, exhibiting behaviors akin to NTK with a finite learning step and converging to NNGP as the learning step approaches infinity. In addition, we theoretically characterize the uniform tightness and learning convergence of the UNK kernel, providing comprehensive insights into this unified kernel. Experimental results underscore the effectiveness of our proposed method.

denote, kernel, neural network, (11 more...)

2403.17467

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Bigi, Filippo, Chong, Sanggyu, Ceriotti, Michele, Grasselli, Federico

A prediction rigidity formalism for low-cost uncertainties in trained neural networks

arXiv.org Machine Learning, Mar-4-2024

Regression methods are fundamental for scientific and technological applications. However, fitted models can be highly unreliable outside of their training domain, and hence the quantification of their uncertainty is crucial in many of their applications. Based on the solution of a constrained optimization problem, we propose "prediction rigidities" as a method to obtain uncertainties of arbitrary pre-trained regressors. We establish a strong connection between our framework and Bayesian inference, and we develop a last-layer approximation that allows the new method to be applied to neural networks. This extension affords cheap uncertainties without any modification to the neural network itself or its training procedure. We show the effectiveness of our method on a wide range of regression tasks, ranging from simple toy models to applications in chemistry and meteorology.

approximation, low-cost uncertainty, neural network, (14 more...)

2403.02251

Country:

North America > United States > California (0.05)
Oceania > Australia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Del Debbio, Luigi, Naviglio, Manuel, Tarantelli, Francesco

Neural Networks Asymptotic Behaviours for the Resolution of Inverse Problems

arXiv.org Artificial Intelligence, Feb-15-2024

This paper presents a study of the effectiveness of Neural Network (NN) techniques for deconvolution inverse problems relevant for applications in Quantum Field Theory, but also in more general contexts. We consider NN's asymptotic limits, corresponding to Gaussian Processes (GPs), where non-linearities in the parameters of the NN can be neglected. Using these resulting GPs, we address the deconvolution inverse problem in the case of a quantum harmonic oscillator simulated through Monte Carlo techniques on a lattice. In this simple toy model, the results of the inversion can be compared with the known analytical solution. Our findings indicate that solving the inverse problem with an NN yields worse results than those obtained using the GPs derived from NN's asymptotic limits. Furthermore, we observe the trained NN's accuracy approaching that of GPs with increasing layer width. Notably, one of these GPs defies interpretation as a probabilistic model, offering a novel perspective compared to established methods in the literature. Our results suggest the need for detailed studies of the training dynamics in more realistic set-ups.

arxiv, neural network, spectral function, (15 more...)

2402.09338

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
South America > Suriname > Marowijne District > Albina (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Anson, Ben, Milsom, Edward, Aitchison, Laurence

Flexible infinite-width graph convolutional networks and the importance of representation learning

arXiv.org Artificial Intelligence, Feb-9-2024

A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed, and tunable only through a small number of hyperparameters, eliminating any possibility of representation learning. This contrasts with finite-width NNs, which are often believed to perform well precisely because they are able to learn representations. Thus in simplifying NNs to make them theoretically tractable, NNGPs may eliminate precisely what makes them work well (representation learning). This motivated us to understand whether representation learning is necessary in a range of graph classification tasks. We develop a precise tool for this task, the graph convolutional deep kernel machine. This is very similar to an NNGP, in that it is an infinite width limit and uses kernels, but comes with a 'knob' to control the amount of representation learning. We found that representation learning is necessary (in the sense that it gives dramatic performance improvements) in graph classification tasks and heterophilous node classification tasks, but not in homophilous node classification tasks.
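The statement that the NNGP kernel is fixed can be made concrete with a small sketch of the standard NNGP recursion for a fully connected ReLU network (the arc-cosine kernel). This is not the graph convolutional deep kernel machine from the paper: the function name `relu_nngp_kernel`, the first-layer convention, and the toy inputs are assumptions for illustration. The resulting kernel depends only on the inputs, the depth, and the two variance hyperparameters, never on the targets.

```python
import numpy as np

def relu_nngp_kernel(x, depth, sigma_w2=2.0, sigma_b2=0.0):
    """NNGP kernel of a ReLU MLP with `depth` hidden layers (rows of x must be nonzero)."""
    # Input-layer kernel: scaled inner products (one common convention, assumed here).
    k = sigma_b2 + sigma_w2 * (x @ x.T) / x.shape[1]
    for _ in range(depth):
        diag = np.sqrt(np.diag(k))
        norm = np.outer(diag, diag)
        cos_t = np.clip(k / norm, -1.0, 1.0)   # guard against rounding outside [-1, 1]
        theta = np.arccos(cos_t)
        # Closed-form Gaussian expectation of relu(u) * relu(v) for ReLU units.
        k = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_t
        )
    return k

# Toy inputs (hypothetical): three points in two dimensions.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
k0 = relu_nngp_kernel(x, depth=0)
k3 = relu_nngp_kernel(x, depth=3)
```

With `sigma_w2=2.0` and `sigma_b2=0.0` the diagonal of the kernel is preserved across layers, so only the off-diagonal correlations change with depth. Nothing in the recursion sees labels, which is the sense in which no representation learning occurs.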

dataset, flexible infinite-width graph convolutional network, representation, (10 more...)

2402.06525

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kelly, Bryan, Kuznetsov, Boris, Malamud, Semyon, Xu, Teng Andrea

Large (and Deep) Factor Models

arXiv.org Artificial Intelligence, Jan-20-2024

We open up the black box behind Deep Learning for portfolio optimization and prove that a sufficiently wide and arbitrarily deep neural network (DNN) trained to maximize the Sharpe ratio of the Stochastic Discount Factor (SDF) is equivalent to a large factor model (LFM): A linear factor pricing model that uses many non-linear characteristics. The nature of these characteristics depends on the architecture of the DNN in an explicit, tractable fashion. This makes it possible to derive end-to-end trained DNN-based SDFs in closed form for the first time. We evaluate LFMs empirically and show how various architectural choices impact SDF performance. We document the virtue of depth complexity: With enough data, the out-of-sample performance of DNN-SDF is increasing in the NN depth, saturating at huge depths of around 100 hidden layers.

kernel portfolio, neural network, portfolio, (12 more...)

2402.06635

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)