AITopics

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Neural Information Processing SystemsMar-17-2026, 00:36:23 GMT

Scaling Gaussian Process Regression with Derivatives

Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at $n$ points in $d$ dimensions requires linear solves and log determinants with an ${n(d+1) \times n(d+1)}$ positive definite matrix-- leading to prohibitive $\mathcal{O}(n^3d^3)$ computations for standard direct methods. We propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, allows us to scale Bayesian optimization with derivatives to high-dimensional problems and large evaluation budgets.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.62)

Neural Information Processing SystemsMar-17-2026, 00:03:30 GMT

Mental Sampling in Multimodal Representations

Both resources in the natural environment and concepts in a semantic space are distributed patchily, with large gaps in between the patches. To describe people's internal and external foraging behavior, various random walk models have been proposed. In particular, internal foraging has been modeled as sampling: in order to gather relevant information for making a decision, people draw samples from a mental representation using random-walk algorithms such as Markov chain Monte Carlo (MCMC). However, two common empirical observations argue against people using simple sampling algorithms such as MCMC for internal foraging. First, the distance between samples is often best described by a Levy flight distribution: the probability of the distance between two successive locations follows a power-law on the distances.

artificial intelligence, machine learning, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Neural Information Processing SystemsMar-17-2026, 00:03:20 GMT

Sparsified SGD with Memory

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. Various recent works proposed to use quantization or sparsification techniques to reduce the amount of data that needs to be communicated, for instance by only sending the most significant entries of the stochastic gradient (top-k sparsification). Whilst such schemes showed very promising performance in practice, they have eluded theoretical analysis so far. In this work we analyze Stochastic Gradient Descent (SGD) with k-sparsification or compression (for instance top-k or random-k) and show that this scheme converges at the same rate as vanilla SGD when equipped with error compensation (keeping track of accumulated errors in memory). That is, communication can be reduced by a factor of the dimension of the problem (sometimes even more) whilst still converging at the same rate. We present numerical experiments to illustrate the theoretical findings and the good scalability for distributed applications.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)

Pashmchi, Parastoo, Benoit, Jérôme, Kanagawa, Motonobu

Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach

Missing values are common in photovoltaic (PV) power data, yet the uncertainty they induce is not propagated into predictive distributions. We develop a framework that incorporates missing-data uncertainty into short-term PV forecasting by combining stochastic multiple imputation with Rubin's rule. The approach is model-agnostic and can be integrated with standard machine-learning predictors. Empirical results show that ignoring missing-data uncertainty leads to overly narrow prediction intervals. Accounting for this uncertainty improves interval calibration while maintaining comparable point prediction accuracy. These results demonstrate the importance of propagating imputation uncertainty in data-driven PV forecasting.

data quality, imputation, machine learning, (19 more...)

2603.15564

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > New Finding (0.87)

Industry: Energy > Renewable > Solar (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Quality (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Cheng, Zilan, Wang, Li-Lian, Wang, Zhongjian

Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces

We propose a machine-learning algorithm for Bayesian inverse problems in the function-space regime based on one-step generative transport. Building on the Mean Flows, we learn a fully conditional amortized sampler with a neural-operator backbone that maps a reference Gaussian noise to approximate posterior samples. We show that while white-noise references may be admissible at fixed discretization, they become incompatible with the function-space limit, leading to instability in inference for Bayesian problems arising from PDEs. To address this issue, we adopt a prior-aligned anisotropic Gaussian reference distribution and establish the Lipschitz regularity of the resulting transport. Our method is not distilled from MCMC: training relies only on prior samples and simulated partial and noisy observations. Once trained, it generates a $64\times64$ posterior sample in $\sim 10^{-3}$s, avoiding the repeated PDE solves of MCMC while matching key posterior summaries.

artificial intelligence, machine learning, yob, (15 more...)

2603.14798

Country: Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Burfitt, Matthew, Brodzki, Jacek, Dłotko, Pawel

Understanding the geometry of deep learning with decision boundary volume

For classification tasks, the performance of a deep neural network is determined by the structure of its decision boundary, whose geometry directly affects essential properties of the model, including accuracy and robustness. Motivated by a classical tube formula due to Weyl, we introduce a method to measure the decision boundary of a neural network through local surface volumes, providing a theoretically justifiable and efficient measure enabling a geometric interpretation of the effectiveness of the model applicable to the high dimensional feature spaces considered in deep learning. A smaller surface volume is expected to correspond to lower model complexity and better generalisation. We verify, on a number of image processing tasks with convolutional architectures that decision boundary volume is inversely proportional to classification accuracy. Meanwhile, the relationship between local surface volume and generalisation for fully connected architecture is observed to be less stable between tasks. Therefore, for network architectures suited to a particular data structure, we demonstrate that smoother decision boundaries lead to better performance, as our intuition would suggest.

artificial intelligence, decision boundary, machine learning, (19 more...)

2603.14768

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Basel-City > Basel (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Government (0.69)
Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Robust Sequential Tracking via Bounded Information Geometry and Non-Parametric Field Actions

Rodriguez, Carlos C.

Standard sequential inference architectures are compromised by a normalizability crisis when confronted with extreme, structured outliers. By operating on unbounded parameter spaces, state-of-the-art estimators lack the intrinsic geometry required to appropriately sever anomalies, resulting in unbounded covariance inflation and mean divergence. This paper resolves this structural failure by analyzing the abstraction sequence of inference at the meta-prior level (S_2). We demonstrate that extremizing the action over an infinite-dimensional space requires a non-parametric field anchored by a pre-prior, as a uniform volume element mathematically does not exist. By utilizing strictly invariant Delta (or ν) Information Separations on the statistical manifold, we physically truncate the infinite tails of the spatial distribution. When evaluated as a Radon-Nikodym derivative against the base measure, the active parameter space compresses into a strictly finite, normalizable probability droplet. Empirical benchmarks across three domains--LiDAR maneuvering target tracking, high-frequency cryptocurrency order flow, and quantum state tomography--demonstrate that this bounded information geometry analytically truncates outliers, ensuring robust estimation without relying on infinite-tailed distributional assumptions.

artificial intelligence, machine learning, manifold, (14 more...)

2603.13613

Country: North America > United States > New York (0.04)

Genre: Research Report (0.40)

Industry: Banking & Finance > Trading (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Paquette, Elliot, Xiao, Ke Liang, Zhu, Yizhe

Power-Law Spectrum of the Random Feature Model

Scaling laws for neural networks, in which the loss decays as a power-law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H)\in \mathbb{R}^v$ where $H$ has $α$-power-law spectrum ($λ_j(H ) \asymp j^{-α}$, $α> 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v\times d}$, and an entrywise monomial $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x }[\frac{1}{d}f(W^\top x )^{\otimes 2}]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\left(\log^{p-1}(j+1)/j\right)^α$. For $ c_1 d \log^{-(p+1)}(d)\leq j\leq d$, the $j$-th eigenvalue is of order $j^{-α}$ up to a polylog factor. That is, the power-law exponent $α$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.

artificial intelligence, eigenvalue, machine learning, (16 more...)

2603.14578

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

BrainCast: A Spatio-Temporal Forecasting Model for Whole-Brain fMRI Time Series Prediction

Gao, Yunlong, Yang, Jinbo, Xiao, Li, Huo, Haiye, Ji, Yang, Wang, Hao, Zhang, Aiying, Wang, Yu-Ping

Functional magnetic resonance imaging (fMRI) enables noninvasive investigation of brain function, while short clinical scan durations, arising from human and non-human factors, usually lead to reduced data quality and limited statistical power for neuroimaging research. In this paper, we propose BrainCast, a novel spatio-temporal forecasting framework specifically tailored for whole-brain fMRI time series forecasting, to extend informative fMRI time series without additional data acquisition. It formulates fMRI time series forecasting as a multivariate time series prediction task and jointly models temporal dynamics within regions of interest (ROIs) and spatial interactions across ROIs. Specifically, BrainCast integrates a Spatial Interaction Awareness module to characterize inter-ROI dependencies via embedding every ROI time series as a token, a Temporal Feature Refinement module to capture intrinsic neural dynamics within each ROI by enhancing both low- and high-energy temporal components of fMRI time series at the ROI level, and a Spatio-temporal Pattern Alignment module to combine spatial and temporal representations for producing informative whole-brain features. Experimental results on resting-state and task fMRI datasets from the Human Connectome Project demonstrate the superiority of BrainCast over state-of-the-art time series forecasting baselines. Moreover, fMRI time series extended by BrainCast improve downstream cognitive ability prediction, highlighting the clinical and neuroscientific impact brought by whole-brain fMRI time series forecasting in scenarios with restricted scan durations.

artificial intelligence, data mining, machine learning, (17 more...)

2603.13361

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
Asia > China > Anhui Province > Hefei (0.05)
Asia > China > Jiangxi Province > Nanchang (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)