AITopics

2606.1439

Country: Europe (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Heredge, Jamie, Villani, Mattia J., Deshpande, Pranav, Seshadri, Akshay, Kumar, Niraj

CRUMB: Efficient Prior Fitted Network Inference via Distributionally Matched Context Batching

arXiv.org Machine LearningJun-11-2026

Prior-fitted networks (PFNs) are a promising class of tabular foundation models that perform in-context learning, whereby the entire labelled training set is supplied as context, and predictions for test queries are produced in a single forward pass. However, the quadratically scaling self-attention mechanism in many PFN architectures makes inference prohibitive for very large training datasets. We propose CRUMB (Clustered Retrieval Using Minimised-MMD Batching), a three-stage inference wrapper that (i) clusters the test queries, (ii) selects a small, distributionally matched training subset for each cluster by greedily minimising the maximum mean discrepancy (MMD), and (iii) runs exact PFN inference on each reduced-context batch. CRUMB is architecture-agnostic and requires no retraining. On the 51-dataset TabArena benchmark, evaluated across three PFN architectures (TabPFNv2, TabICLv1, TabICLv2), we show that CRUMB outperforms similar state-of-the-art context selection strategies. We also show that CRUMB is resilient to covariate drift, as the MMD-minimisation step naturally helps align the training context distribution to match the current test batch distributions.

artificial intelligence, machine learning, test point, (18 more...)

2606.11473

Genre: Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (0.46)
Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Benavides-Herrera, Pablo, Ruiz-Cruz, Riemann, Sánchez-Torres, Juan Diego

Sequential Minimal Optimization for $\varepsilon$-SVR with MAPE Loss and Sample-Dependent Box Constraints

arXiv.org Machine LearningMay-11-2026

We derive a Sequential Minimal Optimization (SMO) algorithm for the quadratic dual problem arising from $\varepsilon$-SVR~\cite{Vapnik1995, Drucker1997, Smola2004} modified to minimize the Mean Absolute Percentage Error (MAPE)~\cite{Makridakis1993, Hyndman2006} directly in the loss function~\cite{benavides2025support}. This formulation is part of a broader family of SVR models with percentage-error losses that also includes least-squares variants~\cite{Suykens2002} and symmetric-kernel extensions~\cite{Espinoza2005}, whose unified structure is studied in~\cite{benavides2026unified}. The key structural difference from standard $\varepsilon$-SVR is that the box constraints become \emph{sample-dependent}: $α_k, α_k^* \in [0,\, 100C/y_k]$. We show that this modification affects only (i) the feasibility sets $\Iup$ and $\Idown$ in the working-set selection and (ii) the clipping bounds in the analytic two-variable update, while leaving the curvature formula and gradient update structurally identical to the standard SMO~\cite{Platt1998, Platt1999, Fan2005}. A shrinking heuristic adapted to the sample-dependent bounds is derived and shown to introduce an asymmetry between $α$- and $α^*$-variables controlled by the gap $2y_k\varepsilon/100$. The same solver applies to the symmetric-kernel variant (m2) by replacing $Ω$ with $Ω_s = \tfrac{1}{2}(Ω+ aΩ^*)$~\cite{Espinoza2005}. Numerical validation against an interior-point QP reference solver confirms solution agreement to within solver termination tolerance across ten synthetic configurations spanning both kernel variants and symmetry types. An implementation is available in the open-source \texttt{psvr} R package~\cite{BenavidesHerrera2026Rpsvr}.

artificial intelligence, constraint, machine learning, (16 more...)

2605.01446

Country:

Europe (0.28)
North America > United States (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.67)

Tang, Wei-Ting, Kudva, Akshay, Tsay, Calvin, Paulson, Joel A.

An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions

arXiv.org Machine LearningMay-5-2026

We study the deterministic global optimization of trained Gaussian process posterior mean functions over hyperrectangular domains. Although the posterior mean function has a compact closed-form representation, its global optimization is challenging because it remains nonlinear and nonconvex. Existing exact deterministic approaches become increasingly difficult to scale as the number of training data points grows, leading to approximation-based methods that improve tractability by optimizing a modified (inexact) objective. In this work, we propose PALM-Mean, a piecewise-analytic lower-bounding framework embedded in reduced-space spatial branch-and-bound. At each node, kernel terms that are locally important are replaced by a sign-aware piecewise-linear relaxation in an appropriate scalar distance variable, while the remaining terms are bounded analytically in closed form. We show this hybrid approach yields a valid lower bound for the posterior mean, while limiting the size of the branch-and-bound subproblems. We establish validity of the node lower bounds and $\varepsilon$-global convergence of the resulting algorithm. Computational results on synthetic benchmarks and real-world application problems show that PALM-Mean improves scalability relative to representative general-purpose deterministic global solvers, particularly as the number of training data points increases.

artificial intelligence, optimization, optimization problem, (18 more...)

2605.00855

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

arXiv.org Machine LearningMay-5-2026

A Theory of Generalization in Deep Learning

Litman, Elon, Guo, Gabe

We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's near-zero eigenvalues trap residual error in a test-invisible reservoir. Within the signal channel, minibatch SGD ensures that coherent population signal accumulates via fast linear drift, while idiosyncratic memorization is suppressed into a slow, diffusive random walk. We prove generalization survives even when the kernel evolves $\mathcal{O}(1)$ in operator norm, the full feature-learning regime. This theory naturally explains disparate phenomena in deep learning theory, such as benign overfitting, double descent, implicit bias, and grokking. Lastly, we derive an exact population-risk objective from a single training run with no validation data, for any architecture, loss, or optimizer, and prove that it measures precisely the noise in the signal channel. This objective reduces in practice to an SNR preconditioner on top of Adam, adding one state vector at no extra cost; it accelerates grokking by $5 \times$, suppresses memorization in PINNs and implicit neural representations, and improves DPO fine-tuning under noisy preferences while staying $3 \times$ closer to the reference policy.

artificial intelligence, machine learning, operator, (19 more...)

2605.01172

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Neural Information Processing SystemsMay-1-2026, 01:40:00 GMT

04a1bf2d968f1ce381cf1f9184a807a9-Paper.pdf

artificial intelligence, lipschitz constant, machine learning, (15 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing SystemsApr-24-2026, 18:13:36 GMT

10fb6cfa4c990d2bad5ddef4f70e8ba2-Paper.pdf

artificial intelligence, machine learning, survival analysis, (17 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Neural Information Processing SystemsApr-24-2026, 09:16:17 GMT

0525fa17a8dbea687359116d01732e12-Supplemental-Conference.pdf

artificial intelligence, fourier transform, regularisation, (17 more...)

Technology: Information Technology > Artificial Intelligence (0.31)

Miguel A. Carreira-Perpinan, Ramin Raziperchikolaei

An ensemble diversity approach to supervised binary hashing

Neural Information Processing SystemsMar-23-2026, 11:37:49 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, hash function, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsMar-16-2026, 23:01:59 GMT

Representer Point Selection for Explaining Deep Neural Networks

We propose to explain the predictions of a deep neural network, by pointing to the set of what we call representer points in the training set, for a given test point prediction. Specifically, we show that we can decompose the pre-activation prediction of a neural network into a linear combination of activations of training points, with the weights corresponding to what we call representer values, which thus capture the importance of that training point on the learned parameters of the network. But it provides a deeper understanding of the network than simply training point influence: with positive representer values corresponding to excitatory training points, and negative values corresponding to inhibitory points, which as we show provides considerably more insight. Our method is also much more scalable, allowing for real-time feedback in a manner not feasible with influence functions.

artificial intelligence, machine learning, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)