AITopics

2606.29593

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Machine Learning Jun-30-2026

On Local Population-Risk Certificates

Song, Mingzhi

We develop finite-sample certificates for local population-risk increments $Pδ_v=R(θ_0+v)-R(θ_0)$, $v\in\mathcal D$. The primitive object is an expected-valid upper endpoint $\widehat{\mathsf U}_{\mathcal D}$ satisfying $\mathbb E\sup_{v\in\mathcal D} \{Pδ_v-\widehat{\mathsf U}_{\mathcal D}(v)\}\le0$. This uniform criterion certifies any measurable update selected from the same sample and allows penalties to depend on empirical geometry. The main construction is a cross-fitted ridge calibration for linear feature classes. A pilot fold learns the ridge metric, the complementary fold calibrates the squared mean error in that metric, and complete split averaging recovers the full empirical covariance in the directional quadratic form $\widehat q_{X,λ}$. The optimized diagnostic scale is $\{\widehat q_{X,λ}(h) \widehat r_{X,n_{\rm p},λ}^{\rm cf}/n\}^{1/2}$, and the calibrated trace factor $\widehat r_{X,n_{\rm p},λ}^{\rm cf}$ is compared with the ordinary ridge effective dimension $\widehat r_{X,λ}$. For nonsmooth losses, an exact fixed-mask decomposition $δ_v=J_v^0+R_v^\circ+C_v$ separates frozen Taylor fluctuations, good-path remainders, and interface crossings. Applying the linear and composite certificates componentwise yields endpoints for same-sample expected local search and concentrated release rules.
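For orientation, the "ordinary ridge effective dimension" that the abstract compares against is usually defined as $\operatorname{tr}\{\widehat Σ(\widehat Σ+λI)^{-1}\}$. The sketch below computes that textbook quantity for a two-feature design. The normalization $\widehat Σ=X^\top X/n$, the function name, and the toy data are assumptions made for illustration; the paper's calibrated factor $\widehat r_{X,n_{\rm p},λ}^{\rm cf}$ is not reproduced here.

```python
def ridge_effective_dimension(X, lam):
    """Textbook ridge effective dimension tr(S (S + lam I)^{-1}) for 2 features.

    X is a list of (x1, x2) rows; S = X^T X / n is an assumed normalization.
    Requires lam > 0 so that S + lam I is invertible.
    """
    n = len(X)
    s11 = sum(x[0] * x[0] for x in X) / n
    s12 = sum(x[0] * x[1] for x in X) / n
    s22 = sum(x[1] * x[1] for x in X) / n
    # Entries and determinant of the 2x2 matrix S + lam I.
    a, b, c = s11 + lam, s12, s22 + lam
    det = a * c - b * b
    trace_of_inverse = (a + c) / det
    # tr(S (S + lam I)^{-1}) = tr(I) - lam * tr((S + lam I)^{-1}).
    return 2.0 - lam * trace_of_inverse


# Hypothetical, strongly correlated toy design.
X_toy = [(1.0, 0.9), (2.0, 2.1), (-1.0, -1.2), (0.5, 0.4), (-2.0, -1.8)]
r_small = ridge_effective_dimension(X_toy, 0.1)
r_large = ridge_effective_dimension(X_toy, 1.0)
```

The value lies strictly between 0 and the number of features and shrinks as `lam` grows, which is the baseline behaviour a calibrated trace factor would be compared with.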

artificial intelligence, certificate, machine learning, (18 more...)

2606.19147

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

arXiv.org Machine Learning Jun-30-2026

Variance Reduction for Stochastic Gradient Generalized Non-reversible Langevin Monte Carlo Algorithms

Ni, Bingye, Wang, Xiaoyu, Wang, Yingli, Zhu, Lingjiong

We study the leading-order fluctuation of stochastic gradient Euler-Maruyama estimators for generalized non-reversible Langevin dynamics. Under structural assumptions tailored to the small-stepsize central limit theorem and under an unbiased stochastic gradient oracle, we prove that the empirical average over a horizon of order the inverse squared stepsize satisfies a central limit theorem in the vanishing-stepsize regime. The limiting variance is characterized through the Poisson equation of the limiting full-gradient diffusion. We then rewrite this constant in an operator form that links it to the continuous-time asymptotic variance and, under standard operator-theoretic assumptions, derive a sufficient condition under which an anti-symmetric perturbation strictly reduces the leading-order fluctuation constant relative to the reversible baseline. We also identify bounded smooth predictive observables that are directly covered by the main theorem. As a separate Gaussian calculation beyond the bounded-test-function regime, we obtain closed-form formulas for quadratic Hamiltonians and linear observables. The framework covers non-reversible Langevin dynamics and augmented-state examples including Hessian-free high-resolution dynamics and a positive-definite subclass of gradient-adjusted underdamped Langevin dynamics that allow stochastic gradients. Numerical experiments on basic examples, Bayesian linear regression with synthetic data, and Bayesian logistic regression with real data support the predicted Gaussian fluctuations and show that the non-reversible schemes consistently reduce the root mean squared error (RMSE) relative to their reversible baselines.
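As a rough illustration of the kind of estimator described above, the following sketch runs a stochastic gradient Euler-Maruyama scheme for the simplest non-reversible Langevin dynamics, with drift $-(I+J)\nabla U$ and an anti-symmetric $J$. Everything specific here is an assumption chosen for the example: the two-dimensional standard Gaussian target $U(x,y)=(x^2+y^2)/2$, the additive Gaussian gradient noise, the observable $f(x,y)=x$, and the function name. It is not the paper's general framework or its experiments.

```python
import math
import random


def nonreversible_sgld_average(a, eta, n_steps, grad_noise=0.5, seed=0):
    """Time-average of f(x, y) = x along a stochastic gradient Euler-Maruyama path.

    Drift is -(I + J) g with J = [[0, a], [-a, 0]]; a = 0 gives the reversible
    baseline. g is an unbiased noisy gradient of U(x, y) = (x^2 + y^2) / 2.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    total = 0.0
    noise_scale = math.sqrt(2.0 * eta)
    for _ in range(n_steps):
        # Unbiased stochastic gradient: true gradient plus zero-mean noise.
        gx = x + grad_noise * rng.gauss(0.0, 1.0)
        gy = y + grad_noise * rng.gauss(0.0, 1.0)
        # (I + J) g = (gx + a * gy, gy - a * gx).
        drift_x = -(gx + a * gy)
        drift_y = -(gy - a * gx)
        x += eta * drift_x + noise_scale * rng.gauss(0.0, 1.0)
        y += eta * drift_y + noise_scale * rng.gauss(0.0, 1.0)
        total += x
    return total / n_steps


reversible = nonreversible_sgld_average(a=0.0, eta=0.05, n_steps=20000)
perturbed = nonreversible_sgld_average(a=1.0, eta=0.05, n_steps=20000)
```

Both calls estimate the same stationary mean (zero for this target); comparing their spread over many seeds is one way to probe the fluctuation constants empirically. The horizon `n_steps` is left as a free parameter rather than tied to the inverse squared stepsize.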

artificial intelligence, assumption 2, machine learning, (16 more...)

2606.28808

Country:

Asia > China (0.46)
North America > United States (0.45)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Lee, Wei-Cheng, Orabona, Francesco

A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak--Ruppert Averaging

arXiv.org Machine Learning Jun-25-2026

We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain unprojected TD(0) algorithm with Polyak-Ruppert (PR) averaging, using a single stepsize schedule $η_t \propto \frac{1}{τ_{\mathrm{mix}}\log(t)\sqrt{t}}$ that depends on the mixing time but requires no prior knowledge of the curvature parameter $ω$. Our first result shows that such a choice of the stepsize guarantees that the TD(0) iterates are automatically and uniformly bounded with high probability, without projections and without any stability argument based on $ω$. Building on this result, we establish a simultaneous high-probability convergence guarantee for the PR average: the same stepsize yields both a robust curvature-free $\widetilde{\mathcal{O}}\!\left(\frac{τ_{\mathrm{mix}}}{\sqrt{T}}\right)$ rate and a fast curvature-dependent $\widetilde{\mathcal{O}}\!\left(\frac{τ_{\mathrm{mix}}^2}{ωT}\right)$ rate, with the bound taking the minimum of the two. The core technical ingredient is a Poisson-equation toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term plus a controlled remainder and enables a new self-bounding inductive argument for pathwise stability.
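The algorithm described above can be sketched in a few lines. The code below runs unprojected linear TD(0) along a single simulated trajectory and returns the Polyak-Ruppert average of the iterates. The three-state chain, rewards, features, the constant `c`, the value passed as `tau_mix`, and the use of `log(t + 1)` (to avoid dividing by `log(1) = 0` at the first step) are all assumptions for illustration, not choices taken from the paper.

```python
import math
import random


def td0_polyak_ruppert(P, rewards, features, gamma, tau_mix, T, c=1.0, seed=0):
    """Unprojected linear TD(0) on one Markov trajectory; returns the PR average."""
    rng = random.Random(seed)
    d = len(features[0])
    theta = [0.0] * d
    theta_bar = [0.0] * d
    state = 0
    for t in range(1, T + 1):
        # Draw the next state from row `state` of the transition matrix.
        next_state = rng.choices(range(len(P)), weights=P[state])[0]
        phi, phi_next = features[state], features[next_state]
        value = sum(w * f for w, f in zip(theta, phi))
        value_next = sum(w * f for w, f in zip(theta, phi_next))
        td_error = rewards[state] + gamma * value_next - value
        # Stepsize proportional to 1 / (tau_mix * log t * sqrt t), shifted log.
        eta = c / (tau_mix * math.log(t + 1) * math.sqrt(t))
        theta = [w + eta * td_error * f for w, f in zip(theta, phi)]
        # Running mean of the iterates (Polyak-Ruppert averaging).
        theta_bar = [m + (w - m) / t for m, w in zip(theta_bar, theta)]
        state = next_state
    return theta_bar


# Hypothetical 3-state Markov reward process with 2-dimensional features.
P_toy = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
rewards_toy = [1.0, 0.0, -1.0]
features_toy = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
theta_avg = td0_polyak_ruppert(P_toy, rewards_toy, features_toy,
                               gamma=0.9, tau_mix=2.0, T=5000)
```

Note that no projection step appears anywhere in the loop, and the schedule uses only the mixing-time parameter, mirroring the setting in the abstract.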

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2606.24981

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Odgers, James, Riegler, Ben, Swaroop, Siddharth, Fortuin, Vincent

Gaussian Mean Field Variational Inference can Overestimate Predictive Variance

arXiv.org Machine Learning Jun-25-2026

Mean Field Variational Inference (MFVI) is widely understood to underestimate posterior variance. By analysing conjugate Bayesian Linear Regression (BLR), we show that this characterization is incomplete: while MFVI underestimates the variance in parameter space, it can overestimate the predictive variance compared to the exact posterior. We show that if the MFVI posterior underestimates predictive variances in some directions, it necessarily overestimates them in others. Crucially, this overestimation occurs in directions where the training data concentrates. This leads to the surprising result that, for a test point drawn from the training distribution, MFVI's expected predictive variance exceeds that of the exact posterior. We demonstrate a pathological case of this effect, where the MFVI posterior fails to reduce predictive variance compared to the prior on in-distribution data. We connect these results to the Cold Posterior Effect, arguing that varying the temperature can correct this overestimation, yielding predictions closer to those of the exact posterior. We validate our theory on synthetic and real-world regression tasks.
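A small numerical check can make the claim above concrete. The sketch below assumes a two-feature conjugate BLR model with prior $w\sim\mathcal N(0,α^{-1}I)$ and noise variance $σ^2$, so the exact posterior precision is $Λ=αI+X^\top X/σ^2$. It also uses the standard fact that the Gaussian mean-field minimizer of $\mathrm{KL}(q\,\|\,p)$ for a Gaussian target has diagonal precision $\operatorname{diag}(Λ)$. The toy data, hyperparameters, and function name are hypothetical, and predictive variance here means the variance of the latent function $x^\top w$, averaged over the training inputs.

```python
def mfvi_vs_exact(X, alpha=1.0, sigma2=1.0):
    """Compare exact and mean-field Gaussian posteriors for 2-feature BLR."""
    n = len(X)
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    c = sum(x[1] * x[1] for x in X)
    # Exact posterior precision: alpha * I + X^T X / sigma2.
    l11, l12, l22 = alpha + a / sigma2, b / sigma2, alpha + c / sigma2
    det = l11 * l22 - l12 * l12
    exact_cov = [[l22 / det, -l12 / det], [-l12 / det, l11 / det]]
    # Mean-field Gaussian optimum: variances are inverse diagonal precisions.
    mfvi_cov = [[1.0 / l11, 0.0], [0.0, 1.0 / l22]]
    # Second moment of the training inputs, S = X^T X / n.
    S = [[a / n, b / n], [b / n, c / n]]

    def trace_product(A, B):
        return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

    return {
        "exact_param_var": (exact_cov[0][0], exact_cov[1][1]),
        "mfvi_param_var": (mfvi_cov[0][0], mfvi_cov[1][1]),
        # E_x[x^T Cov x] over training inputs equals tr(Cov S).
        "exact_pred_var": trace_product(exact_cov, S),
        "mfvi_pred_var": trace_product(mfvi_cov, S),
    }


# Hypothetical, strongly correlated training inputs.
X_toy = [(1.0, 0.9), (2.0, 2.1), (-1.0, -1.2), (0.5, 0.4), (-2.0, -1.8)]
result = mfvi_vs_exact(X_toy)
```

On this toy data the mean-field marginal variances are smaller than the exact ones, while the training-averaged predictive variance is larger, which matches the direction of the effect described in the abstract.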

artificial intelligence, machine learning, posterior, (18 more...)

2606.25745

Country:

Asia (0.28)
Europe > Germany (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Neural Information Processing Systems Jun-23-2026, 12:22:47 GMT

Strategic Hypothesis Testing

We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold.

artificial intelligence, machine learning, scientific discovery, (20 more...)

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.81)

Neural Information Processing Systems Jun-23-2026, 00:37:23 GMT

A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable Input

We study the problem of differentially private second moment estimation and present a new algorithm that achieves strong privacy-utility trade-offs even for worst-case inputs under subsamplability assumptions on the data. We call an input $(m,α,β)$-subsamplable if a random subsample of size $m$ (or larger) preserves w.p. $1-β$ the spectral structure of the original second moment matrix up to a multiplicative factor of $1\pm α$. Building upon subsamplability, we give a recursive algorithmic framework similar to Kamath et al. (2019) that abides by zero-Concentrated Differential Privacy (zCDP) while preserving w.h.p. the accuracy of the second moment estimation up to an arbitrary factor of $(1\pm γ)$. We then show how to apply our algorithm to approximate the second moment matrix of a distribution $D$, even when a noticeable fraction of the input are outliers.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Security & Privacy (0.93)

Myleiko, Hanna, Solodky, Sergei, Semenov, Vasyl

Convergence Analysis of Nyström Subsampling in Covariate Shift Adaptation for Misspecified case

arXiv.org Machine Learning Jun-23-2026

This paper investigates convergence properties of regularized Nyström subsampling applied to the unsupervised domain adaptation problem under covariate shift. We focus on the low-smoothness (misspecified) case where the target function lies outside the reproducing kernel Hilbert space. By combining Tikhonov regularization with Nyström projection onto a subsampled subspace, we obtain upper bounds on the excess risk that hold with high probability and are expressed in terms of the source condition, the effective dimension, and the sample sizes. We further extend the analysis to the setting where the Radon-Nikodym derivative between the target and source marginal distributions is unknown and must be approximated, and we identify the minimal additional sample sizes required to maintain the same convergence rate as in the oracle case.

artificial intelligence, machine learning, tjt, (16 more...)

2606.22259

Country: Europe > Ukraine (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems Jun-22-2026, 22:13:41 GMT

Robust Regression of General ReLUs with Queries

We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses poly(d, 1/ϵ) labeled examples and outputs a hypothesis with error O(opt) + ϵ, where opt is the squared loss of the best-fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses d polylog(1/ϵ) + O(min{1/p, 1/ϵ}) black-box label queries, where p is the bias of the target function, and achieves error O(opt) + ϵ. We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires Ω(d/ϵ) labels, unless it draws a super-polynomial number of unlabeled examples.

artificial intelligence, machine learning, polylog, (17 more...)

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Neural Information Processing Systems Jun-22-2026, 20:18:12 GMT

Nearly-Linear Time and Massively Parallel Algorithms for k-Anonymity

Previous algorithms with provable guarantees either (1) achieve the same $O(k)$ approximation ratio but require at least $O(n^2 k)$ runtime, or (2) provide a better $O(\log k)$ approximation ratio at the cost of an impractical $O(n^{2k})$ worst-case runtime for general $d$ and $k$. Our algorithm extends to the Massively Parallel Computation (MPC) model, where it gives an MPC algorithm requiring $\widetilde{O}(\log^{1+ε} n)$ rounds and total space $O(n^{1+γ}(d+k))$. Empirically, we also demonstrate that our algorithmic ideas can be adapted to existing heuristic methods, leading to significant speed-ups while preserving comparable performance. On the hardness side, we study the related single-point k-anonymity problem, where the goal is to select $k-1$ additional records to make a given record indistinguishable. Assuming the dense vs random conjecture in complexity theory, we show that for $n = k^c$, no algorithm can achieve a $k^{1-O(1/c)}$ approximation in poly($n$) time, providing evidence for the inherent hardness of the k-anonymity problem.

data mining, machine learning, natural language, (21 more...)

Country:

Asia (0.93)
North America > United States (0.68)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)