AITopics

2605.29748

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

arXiv.org Machine LearningMay-28-2026

Smoothed Score Queries and the Complexity of Sampling

Liu, Jingbo

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic $\sqrtκ$ dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix $Λ$, a smoothed-score query at noise level $τ$ gives access to the resolvent $(Λ+τ^{-1}I)^{-1}$. Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error $δ_{\rm TV}$, improving the condition-number dependence from $\sqrtκ$ to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in $κ$; in particular, for fixed dimension and accuracy, the bit complexity is $O(\log^2κ)$. To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an $Ω(\logκ)$ lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

approximation, artificial intelligence, machine learning, (19 more...)

2605.27769

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Richtárik, Peter, Gruntkowska, Kaja, Li, Hanmin

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

arXiv.org Machine LearningMay-12-2026

We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.

constraint, machine learning, natural language, (20 more...)

2605.0885

Country:

North America > United States (0.67)
Asia (0.45)
Europe (0.45)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language (0.92)

arXiv.org Machine LearningMay-5-2026

The Causal Description Gap: Information-Theoretic Separations Across Pearl's Hierarchy

Emadi, Seyed Morteza

Pearl's causal hierarchy shows that observational, interventional, and counterfactual queries are qualitatively distinct. We ask a quantitative version of this question: how many additional bits are needed to specify higher-rung causal answers once lower-rung answers are known? We formalize this via query-class description length, the Kolmogorov complexity of the answer oracle induced by an SCM for a class of queries. Our main construction gives binary acyclic SCMs whose observational distribution has constant description length, while the single-variable interventional answer oracle has description length $Θ(n^2)$. A degree-sensitive upper bound shows that finite-gate-schema SCMs of indegree $d$ have observational-interventional gap at most $O(nd \log(en/d) + n \log n)$, making the quadratic construction order-optimal in the dense regime and a rooted-tree construction order-optimal for bounded indegree. The quadratic separation persists under $\varepsilon$-accurate total-variation descriptions for every fixed $\varepsilon < 1/4$. At the next rung, the full hard-do interventional oracle can still leave a $Θ(n)$ counterfactual description gap. A general ambiguity-to-bits theorem and Shannon analogue show that these gaps equal the logarithm of residual higher-rung ambiguity up to lower-order terms.

artificial intelligence, int1, machine learning, (17 more...)

2605.02177

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.35)

BBC NewsMay-1-2026, 21:09:52 GMT

Pentagon says US military to be an 'AI-first' fighting force

Pentagon says US military to be an'AI-first' fighting force The US military plans to increase its use of artificial intelligence (AI) further after the Pentagon agreed to new and expanded contracts with some of the biggest names in technology. Under eight agreements with Google, OpenAI, Amazon, Microsoft, SpaceX, Oracle, Nvidia and the start-up Reflection, the Pentagon said AI technology would now be used for any lawful operational use. These agreements accelerate the transformation [of] the US military as an AI-first fighting force, the Pentagon said. Conspicuous by its absence is Anthropic, as the company has said it is concerned about how the Pentagon could use its tools in warfare and domestically. The firm is now suing the government over the alleged retaliation it faced after refusing to accept any lawful use language in its own contract.

large language model, machine learning, natural language, (19 more...)

BBC News

Country: North America > United States (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Hassan Ashtiani, Shrinu Kushagra, Shai Ben-David

Clustering with Same-Cluster Queries

Neural Information Processing SystemsMay-1-2026, 05:45:09 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, query, (18 more...)

Country: North America > Canada (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.98)

Neural Information Processing SystemsMay-1-2026, 02:25:12 GMT

ABiased Graph Neural Network Sampler with Near-Optimal Regret

Graph neural networks (GNN) have recently emerged as a vehicle for applying deep network architectures to graph and relational data. However, given the increasing size of industrial datasets, in many practical situations the message passing computations required for sharing information across GNN layers are no longer scalable. Although various sampling methods have been introduced to approximate full-graph training within a tractable budget, there remain unresolved complications such as high variances and limited theoretical guarantees. To address these issues, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem but with a newly-designed reward function that introduces some degree of bias designed to reduce variance and avoid unstable, possibly-unbounded pay outs. And unlike prior bandit-GNN use cases, the resulting policy leads to near-optimal regret while accounting for the GNN training dynamics introduced by SGD.

artificial intelligence, data mining, machine learning, (16 more...)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsMay-1-2026, 02:04:56 GMT

Conformal Prediction using Conditional Histograms

This paper develops a conformal method to compute prediction intervals for nonparametric regression that can automatically adapt to skewed data. Leveraging black-box machine learning algorithms to estimate the conditional distribution of the outcome using histograms, it translates their output into the shortest prediction intervals with approximate conditional coverage. The resulting prediction intervals provably have marginal coverage in finite samples, while asymptotically achieving conditional coverage and optimal length if the black-box model is consistent. Numerical experiments with simulated and real data demonstrate improved performance compared to state-of-the-art alternatives, including conformalized quantile regression and other distributional conformal prediction approaches.

artificial intelligence, machine learning, prediction interval, (18 more...)

Country: North America > United States > California (0.14)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing SystemsApr-30-2026, 20:10:31 GMT

Improved Algorithms for Online Submodular Maximization via First-order Regret Bounds

We consider the problem of nonnegative submodular maximization in the online setting. At time step t, an algorithm selects a set St C 2V where C is a feasible family of sets. An adversary then reveals a submodular function ft. The goal is to design an efficient algorithm for minimizing the expected approximate regret. In this work, we give a general approach for improving regret bounds in online submodular maximization by exploiting "first-order" regret bounds for online linear optimization. For monotone submodular maximization subject to a matroid, we give an efficient algorithm which achieves a (1 c/e ε)-regret of O( p kTln(n/k)) where n is the size of the ground set, k is the rank of the matroid, ε > 0 is a constant, and cis the average curvature. Even without assuming any curvature (i.e., taking c = 1), this regret bound improves on previous results of Streeter et al. (2009) and Golovin et al. (2014). For nonmonotone, unconstrained submodular functions, we give an algorithm with 1/2-regret O( nT), improving on the results of Roughgarden and Wang (2018). Our approach is based on Blackwell approachability; in particular, we give a novel first-order regret bound for the Blackwell instances that arise in this setting.

algorithm, artificial intelligence, machine learning, (15 more...)