AITopics

2605.29669

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)

Wornbard, Jakub, Shen, Zikai, Meunier, Dimitri, Gretton, Arthur

Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity

We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.
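For orientation, the naive plug-in baseline that the abstract compares against can be sketched as a kernel dependence statistic between covariates and residuals. The sketch below is not the paper's one-step estimator: it uses the standard biased HSIC statistic with Gaussian kernels, and the function names, the fixed bandwidth, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(v, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a sample (rows are observations)."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum(v**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * v @ v.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))

def hsic_biased(x, r, bandwidth=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 between x and residuals r."""
    n = len(x)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(r, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2

# Hypothetical data: residuals are taken as the true noise, i.e. the
# regression function is assumed known, so no first-stage bias appears here.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-2.0, 2.0, size=n)
homoscedastic = rng.normal(size=n)               # noise independent of x
heteroscedastic = np.abs(x) * rng.normal(size=n)  # noise scale depends on x

hsic_homo = hsic_biased(x, homoscedastic)
hsic_hetero = hsic_biased(x, heteroscedastic)
print(hsic_homo, hsic_hetero)
```

When residuals come from a fitted regression rather than the true noise, this statistic inherits the first-stage error, which is the problem the abstract's debiased estimator targets.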

artificial intelligence, estimator, machine learning, (17 more...)

2605.27526

Genre: Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Wang, Xiaomeng, Bastani, Hamsa, Bastani, Osbert, Ren, Zhimei

Learning to target with network interference

This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each individual's outcome can be affected by at most a few others. We first establish a regret lower bound showing that ignoring the network structure and reducing the problem to a standard linear bandit inevitably leads to inefficient learning, particularly in large populations. To understand how structural information can be leveraged, we analyze regimes with varying levels of knowledge of the interference structure: (1) full support knowledge, (2) knowledge of the column support sizes, and (3) no prior knowledge. For each regime, we establish regret lower bounds characterizing the fundamental limits of learning, and develop algorithms that achieve near-optimal regret. Together, our results provide a unified view of how knowledge of the interference structure governs the efficiency of online learning under interference, and offer practical adaptive targeting algorithms in each setting. Numerical experiments on synthetic and real-world data demonstrate the practical benefits of our algorithms.

artificial intelligence, data mining, machine learning, (18 more...)

2605.27794

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Convergence of empirical subgradients for optimal transport-based objectives

Le, Tam

Optimal transport is widely used to learn distributions, enforce distributional constraints, and model uncertainty. In applications, transport losses are often computed from samples through tractable representations, such as one-dimensional sorting formulas or sliced Wasserstein costs, making them practical components in training pipelines. We study parameterized objectives defined by sampled transport costs and prove graphical convergence of their subdifferentials to the subdifferential of the population objective. In particular, this ensures that standard subgradient methods consistently approach stationary points of the population-level problem. We illustrate the results in several settings, including risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems. Our analysis highlights that smooth parameterizations provide a favorable interface between statistical consistency and optimization. By contrast, transport objectives with nonsmooth costs and models may exhibit unstable derivatives in the large-sample limit.
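The tractable representations mentioned above can be illustrated with a short sketch. It shows the one-dimensional sorting formula for the squared Wasserstein-2 cost between two equal-size samples, and a Monte Carlo sliced version built on it. The function names, the number of projections, and the seed are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def wasserstein2_1d_sq(a, b):
    """Squared W2 between two equal-size 1D samples via sorting."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    # Sorted samples are matched in order (monotone coupling).
    return float(np.mean((a - b) ** 2))

def sliced_wasserstein2_sq(X, Y, n_projections=64, seed=0):
    """Average of 1D squared W2 costs over random unit directions."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_projections, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return float(np.mean([wasserstein2_1d_sq(X @ u, Y @ u) for u in dirs]))

# Usage: a sample and a shifted copy of it.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
shift = np.array([1.0, 0.0, 0.0])
sw = sliced_wasserstein2_sq(X, X + shift)
print(sw)
```

Sorting makes the sampled cost piecewise smooth in the sample points, which is one reason subgradients rather than gradients are the natural object to study for such objectives.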

artificial intelligence, machine learning, proposition 4, (17 more...)

2605.28134

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings

Lince, M. Generali, Divol, V., Flamary, R., Gaucher, S., Loiseau, P.

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining accurate predictions under relaxed fairness constraints are largely missing. In this work, we address this gap by formulating regression under a demographic parity penalty as an optimal transport problem. Our framework unifies both the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the choice of penalty reflects fundamentally different fairness philosophies: the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals. Building on these theoretical characterizations, we propose an algorithm that is simple to implement, computationally efficient, and consistently matches or outperforms state-of-the-art baselines on real-world benchmarks.

artificial intelligence, dataset, machine learning, (18 more...)

2605.28233

Country: Europe (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Nonparametric Instrumental Variable Analysis Without Structural Equations: Debiased Inference on Functionals of Inverse Problems with No Solutions

Shen, Zikai, Kallus, Nathan, Meunier, Dimitri, Zenati, Houssam, Gretton, Arthur, Bibaut, Aurélien

Instrumental variable (IV) analyses generally start by posing a structural equation: \[ Y = h_{\mathrm{structural}}(X) + \epsilon, \tag{1} \] where $h_{\mathrm{structural}}$ represents the causal effect of $X$ on $Y$, and $X$ and $\epsilon$ may be endogenous ($E[\epsilon \mid X] \neq 0$). Then given an exogenous instrument $Z$ satisfying the exclusion restriction, the common statistical solution given joint observations of $W = (X,Y,Z) \sim P$ is to conduct inference on some continuous linear functional $h \mapsto E_P[m(W;h)]$ of a solution $h \in H$ to the linear equation implied by exclusion: \[ T_P h = r_P, \tag{2} \] where $T_P : H \to G$ maps $h \mapsto \arg\min_{g \in G} E_P (h(X) - g(Z))^2$, $r_P = \arg\min_{r \in G} E_P (Y - r(Z))^2$, and $H$, $G$ are closed linear subspaces of square-integrable functions of $X$ and of $Z$, respectively. For example, if these are all square-integrable functions, then $(T_P h)(Z) = E_P[h(X) \mid Z]$ is the conditional expectation.

artificial intelligence, kernel, machine learning, (19 more...)

2604.2466

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Wang, Shengbo, Blanchet, Jose, Glynn, Peter

Fast Convergence of Policy Regret in Learning Stochastic Optimal Control

Policy learning in modern operations environments faces a fundamental tension between limited operational data and the large, often continuous, state and action spaces over which good decisions must be identified and deployed. We study value-based policy learning in stochastic optimal control: a greedy policy induced by an estimate of the optimal action-value function $Q^*$ is deployed, and its performance is measured by regret. The empirical success of this approach calls for statistical insight into the structures that enable fast regret convergence. We show that, in continuous action spaces, fast policy learning is induced by three geometric structures: a growth exponent $p$, which quantifies how quickly $Q^*$ separates suboptimal actions from its maximizers; a margin-mass exponent $m$, which controls how much deployment mass lies on states with weak growth; and an action-wise regularity exponent $q$, which measures the smoothness of the $Q^*$-estimation error across actions. Given an $n^{-1/2}$-accurate estimator of $Q^*$, we show that the minimax-optimal policy regret convergence rate is \[ \widetilde{\Theta}\left( n^{-\min\left\{\frac{p}{2(p-q)},\frac{m+1}{2m}\right\}} \right), \] up to a logarithmic factor at the boundary between the two regimes. The exponent $q$ is crucial: $q>0$ yields faster-than-$n^{-1/2}$ regret. This regime is natural in operations applications. In particular, we verify $q>0$ under mild regularity conditions in dynamic inventory control and service allocation examples, while the mechanism underlying this fast rate regime extends beyond these settings.
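A short worked instance of the rate exponent may help make the role of $q$ concrete. The admissible ranges of the exponents are not stated in the abstract, so the calculation below assumes $0 \le q < p$ and $m > 0$, and the numerical values are chosen purely for illustration.

```latex
% Illustrative values only; the ranges 0 \le q < p and m > 0 are assumed.
\[
p = 2,\quad q = 1,\quad m = 2:\qquad
\min\left\{\frac{p}{2(p-q)},\ \frac{m+1}{2m}\right\}
= \min\left\{\frac{2}{2},\ \frac{3}{4}\right\} = \frac{3}{4},
\]
\[
q = 0:\qquad \frac{p}{2(p-q)} = \frac{1}{2},
\qquad \frac{m+1}{2m} = \frac{1}{2} + \frac{1}{2m} > \frac{1}{2}
\quad\Longrightarrow\quad \text{exponent } \tfrac{1}{2}.
\]
```

Under these assumptions, $q = 0$ gives exactly the $n^{-1/2}$ rate, while any $q > 0$ pushes both terms in the minimum above $1/2$, consistent with the faster-than-$n^{-1/2}$ claim.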

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2605.26361

Country: North America > United States (0.67)

Genre: Research Report (0.81)

Industry: Education (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Barakat, Anas, Kontogiannis, Andreas, Pollatos, Vasilis, Panageas, Ioannis, Varvitsiotis, Antonios

Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $Θ(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $Ω(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove an $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.
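A toy instance can make the hidden-convex setting concrete. In the sketch below, the loss $f_t(x) = (x^2 - a_t)^2$ is nonconvex in $x$ but convex in the reparameterized variable $u = x^2$, and projected OGD is run directly on $x$ with a step size proportional to $1/\sqrt{t}$. The loss family, the domain, the step-size constant, and the alternating target sequence are assumptions chosen for illustration; this is not the paper's construction, and it does not verify the paper's conditions.

```python
import numpy as np

def ogd_hidden_convex(targets, x0=1.5, lo=0.1, hi=2.0, eta=0.1):
    """Projected OGD on f_t(x) = (x^2 - a_t)^2 over the interval [lo, hi]."""
    x = x0
    iterates, losses = [], []
    for t, a in enumerate(targets, start=1):
        iterates.append(x)
        losses.append((x**2 - a) ** 2)
        grad = 4.0 * x * (x**2 - a)          # derivative of f_t at x
        x = float(np.clip(x - eta / np.sqrt(t) * grad, lo, hi))
    return np.array(iterates), np.array(losses)

# Hypothetical loss sequence: targets alternate between 0.5 and 1.5.
T = 2000
targets = np.where(np.arange(T) % 2 == 0, 0.5, 1.5)
iterates, losses = ogd_hidden_convex(targets)

# Best fixed decision in hindsight, approximated on a grid over the domain.
grid = np.linspace(0.1, 2.0, 1901)
comparator = np.min(((grid[:, None] ** 2 - targets[None, :]) ** 2).sum(axis=1))
regret = losses.sum() - comparator
print(regret, regret / T)
```

The comparator is computed on a grid, so the reported regret is a slight underestimate of the regret against the exact best fixed point.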

artificial intelligence, machine learning, sequence, (16 more...)

2605.26373

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Bilevel Optimization over Saddle Points of Zero-Sum Markov Games

Zheng, Zihao, King, Irwin, Lu, Songtao

Reinforcement learning (RL) often has a hierarchical structure, where an upper-level (UL) learner selects model parameters and a lower-level (LL) decision-making process responds, naturally leading to a bilevel optimization problem. Most existing bilevel RL methods assume a single-policy LL Markov decision process (MDP), and therefore fail to capture competitive structures arising in applications such as incentive design, where multiple policies interact. We study bilevel optimization problems in which the LL problem is a regularized min-max zero-sum Markov game and the UL objective is optimized through the saddle-point equilibrium induced by the LL game. In this work, we propose penalty-augmented Nikaido-Isoda descent-ascent (PANDA), a penalty-based first-order policy-gradient method based on the Nikaido-Isoda function. By exploiting the min-max game structure, PANDA avoids computing UL hypergradients and does not require second-order information. We prove that PANDA converges to stationary points without convexity assumptions on either the UL or LL objectives. Moreover, PANDA reaches an $ε$-stationary point in $\tilde{\mathcal{O}}(ε^{-1})$ iterations with sample complexity $\tilde{\mathcal{O}}(ε^{-3})$, matching the best-known rates for bilevel RL with single-policy LL MDPs. Experiments demonstrate the superior performance of PANDA over closely related baselines.

artificial intelligence, machine learning, optimization problem, (14 more...)

2605.26654

Country: Asia (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine Learning May-26-2026

Score-Repellent Monte Carlo: Toward Efficient Non-Markovian Sampler with Constant Memory in General State Spaces

Hu, Jie, Chen, Lingyun, Kim, Geeho, Choi, Jinyoung, Han, Bohyung, Eun, Do Young

History-dependent sampling can reduce long-run Monte Carlo variance by discouraging redundant revisits, but existing schemes typically encode history through an empirical measure on finite state spaces, which is infeasible in high-dimensional discrete configuration spaces or ill-posed in continuous domains. We propose the Score-Repellent Monte Carlo (SRMC) framework, which summarizes trajectory history by a running average of score evaluations in $\mathbb{R}^d$, where $d$ is the dimension of the score and state representation. This history is converted into a surrogate target through an exponential score tilt, indexed by a parameter $α$ that controls the strength of the history-based repulsion. The surrogate family is normalization-free in the standard MCMC sense, yielding a generic wrapper: at each iteration, any base kernel targeting $π$ can instead be run on the current surrogate $π_{θ_n}$ while the history is updated online. We analyze the coupled evolution of the history recursion and Monte Carlo estimators using stochastic approximation with controlled Markovian noise, establishing almost sure convergence and a joint central limit theorem. We further identify regimes in which the asymptotic covariance decreases as $α$ increases, with scaling $O(1/α)$, extending the near-zero-variance effect of finite-state history-dependent samplers to general state spaces with constant memory. Experiments on continuous targets and discrete energy-based models demonstrate improved estimator variance and mode coverage, while retaining $O(d)$ memory usage and modest per-iteration overhead.

artificial intelligence, machine learning, sampler, (16 more...)

2604.22948

Country: Asia > South Korea (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)