AITopics

2605.26373

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Lascu, Razvan-Andrei, Suzuki, Taiji

From Saddle Points Toward Global Minima: A Newton-Type Method on Wasserstein Space

arXiv.org Machine LearningMay-19-2026

We study the minimization of non-convex functionals over the Wasserstein space. While recent work has showed that perturbed Wasserstein gradient methods can avoid saddle points for benign landscapes, existing approaches remain essentially first-order and do not provide fast local convergence once the iterates enter a neighborhood of a global minimizer. We propose Wasserstein Saddle-Free Newton (WSFN), a second-order method that preconditions the Wasserstein gradient by a regularized square root of the squared Wasserstein Hessian. This construction preserves attraction toward directions of positive curvature while inducing repulsion along directions of negative curvature, thereby overcoming the tendency of standard Wasserstein Newton dynamics to be attracted to saddles. We also establish second-order sufficient optimality conditions on Wasserstein space for strict local minimality. Under regularity and benign landscape assumptions, we prove that WSFN escapes saddle regions and reaches an $α$-neighborhood of a global minimizer in polynomial time, with improved dependence on saddle parameters compared with prior perturbed first-order methods. Once inside this neighborhood, we show that WSFN converges linearly in $L^2$-Wasserstein distance to a non-degenerate global minimizer. Finally, we present a particle-based implementation of the method.

artificial intelligence, machine learning, minimizer, (17 more...)

2605.17963

Country:

Asia > Japan (0.28)
North America > United States (0.27)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningMay-12-2026

Active Multiple-Prediction-Powered Inference

Brawand, Nicholas, Leclerc, Nima, Ngo, Anhthy, Peterson, Matthew, Vishwanath, Sriram, Alhussein, Laith, Wellner, Ben

Post-deployment monitoring of healthcare AI requires statistically valid, label-efficient methods, but gold-standard labels from clinician chart review are expensive. Prediction-powered inference (PPI) and active statistical inference (ASI) reduce label cost by combining a small labeled sample with abundant model predictions, but both are restricted to a single predictor, a poor fit for modern clinical pipelines that have multiple predictors of differing cost and accuracy available at inference time. We propose Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate predictor subset, samples gold-standard labels in proportion to the chosen subset's residual uncertainty, and reweights predictions to minimize estimator variance, all under a single deployment-time budget. AM-PPI generalizes ASI to leverage multiple predictors and extends Multiple-PPI from global per-predictor allocation to per-instance adaptive routing. We derive closed-form Karush-Kuhn-Tucker (KKT) conditions for all three decisions and prove, via biconvexity and strong duality, that the resulting fixed point is a global optimum despite the joint problem being non-jointly-convex. We establish asymptotic normality with valid coverage, minimum-variance unbiasedness within the linear-prediction augmented inverse propensity weighted (AIPW) class, and a closed-form criterion identifying when multiple predictors help. On synthetic data and three healthcare monitoring tasks, AM-PPI produces 10 to 40 percent narrower confidence intervals (CIs) than single-predictor ASI in the budget regime where routing matters, and matches the better baseline elsewhere.

artificial intelligence, machine learning, predictor, (17 more...)

2605.08429

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Health Care Technology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Helgerth, Lena, Christmann, Andreas

Ratio-based Loss Functions

arXiv.org Machine LearningMay-8-2026

Algorithms in machine learning and AI do critically depend on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures, which are allowed for the specified algorithm. This paper gives a survey of a certain class of loss functions, which we call ratio-based. In supervised learning, margin-based loss functions for classification tasks depending on the product of the output values $y_i$ and the predictions $f(x_i)$ as well as distance-based loss functions depending on the difference of $y_i$ and $f(x_i)$ for regression are common. Distance-based loss functions are in particular useful, if an additive model assumption seems plausible, i.e. the common signal plus noise assumption. However, in the literature, several loss functions proposed for regression purposes have a multiplicative error structure in mind and pay attention to relative errors, i.e. to the ratio of $y_i$ and $f(x_i)$. In this survey article, we systematically investigate such ratio-based loss functions and propose a few new losses, which may be interesting for future research. We concentrate on investigating general properties of ratio-based loss functions like continuity, Lipschitz-continuity, convexity, and differentiability, because these properties play a central role in most machine learning algorithms. Therefore, we do not focus on some specific machine learning algorithm to derive universal consistency, learning rates, or stability results. Instead, we want to enable future research in this direction.

artificial intelligence, machine learning, survey article, (19 more...)

2605.05808

Country: Europe (0.28)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

arXiv.org Machine LearningMay-6-2026

SOC-ICNN: From Polyhedral to Conic Geometry for Learning Convex Surrogate Functions

Liu, Kang, Hu, Jianchen, Peng, Wei

Classical ReLU-based Input Convex Neural Networks (ICNNs) are equivalent to the optimal value functions of Linear Programming (LP). This intrinsic structural equivalence restricts their representational capacity to piecewise-linear polyhedral functions. To overcome this representational bottleneck, we propose the SOC-ICNN, an architecture that generalizes the underlying optimization class from LP to Second-Order Cone Programming (SOCP). By explicitly injecting positive semi-definite curvature and Euclidean norm-based conic primitives, our formulation introduces native smooth curvature into the representation while preserving a rigorous optimization-theoretic interpretation. We formally prove that SOC-ICNNs strictly expand the representational space of ReLU-ICNNs without increasing the asymptotic order of forward-pass complexity. Extensive experiments demonstrate that SOC-ICNN substantially improves function approximation, while delivering competitive downstream decision quality. The code is available at https://anonymous.4open.science/r/SOC-ICNN-4B18/.

artificial intelligence, machine learning, optimization problem, (20 more...)

2604.22355

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Neural Information Processing SystemsApr-30-2026, 19:37:59 GMT

00e26af6ac3b1c1c49d7c3d79c60d000-AuthorFeedback.pdf

artificial intelligence, eg wipeout ablation, machine learning, (14 more...)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Neural Information Processing SystemsApr-29-2026, 00:32:55 GMT

Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows

We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

Neural Information Processing SystemsApr-25-2026, 19:58:36 GMT

DP_Stochastic_Optimization__New_Results_in_Convex_and_Non_convex_Settings-1.pdf

artificial intelligence, log 2, machine learning, (17 more...)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

In addition to the work on noisy convex optimization, the current paper is also thematically related to works in learning theory and complexity where the goal is to reconstruct simple classes of functions under outlier noise. This includes work on reconstruction of low-degree polynomials [4, 14, 15]. In particular, [15] gave an efficient algorithm whose error tolerance matches the information theoretic limits. In addition, recently, [9] achieved similar algorithmic guarantees for functions which are sparse in the Fourier space. While similar in spirit, the model in these works differ from the current paper in one crucial way - namely, while we only put a bound on the volume of the outlier locations, they, in addition, assume that the outlier locations are also uniformly distributed in the domain. At a more technical level, the results in [4, 14, 15, 9] crucially rely on techniques originating from coding theory such as the Goldreich-Levin theorem [13] and the Berlekamp-Welch algorithm [6].

artificial intelligence, machine learning, probability, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)