AITopics | minimization

Collaborating Authors

minimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Global Minimizers of ℓp-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Neural Information Processing SystemsJun-19-2026, 05:22:26 GMT

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network--i.e., the network with the fewest nonzero parameters or neurons--a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest singlehidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing ℓp quasinorms of the weights for 0 < p < 1, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.

artificial intelligence, machine learning, sout, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

Neural Information Processing SystemsJun-17-2026, 19:31:21 GMT

Sharpness-aware minimization (SAM) has emerged as a highly effective technique to improve model generalization, but its underlying principles are not fully understood. We investigate m-sharpness, where SAM performance improves monotonically as the micro-batch size for computing perturbations decreases, a phenomenon critical for distributed training yet lacking rigorous explanation. We leverage an extended Stochastic Differential Equation (SDE) framework and analyze stochastic gradient noise (SGN) to characterize the dynamics of SAM variants, including n-SAM and m-SAM. Our analysis reveals that stochastic perturbations induce an implicit variance-based sharpness regularization whose strength increases as m decreases. Motivated by this insight, we propose Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable.

Add feedback

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Neural Information Processing SystemsJun-17-2026, 05:42:37 GMT

The recently proposed optimization algorithm for deep neural networks Sharpness Aware Minimization (SAM) suggests perturbing parameters before gradient calculation by a gradient ascent step to guide the optimization into parameter space regions of flat loss. While significant generalization improvements and thus reduction of overfitting could be demonstrated, the computational costs are doubled due to the additionally needed gradient calculation, making SAM unfeasible in case of limited computationally capacities. Motivated by Nesterov Accelerated Gradient (NAG) we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.

artificial intelligence, machine learning, msam, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

29f1595d10229ee89dcc8abbbd9c5d87-Paper-Conference.pdf

Neural Information Processing SystemsJun-15-2026, 20:19:55 GMT

Adversarial learning is a widely used paradigm in machine learning, often formulated as a min-max optimization problem where the inner maximization imposes adversarial constraints to guide the outer learner toward more robust solutions.

artificial intelligence, international conference, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AComputationally Viable Numerical Gradient-based Technique for Optimal Covering Problems

Neural Information Processing SystemsJun-15-2026, 17:30:52 GMT

The problem of optimally covering a given compact subset of RN with a preassigned number n of Euclidean metric balls has a long-standing history and it is well-recognized to be computationally hard. This article establishes a numerically viable algorithm for obtaining optimal covers of compact sets via two key contributions. The first is a foundational result establishing Lipschitz continuity of the marginal function of a certain parametric non-convex maximization problem in the optimal covering problem, and it provides the substrate for numerical gradient algorithms to be employed in this context. The second is an adaptation of a stochastically smoothed numerical gradient-based (zeroth-order) algorithm for a non-convex minimization problem, that, equipped with randomized restarts, spurs global convergence to an optimal cover. Several numerical experiments with complicated nonconvex compact sets demonstrate the excellent performance of our techniques.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

FedWMSAM: Fast and Flat Federated Learning via Weighted Momentum and Sharpness-Aware Minimization

Neural Information Processing SystemsJun-14-2026, 11:30:54 GMT

These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IIDFL. We identify and formalize two failure modes: local-global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a momentum-guided global perturbation from server-aggregated momentum to align clients' SAM directions with the global descent geometry, enabling a singlebackprop SAM approximation that preserves efficiency.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Security & Privacy (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Alternating Gradient Flows: ATheory of Feature Learning in Two-layer Neural Networks

Neural Information Processing SystemsJun-14-2026, 11:23:15 GMT

What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant.

artificial intelligence, machine learning, neuron, (14 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Unifying Analysis of Projected Gradient Descent for $\ell_p$-constrained Least Squares

Bahmani, Sohail, Raj, Bhiksha

arXiv.org Machine LearningJun-3-2026

In this paper we study the performance of the Projected Gradient Descent(PGD) algorithm for $\ell_{p}$-constrained least squares problems that arise in the framework of Compressed Sensing. Relying on the Restricted Isometry Property, we provide convergence guarantees for this algorithm for the entire range of $0\leq p\leq1$, that include and generalize the existing results for the Iterative Hard Thresholding algorithm and provide a new accuracy guarantee for the Iterative Soft Thresholding algorithm as special cases. Our results suggest that in this group of algorithms, as $p$ increases from zero to one, conditions required to guarantee accuracy become stricter and robustness to noise deteriorates.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1016/j.acha.2012.07.004

1107.4623

Country: North America > United States (0.68)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

To discretize continually: Mean shift interacting particle systems for Bayesian inference

Belhadji, Ayoub, Sharp, Daniel, Marzouk, Youssef M.

arXiv.org Machine LearningMay-15-2026

Integration against a probability distribution given its unnormalized density is a central task in Bayesian inference and other fields. We introduce new methods for approximating such expectations with a small set of weighted samples -- i.e., a quadrature rule -- constructed via an interacting particle system that minimizes maximum mean discrepancy (MMD) to the target distribution. These methods extend the classical mean shift algorithm, as well as recent algorithms for optimal quantization of empirical distributions, to the case of continuous distributions. Crucially, our approach creates dynamics for MMD minimization that are invariant to the unknown normalizing constant; they also admit both gradient-free and gradient-informed implementations. The resulting mean shift interacting particle systems converge quickly, capture anisotropy and multi-modality, avoid mode collapse, and scale to high dimensions. We demonstrate their performance on a wide range of benchmark sampling problems, including multi-modal mixtures, Bayesian hierarchical models, PDE-constrained inverse problems, and beyond.

evolutionary algorithm, machine learning, quadrature rule, (14 more...)

arXiv.org Machine Learning

2605.14142

Country: North America > United States (0.68)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Add feedback

Active Multiple-Prediction-Powered Inference

Brawand, Nicholas, Leclerc, Nima, Ngo, Anhthy, Peterson, Matthew, Vishwanath, Sriram, Alhussein, Laith, Wellner, Ben

arXiv.org Machine LearningMay-12-2026

Post-deployment monitoring of healthcare AI requires statistically valid, label-efficient methods, but gold-standard labels from clinician chart review are expensive. Prediction-powered inference (PPI) and active statistical inference (ASI) reduce label cost by combining a small labeled sample with abundant model predictions, but both are restricted to a single predictor, a poor fit for modern clinical pipelines that have multiple predictors of differing cost and accuracy available at inference time. We propose Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate predictor subset, samples gold-standard labels in proportion to the chosen subset's residual uncertainty, and reweights predictions to minimize estimator variance, all under a single deployment-time budget. AM-PPI generalizes ASI to leverage multiple predictors and extends Multiple-PPI from global per-predictor allocation to per-instance adaptive routing. We derive closed-form Karush-Kuhn-Tucker (KKT) conditions for all three decisions and prove, via biconvexity and strong duality, that the resulting fixed point is a global optimum despite the joint problem being non-jointly-convex. We establish asymptotic normality with valid coverage, minimum-variance unbiasedness within the linear-prediction augmented inverse propensity weighted (AIPW) class, and a closed-form criterion identifying when multiple predictors help. On synthetic data and three healthcare monitoring tasks, AM-PPI produces 10 to 40 percent narrower confidence intervals (CIs) than single-predictor ASI in the budget regime where routing matters, and matches the better baseline elsewhere.

artificial intelligence, machine learning, predictor, (17 more...)

arXiv.org Machine Learning

2605.08429

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry: