AITopics | pl inequality

Collaborating Authors

pl inequality

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Continuous-time Riemannian SGD and SVRGFlows on Wasserstein Probabilistic Space

Neural Information Processing SystemsJun-17-2026, 09:23:04 GMT

Recently, optimization on the Riemannian manifold have provided valuable insights to the optimization community. In this regard, extending these methods to to the Wasserstein space is of particular interest, since optimization on Wasserstein space is closely connected to practical sampling processes. Generally, the standard (continuous) optimization method on Wasserstein space is Riemannian gradient flow (i.e., Langevin dynamics when minimizing KL divergence). In this paper, we aim to enrich the family of continuous optimization methods in the Wasserstein space, by extending the gradient flow on it into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow. By leveraging the property of Wasserstein space, we construct stochastic differential equations (SDEs) to approximate the corresponding discrete Euclidean dynamics of the desired Riemannian stochastic methods. Then, we obtain the flows in Wasserstein space by Fokker-Planck equation. Finally, we establish convergence rates of the proposed stochastic flows, which align with those known in the Euclidean setting.

artificial intelligence, inequality, machine learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Proxy Convexity: AUnified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Neural Information Processing SystemsApr-25-2026, 15:12:49 GMT

Although the optimization objectives for learning neural networks are highly nonconvex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the particular setup in each problem, making it difficult to generalize across different settings. To address this drawback in the literature, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.

artificial intelligence, inequality, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Convergence Rates for Distribution Matching with Sliced Optimal Transport

Thurin, Gauthier, Boyer, Claire, Nadjahi, Kimia

arXiv.org Machine LearningFeb-12-2026

We study the slice-matching scheme, an efficient iterative method for distribution matching based on sliced optimal transport. We investigate convergence to the target distribution and derive quantitative non-asymptotic rates. To this end, we establish __ojasiewicz-type inequalities for the Sliced-Wasserstein objective. A key challenge is to control along the trajectory the constants in these inequalities. We show that this becomes tractable for Gaussian distributions. Specifically, eigenvalues are controlled when matching along random orthonormal bases at each iteration. We complement our theory with numerical experiments and illustrate the predicted dependence on dimension and step-size, as well as the stabilizing effect of orthonormal-basis sampling.

artificial intelligence, inequality, machine learning, (18 more...)

arXiv.org Machine Learning

2602.10691

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > Belgium > Flanders > West Flanders > Bruges (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ProxyConvexity: AUnifiedFramework fortheAnalysisofNeuralNetworks TrainedbyGradientDescent

Neural Information Processing SystemsFeb-8-2026, 09:26:56 GMT

Weintroduce thenotions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied iftheoriginal objectivefunction induces aproxy objectivefunction that is implicitly minimized when using gradient methods.

artificial intelligence, inequality, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

Spectral Thresholds for Identifiability and Stability:Finite-Sample Phase Transitions in High-Dimensional Learning

Huang, William Hao-Cheng

arXiv.org Machine LearningOct-7-2025

In high-dimensional learning, models remain stable until they collapse abruptly once the sample size falls below a critical level. This instability is not algorithm-specific but a geometric mechanism: when the weakest Fisher eigendirection falls beneath sample-level fluctuations, identifiability fails. Our Fisher Threshold Theorem formalizes this by proving that stability requires the minimal Fisher eigenvalue to exceed an explicit $O(\sqrt{d/n})$ bound. Unlike prior asymptotic or model-specific criteria, this threshold is finite-sample and necessary, marking a sharp phase transition between reliable concentration and inevitable failure. To make the principle constructive, we introduce the Fisher floor, a verifiable spectral regularization robust to smoothing and preconditioning. Synthetic experiments on Gaussian mixtures and logistic models confirm the predicted transition, consistent with $d/n$ scaling. Statistically, the threshold sharpens classical eigenvalue conditions into a non-asymptotic law; learning-theoretically, it defines a spectral sample-complexity frontier, bridging theory with diagnostics for robust high-dimensional inference.

inequality, stability, threshold, (15 more...)

arXiv.org Machine Learning

2510.03809

Country:

North America > United States > Indiana > Hamilton County > Fishers (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

FedTilt: Towards Multi-Level Fairness-Preserving and Robust Federated Learning

Zhang, Binghui, De La Cruz, Luis Mares, Wang, Binghui

arXiv.org Artificial IntelligenceMar-15-2025

Federated Learning (FL) is an emerging decentralized learning paradigm that can partly address the privacy concern that cannot be handled by traditional centralized and distributed learning. Further, to make FL practical, it is also necessary to consider constraints such as fairness and robustness. However, existing robust FL methods often produce unfair models, and existing fair FL methods only consider one-level (client) fairness and are not robust to persistent outliers (i.e., injected outliers into each training round) that are common in real-world FL settings. We propose \texttt{FedTilt}, a novel FL that can preserve multi-level fairness and be robust to outliers. In particular, we consider two common levels of fairness, i.e., \emph{client fairness} -- uniformity of performance across clients, and \emph{client data fairness} -- uniformity of performance across different classes of data within a client. \texttt{FedTilt} is inspired by the recently proposed tilted empirical risk minimization, which introduces tilt hyperparameters that can be flexibly tuned. Theoretically, we show how tuning tilt values can achieve the two-level fairness and mitigate the persistent outliers, and derive the convergence condition of \texttt{FedTilt} as well. Empirically, our evaluation results on a suite of realistic federated datasets in diverse settings show the effectiveness and flexibility of the \texttt{FedTilt} framework and the superiority to the state-of-the-arts.

artificial intelligence, fairness, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.13537

Country: North America > United States > Illinois (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Convergence Analysis of the Wasserstein Proximal Algorithm beyond Geodesic Convexity

Zhu, Shuailong, Chen, Xiaohui

arXiv.org Machine LearningJan-24-2025

The proximal algorithm is a powerful tool to minimize nonlinear and nonsmooth functionals in a general metric space. Motivated by the recent progress in studying the training dynamics of the noisy gradient descent algorithm on two-layer neural networks in the mean-field regime, we provide in this paper a simple and self-contained analysis for the convergence of the general-purpose Wasserstein proximal algorithm without assuming geodesic convexity of the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-{\L}ojasiewicz inequality, we establish that the proximal algorithm achieves an unbiased and linear convergence rate. Our convergence rate improves upon existing rates of the proximal algorithm for solving Wasserstein gradient flows under strong geodesic convexity. We also extend our analysis to the inexact proximal algorithm for geodesically semiconvex objectives. In our numerical experiments, proximal training demonstrates a faster convergence rate than the noisy gradient descent algorithm on mean-field neural networks.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2501.14993

Country:

North America > United States > California (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Liang, Yingyu, Long, Jiangxuan, Shi, Zhenmei, Song, Zhao, Zhou, Yufa

arXiv.org Artificial IntelligenceOct-15-2024

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model sizes, making deployment on edge devices challenging due to memory and computational constraints. This paper introduces a novel approach to LLM weight pruning that directly optimizes for approximating the attention matrix, a core component of transformer architectures. Unlike existing methods that focus on linear approximations, our approach accounts for the non-linear nature of the Softmax attention mechanism. We provide theoretical guarantees for the convergence of our Gradient Descent-based optimization method to a near-optimal pruning mask solution. Our preliminary empirical results demonstrate the effectiveness of this approach in maintaining model performance while significantly reducing computational costs. This work establishes a new theoretical foundation for pruning algorithm design in LLMs, potentially paving the way for more efficient LLM inference on resource-constrained devices.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.11261

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(5 more...)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality

Chakrabarti, Kushal, Baranwal, Mayank

arXiv.org Machine LearningJul-17-2024

Adaptive gradient-descent optimizers are the standard choice for training neural network models. Despite their faster convergence than gradient-descent and remarkable performance in practice, the adaptive optimizers are not as well understood as vanilla gradient-descent. A reason is that the dynamic update of the learning rate that helps in faster convergence of these methods also makes their analysis intricate. Particularly, the simple gradient-descent method converges at a linear rate for a class of optimization problems, whereas the practically faster adaptive gradient methods lack such a theoretical guarantee. The Polyak-{\L}ojasiewicz (PL) inequality is the weakest known class, for which linear convergence of gradient-descent and its momentum variants has been proved. Therefore, in this paper, we prove that AdaGrad and Adam, two well-known adaptive gradient methods, converge linearly when the cost function is smooth and satisfies the PL inequality. Our theoretical framework follows a simple and unified approach, applicable to both batch and stochastic gradients, which can potentially be utilized in analyzing linear convergence of other variants of Adam.

adaptive gradient method, linear convergence, pl inequality

arXiv.org Machine Learning

2407.12629

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Fixed-Time Convergence for a Class of Nonconvex-Nonconcave Min-Max Problems

Garg, Kunal, Baranwal, Mayank

arXiv.org Artificial IntelligenceJul-26-2022

This study develops a fixed-time convergent saddle point dynamical system for solving min-max problems under a relaxation of standard convexity-concavity assumption. In particular, it is shown that by leveraging the dynamical systems viewpoint of an optimization algorithm, accelerated convergence to a saddle point can be obtained. Instead of requiring the objective function to be strongly-convex--strongly-concave (as necessitated for accelerated convergence of several saddle-point algorithms), uniform fixed-time convergence is guaranteed for functions satisfying only the two-sided Polyak-{\L}ojasiewicz (PL) inequality. A large number of practical problems, including the robust least squares estimation, are known to satisfy the two-sided PL inequality. The proposed method achieves arbitrarily fast convergence compared to any other state-of-the-art method with linear or even super-linear convergence, as also corroborated in numerical case studies.

artificial intelligence, machine learning, pl inequality, (18 more...)

arXiv.org Artificial Intelligence

2207.12845

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > Middle East > Jordan (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback