- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- Health & Medicine > Therapeutic Area (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Health & Medicine > Diagnostic Medicine (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
- Information Technology > Artificial Intelligence > Vision (0.67)
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
The classical analysis of Stochastic Gradient Descent (SGD) with polynomially decaying stepsize $\eta_t = \eta/\sqrt{t}$ relies on a well-tuned $\eta$ that depends on problem parameters, such as the Lipschitz smoothness constant, which are often unknown in practice. In this work, we prove that SGD with arbitrary $\eta > 0$, referred to as untuned SGD, still attains an order-optimal convergence rate $\widetilde{\mathcal{O}}(T^{-1/4})$ in terms of the gradient norm for minimizing smooth objectives. Unfortunately, this comes at the expense of a catastrophic exponential dependence on the smoothness constant, which we show is unavoidable for this scheme even in the noiseless setting. We then examine three families of adaptive methods, namely Normalized SGD (NSGD), AMSGrad, and AdaGrad, and show that they avoid this exponential dependence without requiring any knowledge of the smoothness parameter or boundedness of the stochastic gradients. Our results provide theoretical justification for the advantage of adaptive methods over untuned SGD in mitigating the issue of large gradients.
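The contrast between the two schemes can be made concrete with a small numerical sketch. The snippet below is illustrative only and not from the paper: the quadratic objective, noise scale, and value of $\eta$ are arbitrary assumptions. It runs untuned SGD with stepsize $\eta/\sqrt{t}$ alongside Normalized SGD, whose step length never scales with the gradient magnitude.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): untuned SGD
# with stepsize eta/sqrt(t) versus Normalized SGD (NSGD) on a smooth toy objective.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10)) / 5.0            # f(x) = 0.5 * ||A x||^2

def stochastic_grad(x):
    # True gradient of f plus additive Gaussian noise (assumed noise model).
    return A.T @ (A @ x) + 0.1 * rng.normal(size=x.shape)

eta, T = 1.0, 1000                              # eta is arbitrary, not tuned to the smoothness constant
x0 = rng.normal(size=10)
x_sgd, x_nsgd = x0.copy(), x0.copy()

for t in range(1, T + 1):
    g = stochastic_grad(x_sgd)
    # Untuned SGD: x_{t+1} = x_t - (eta / sqrt(t)) * g_t. If eta is large relative
    # to the (unknown) smoothness constant, early iterates can grow rapidly before
    # the decaying stepsize tames them.
    x_sgd = x_sgd - (eta / np.sqrt(t)) * g

    g = stochastic_grad(x_nsgd)
    # NSGD: the step uses only the gradient direction, so its length never scales
    # with the gradient magnitude and that blow-up mechanism is absent.
    x_nsgd = x_nsgd - (eta / np.sqrt(t)) * g / (np.linalg.norm(g) + 1e-12)

print("||grad|| untuned SGD:", np.linalg.norm(A.T @ (A @ x_sgd)))
print("||grad|| NSGD:       ", np.linalg.norm(A.T @ (A @ x_nsgd)))
```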
Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tailed Class Imbalance
Yadav, Robin, Xie, Shuo, Wang, Tianhao, Li, Zhiyuan
Adaptive optimization methods (such as Adam) play a major role in LLM pretraining, significantly outperforming Gradient Descent (GD). Recent studies have proposed new smoothness assumptions on the loss function to explain the advantage over GD of adaptive algorithms with structured preconditioners (e.g., coordinate-wise or layer-wise) and of steepest descent methods w.r.t. non-Euclidean norms (e.g., the $\ell_\infty$ norm or the spectral norm). However, it remains unclear how these smoothness assumptions manifest in language modelling tasks. In this work, we analyze the benefit of $\ell_\infty$-norm descent (a.k.a. sign descent) directly from properties of the data distribution, namely heavy-tailed class imbalance. We propose a minimal yet representative setting of next-token prediction in which we provably show faster convergence of coordinate-wise algorithms such as sign descent (steepest descent w.r.t. the $\ell_\infty$ norm) over normalized GD (steepest descent w.r.t. the $\ell_2$ norm) in the presence of heavy-tailed class imbalance.
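As a rough illustration of the two steepest-descent updates compared above, the sketch below fits a plain softmax model on synthetic data whose class frequencies follow a Zipf (heavy-tailed) distribution. The model, data, and stepsizes are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch: sign descent (steepest descent w.r.t. l_inf) vs. normalized GD
# (steepest descent w.r.t. l_2) on softmax regression with Zipf class imbalance.
import numpy as np

rng = np.random.default_rng(0)
V, d, n = 50, 16, 2000                          # "vocabulary" size, feature dim, samples

# Heavy-tailed (Zipf-like) distribution over the V next-token classes.
freqs = 1.0 / np.arange(1, V + 1)
freqs /= freqs.sum()
y = rng.choice(V, size=n, p=freqs)
X = rng.normal(size=(n, d))

def loss_and_grad(W):
    # Softmax cross-entropy loss and its gradient w.r.t. W (shape d x V).
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
    p[np.arange(n), y] -= 1.0                   # softmax probabilities minus one-hot labels
    return loss, X.T @ p / n

W_sign = np.zeros((d, V))
W_ngd = np.zeros((d, V))
for _ in range(200):
    _, g = loss_and_grad(W_sign)
    W_sign -= 0.01 * np.sign(g)                 # sign descent update
    _, g = loss_and_grad(W_ngd)
    W_ngd -= 0.01 * g / (np.linalg.norm(g) + 1e-12)   # normalized GD update

print("sign descent loss :", loss_and_grad(W_sign)[0])
print("normalized GD loss:", loss_and_grad(W_ngd)[0])
```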
- Europe > Austria > Vienna (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > British Columbia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
- Asia > Middle East > Saudi Arabia (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Russia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings
Markou, Evan, Ajanthan, Thalaiyasingam, Gould, Stephen
Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this structure is known, at least locally, it can be exploited through reduction mappings that reparametrise part of the parameter space to lie on the solution manifold. These reductions naturally arise from inner optimisation problems and effectively remove redundant directions, yielding a lower-dimensional objective. In this work, we introduce a general framework to understand how such reductions influence the optimisation landscape. We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.
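A minimal sketch of the idea, under an assumed quadratic setup that is not the paper's general framework: the inner problem over $v$ is solved in closed form, giving a reduction mapping $v^*(u)$ and a lower-dimensional reduced objective $g(u) = f(u, v^*(u))$ on which gradient descent runs.

```python
# Minimal sketch (illustrative quadratic setup): a reduction mapping that solves
# the inner problem over v in closed form, leaving a reduced objective in u alone.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
B = rng.normal(size=(30, 8))
c = rng.normal(size=30)

# Full objective f(u, v) = 0.5 * ||A u + B v - c||^2; for each u, the minimisers
# over v are the redundant directions that the reduction removes.
def inner_solution(u):
    # Reduction mapping: v*(u) = argmin_v f(u, v), here a linear least-squares solve.
    return np.linalg.lstsq(B, c - A @ u, rcond=None)[0]

P_perp = np.eye(30) - B @ np.linalg.pinv(B)     # projector onto range(B)^perp

def reduced_objective(u):
    # g(u) = f(u, v*(u)) = 0.5 * ||P_perp (A u - c)||^2
    r = P_perp @ (A @ u - c)
    return 0.5 * r @ r

def reduced_grad(u):
    # By the envelope theorem, grad g(u) = A^T P_perp (A u - c);
    # no derivative of v*(u) is needed.
    return A.T @ (P_perp @ (A @ u - c))

u = rng.normal(size=5)
for _ in range(300):
    u -= 0.01 * reduced_grad(u)                 # gradient descent on the reduced objective
print("reduced objective at solution:", reduced_objective(u))
print("corresponding inner variable :", inner_solution(u))
```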
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)