
Asymptotic Convergence



Neural Information Processing Systems

We also establish new convergence complexities for reaching an approximate KKT solution when the objective can be smooth or nonsmooth, deterministic or stochastic, and convex or nonconvex, with complexity on a par with that of gradient descent for unconstrained optimization problems in the respective cases. To the best of our knowledge, this is the first study of first-order methods with complexity guarantees for nonconvex sparse-constrained problems.
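For intuition, first-order optimization under a hard sparsity constraint can be sketched in a few lines as projected gradient descent with a hard-thresholding projection (iterative hard thresholding). This is a generic illustration on assumed least-squares data, not necessarily the method analyzed in the paper:

```python
import numpy as np

def hard_threshold(x, k):
    """Projection onto {x : ||x||_0 <= k}: keep the k largest magnitudes."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out

def iht(grad, x0, k, step, iters=200):
    """Projected gradient descent under a sparsity constraint."""
    x = hard_threshold(x0, k)
    for _ in range(iters):
        x = hard_threshold(x - step * grad(x), k)
    return x

# Least-squares objective with a 2-sparse ground truth (illustrative data);
# with enough well-conditioned measurements the true support is typically found.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[[2, 7]] = [1.0, -2.0]
b = A @ x_true
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for f(x) = 0.5||Ax - b||^2
x_hat = iht(lambda x: A.T @ (A @ x - b), np.zeros(10), k=2, step=step)
```

With step size 1/L, each iteration is monotone in the objective, which is the kind of descent property the complexity analysis builds on.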



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The scheme finds a target point for each block in a chosen subset of blocks, in parallel, by minimizing the sum of a strongly convex approximation to the smooth part on that block (with matching gradients) and the non-smooth part. Each block in the subset is then updated (in parallel) to a convex combination of its previous value and its target point. A parallel proximal gradient scheme can be obtained as a special case, though using a convex combination of the iterates yields a slightly different scheme than in previous work. The suggested algorithm is very similar to [9], except that [9] chose the subset with a greedy scheme (which can be expensive), whereas this submission explores both a randomized and a cyclic scheme. For these, the authors prove asymptotic convergence of the algorithm to a stationary point under standard Lipschitz-gradient conditions.
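A rough sketch of this kind of randomized parallel block scheme, with assumed lasso data, soft-thresholding as the nonsmooth prox, a quadratic model as the strongly convex approximation, and illustrative parameters (not the submission's exact algorithm):

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1, the nonsmooth part in this sketch."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def block_prox_iteration(x, grad_f, blocks, step, lam, theta, rng):
    """Pick a random subset of blocks; for each, compute a proximal target
    from a quadratic model of the smooth part plus the nonsmooth part, then
    move to a convex combination of the old block value and its target."""
    g = grad_f(x)
    chosen = rng.choice(len(blocks), size=len(blocks) // 2, replace=False)
    x_new = x.copy()
    for i in chosen:
        b = blocks[i]
        target = soft_threshold(x[b] - step * g[b], step * lam)
        x_new[b] = (1 - theta) * x[b] + theta * target
    return x_new

# Lasso instance: f(x) = 0.5 ||Ax - b||^2, nonsmooth part lam * ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
lam, step = 0.1, 1.0 / np.linalg.norm(A, 2) ** 2
blocks = [np.arange(i, i + 2) for i in range(0, 8, 2)]   # 4 blocks of size 2
obj = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()
x = np.zeros(8)
obj0 = obj(x)
for _ in range(300):
    x = block_prox_iteration(x, lambda z: A.T @ (A @ z - b),
                             blocks, step, lam, theta=0.8, rng=rng)
obj_final = obj(x)
```

With step size 1/L and theta in (0, 1], each iteration decreases the composite objective, which is the descent property underlying the asymptotic convergence proof.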


Understanding the Role of Momentum in Stochastic Gradient Methods

Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao

Neural Information Processing Systems

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavy-ball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks.
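The QHM update interpolates between a plain gradient step (nu = 0) and a step along the exponential moving average of gradients alone (nu = 1); a minimal sketch on a toy quadratic, with illustrative parameter values:

```python
import numpy as np

def qhm(grad, x0, lr=0.05, beta=0.9, nu=0.7, iters=500):
    """Quasi-hyperbolic momentum: step along a nu-weighted mix of the raw
    gradient and the exponential moving average (EMA) of past gradients."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(iters):
        g = grad(x)
        m = beta * m + (1 - beta) * g          # gradient EMA
        x = x - lr * ((1 - nu) * g + nu * m)   # quasi-hyperbolic step
    return x

# Minimize the ill-conditioned quadratic f(x) = 0.5 * x^T diag(1, 10) x.
D = np.array([1.0, 10.0])
x_star = qhm(lambda x: D * x, np.array([5.0, 5.0]))
```

Setting nu = 0 recovers plain (stochastic) gradient descent, which is one reason QHM is a convenient single parameterization for comparing momentum variants.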




Supplementary Materials A Proof of Theorem 2: Asymptotic Convergence of Robust Q-Learning

Neural Information Processing Systems

Equation (15) is the expectation of the estimated update in line 5 of Algorithm 1.

A.1 The robust Bellman operator is a contraction. It was shown in [Iyengar, 2005, Roy et al., 2017] that the robust Bellman operator is a contraction; here, for completeness, we include the proof for our R-contamination uncertainty set.

Appendix B develops the finite-time analysis of Algorithm 1. B.1 introduces notation; the concentration bounds follow from the Bernstein inequality ([Li et al., 2020]) and culminate in Lemma 4.

Appendix C proves Theorem 4.

Appendix D develops the finite-time analysis of the robust TDC algorithm. For convenience of the proof, a projection step is added to the algorithm, adapting the approach of [Kaledin et al., 2020]. D.1 (Lipschitz smoothness) first shows that J(θ) is Lipschitz.
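A minimal numerical sketch of the contraction claim, assuming the usual R-contamination support function (worst case mixes the nominal next-state expectation with the worst state value); the MDP data here is randomly generated for illustration only:

```python
import numpy as np

def robust_bellman(Q, P, r, gamma, rho):
    """Robust Bellman operator for Q-learning under the R-contamination
    uncertainty set {(1 - rho) * p + rho * q : q any distribution}."""
    V = Q.max(axis=1)                          # V(s) = max_a Q(s, a)
    nominal = np.einsum('sat,t->sa', P, V)     # E_{s' ~ p(.|s,a)} V(s')
    return r + gamma * ((1 - rho) * nominal + rho * V.min())

# Check the sup-norm contraction on a random MDP.
rng = np.random.default_rng(0)
S, A, gamma, rho = 5, 3, 0.9, 0.2
P = rng.dirichlet(np.ones(S), size=(S, A))     # nominal kernel, shape (S, A, S)
r = rng.uniform(size=(S, A))
Q1 = rng.normal(size=(S, A))
Q2 = rng.normal(size=(S, A))
lhs = np.abs(robust_bellman(Q1, P, r, gamma, rho)
             - robust_bellman(Q2, P, r, gamma, rho)).max()
rhs = gamma * np.abs(Q1 - Q2).max()
```

Since both the nominal expectation and the min are 1-Lipschitz in the sup norm, the operator contracts with modulus gamma, which is what the check above verifies numerically.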


Review for NeurIPS paper: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Neural Information Processing Systems

Weaknesses: My main concern about the paper is whether the proposed algorithm is actually implementable, due to the specific expression of the (constant) learning rate. I have two concerns: 1. The learning rate depends on t_{mix} in Theorem 1 and on the universal constant c_1 in both Theorem 1 and Theorem 2. How can we compute/approximate t_{mix} in advance? If we cannot, is it sufficient to employ a lower bound on t_{mix}? 2. Looking at the proofs, c_1 is a function of the constant c (Equation 55), which in turn derives from Bernstein's inequality (Equation 81) and subsequently \tilde{c} (Equation 84), but its value is never explicitly computed. I am aware that in [33], too, the learning-rate schedule (which is not constant) depends on \mu_{min} and t_{mix}, but I think the authors should elaborate more on this and explain how to deal with it in practice, if possible.
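For context on the t_{mix} question: when the transition matrix is known, the mixing time can be computed directly from total-variation distances to the stationary distribution; the difficulty raised above is precisely that in RL the kernel is unknown, so only bounds or estimates are available. A sketch (not part of the reviewed paper):

```python
import numpy as np

def mixing_time(P, eps=0.25):
    """Smallest t with max_s TV(P^t(s, .), pi) <= eps, for a known
    row-stochastic transition matrix P."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    Pt = np.eye(P.shape[0])
    for t in range(1, 10_000):
        Pt = Pt @ P
        if 0.5 * np.abs(Pt - pi).sum(axis=1).max() <= eps:
            return t
    return None

# Two-state chain with stationary distribution (2/3, 1/3).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
t_mix = mixing_time(P)
```

Shrinking eps tightens the requirement and can only increase the reported mixing time, mirroring the standard t_{mix}(eps) definition.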


A Subsampling Based Neural Network for Spatial Data

Thakur, Debjoy

arXiv.org Machine Learning

The application of deep neural networks to geospatial data has become a prominent research problem. A significant body of statistical research already exists, such as generalized least squares optimization incorporating the spatial variance-covariance matrix, placing basis functions in the input nodes of the neural network, and so on. However, for lattice data, there is no available literature on the asymptotic analysis of neural networks for spatial regression. This article proposes a consistent localized two-layer deep neural network-based regression for spatial data. We prove the consistency of this deep neural network for bounded and unbounded spatial domains under a fixed sampling design of mixed-increasing spatial regions. We prove that its asymptotic convergence rate is faster than that of \cite{zhan2024neural}'s neural network, and that it is an improved generalization of \cite{shen2023asymptotic}'s neural network structure. We empirically observe that the rate of convergence of discrepancy measures between the empirical probability distributions of the observed and predicted data becomes faster for a less smooth spatial surface. We apply our asymptotic analysis of deep neural networks to the estimation of the monthly average temperature of major cities in the USA from satellite images. This application is an effective showcase of non-linear spatial regression. We also demonstrate our methodology on simulated lattice data in various scenarios.
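As a toy stand-in for two-layer regression on lattice data, a generic random-feature network fit on a synthetic spatial surface (this is not the paper's localized architecture; all data and settings here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# A 20x20 lattice of coordinates with a smooth synthetic response surface
# (a stand-in for, e.g., gridded temperature values).
gx, gy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
X = np.column_stack([gx.ravel(), gy.ravel()])
y = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1])

# Two-layer network: a random ReLU hidden layer plus a linear output
# layer fitted by least squares (random-feature regression).
H = 64
W1 = rng.standard_normal((2, H))
b1 = rng.uniform(-1.0, 1.0, H)
Phi = np.column_stack([np.maximum(X @ W1 + b1, 0.0), np.ones(len(X))])
w2, *_ = np.linalg.lstsq(Phi, y, rcond=None)
mse = np.mean((Phi @ w2 - y) ** 2)
```

Rougher surfaces need more hidden units for the same fit, which is the kind of smoothness dependence the abstract's convergence-rate observation is about.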


Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis

Jin, Ruinan, Wang, Xiaoyu, Wang, Baoxiang

arXiv.org Machine Learning

Adaptive optimizers have emerged as powerful tools in deep learning, dynamically adjusting the learning rate based on iterative gradients. These adaptive methods have seen significant success on various deep learning tasks, often outperforming stochastic gradient descent (SGD). However, although AdaGrad is a cornerstone adaptive optimizer, its theoretical analysis is inadequate with respect to asymptotic convergence and non-asymptotic convergence rates in non-convex optimization. This study aims to provide a comprehensive analysis and a complete picture of AdaGrad. We first introduce a novel stopping-time technique from probability theory to establish stability of the norm version of AdaGrad under milder conditions. We then derive two forms of asymptotic convergence: almost sure and mean-square. Furthermore, under mild assumptions, we establish a near-optimal non-asymptotic convergence rate measured by the average squared gradient in expectation, a metric that is rarely explored and stronger than existing high-probability results. The techniques developed in this work are potentially of independent interest for future research on other adaptive stochastic algorithms.
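The norm version of AdaGrad analyzed here replaces AdaGrad's per-coordinate accumulators with a single scalar accumulator of squared gradient norms; a minimal sketch on a deterministic toy quadratic, with illustrative parameters:

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, eps=1e-8, iters=500):
    """AdaGrad-Norm: one scalar accumulator of squared gradient norms
    scales the step, instead of per-coordinate accumulators."""
    x = np.asarray(x0, dtype=float)
    acc = 0.0
    for _ in range(iters):
        g = grad(x)
        acc += float(g @ g)
        x = x - (eta / (np.sqrt(acc) + eps)) * g
    return x

# Toy problem: f(x) = ||x||^2, gradient 2x.
x_star = adagrad_norm(lambda x: 2.0 * x, np.array([3.0, -4.0]))
```

Because the effective step eta / sqrt(acc) shrinks automatically as gradients accumulate, no Lipschitz constant needs to be known in advance, which is part of why stability is the delicate point in the analysis.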