AITopics | improved convergence rate


improved convergence rate

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Neural Information Processing Systems, Dec-24-2025, 12:46:06 GMT

Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving sampling problems and non-convex optimization problems that arise in many machine learning applications. In particular, its variance-reduced versions have recently attracted considerable attention. In this paper, we study two such variants, namely Stochastic Variance Reduced Gradient Langevin Dynamics and Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the target distribution in terms of KL divergence under only the assumptions of smoothness and a log-Sobolev inequality, which are weaker than the conditions used in prior work on these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which improves on all previous analyses. We also present some key applications of our result to non-convex optimization.
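Below is a minimal sketch of the first variant, Stochastic Variance Reduced Gradient Langevin Dynamics, in plain Python. It rests on several assumptions that do not come from the paper: a one-dimensional target $\pi(x) \propto \exp(-F(x))$ with $F = \frac{1}{n}\sum_i f_i$, unit temperature, a constant step size, mini-batches drawn with replacement, and toy quadratic components. The function name `svrg_ld` and all parameter values are hypothetical. The counter `n_grad` shows why setting the batch size and the inner loop length to $\sqrt{n}$ keeps each epoch at $O(n)$ component-gradient evaluations.

```python
import math
import random


def svrg_ld(grads, x0, step, n_epochs, batch_size, inner_len, rng):
    """Sketch of SVRG Langevin dynamics for pi(x) proportional to exp(-F(x)).

    F = (1/n) * sum_i f_i and grads[i](x) returns f_i'(x).
    Returns (samples, number of component-gradient evaluations).
    """
    n = len(grads)
    x = x0
    samples, n_grad = [], 0
    for _ in range(n_epochs):
        # Snapshot point and its full gradient: n component gradients per epoch.
        x_snap = x
        full_grad = sum(g(x_snap) for g in grads) / n
        n_grad += n
        for _ in range(inner_len):
            batch = [rng.randrange(n) for _ in range(batch_size)]  # with replacement
            # Control-variate estimate of F'(x), unbiased given x and x_snap.
            v = full_grad + sum(grads[i](x) - grads[i](x_snap) for i in batch) / batch_size
            n_grad += 2 * batch_size
            # Langevin step: gradient step plus N(0, 2 * step) noise.
            x = x - step * v + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
            samples.append(x)
    return samples, n_grad


# Toy target: f_i(x) = c_i * (x - a_i)^2 / 2, so pi is Gaussian with
# mean sum(c_i * a_i) / sum(c_i) and variance n / sum(c_i).
rng = random.Random(0)
n = 100
c = [0.5 + rng.random() for _ in range(n)]
a = [rng.gauss(0.0, 2.0) for _ in range(n)]
grads = [lambda x, ci=ci, ai=ai: ci * (x - ai) for ci, ai in zip(c, a)]
m = math.isqrt(n)  # batch size = inner loop length = sqrt(n)
samples, n_grad = svrg_ld(grads, 0.0, 0.05, 2000, m, m, rng)
print(n_grad)  # 600000
```

Each epoch spends n gradients on the snapshot and 2B per inner step. With B = m = √n, an epoch therefore costs n + 2n gradient evaluations.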

gradient langevin dynamic, langevin dynamic, stochastic gradient langevin dynamic

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)


Supplementary Material: Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate

A. Related works. Györfi (1981) is the first work that proves the convergence rate O(n…

Neural Information Processing Systems, Aug-17-2025, 08:52:31 GMT

In this section, we describe the Nadaraya-Watson (NW) classifier, the Local Polynomial (LP) classifier, and their convergence rates (Audibert & Tsybakov, 2007). Proof of Corollary 2: Proposition 6 immediately proves the assertion. We largely follow the proof of Theorem 4(b) of Chaudhuri & Dasgupta (2014). In Section G.1, we first define symbols. In Section G.2, we sketch the proof and describe the main differences between our proof and that of … Section G.3 presents the main body of the proof, using several lemmas listed in … The minimum radius whose ball has measure larger than t > 0, i.e., r … (Chaudhuri & Dasgupta (2014), Lemma 21). Then, the assertion is proved. See Section G.4 for Lemmas 1-7 used in this proof.
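The sketch below makes the Nadaraya-Watson classifier mentioned above concrete as a minimal plug-in rule. It estimates $\eta(x) = \mathbb{P}(Y=1 \mid X=x)$ by a kernel-weighted average of the labels and predicts 1 when the estimate is at least 1/2. Several choices here are our own assumptions rather than the construction in the supplementary material: the Gaussian kernel, the bandwidth `h`, the fallback when all weights underflow, and the function names `nw_eta` and `nw_classify`.

```python
import math


def nw_eta(x, X, Y, h):
    """Nadaraya-Watson estimate of eta(x) = P(Y = 1 | X = x), Gaussian kernel (assumed)."""
    w = [math.exp(-math.dist(x, p) ** 2 / (2.0 * h * h)) for p in X]
    total = sum(w)
    if total == 0.0:  # every kernel weight underflowed; fall back to 1/2 (our choice)
        return 0.5
    return sum(wi * yi for wi, yi in zip(w, Y)) / total


def nw_classify(x, X, Y, h):
    """Plug-in rule: predict 1 when the estimated eta(x) is at least 1/2."""
    return 1 if nw_eta(x, X, Y, h) >= 0.5 else 0


X = [(0.0,), (0.1,), (0.9,), (1.0,)]
Y = [0, 0, 1, 1]
print(nw_classify((0.05,), X, Y, h=0.2), nw_classify((0.95,), X, Y, h=0.2))  # 0 1
```

The bandwidth `h` plays the role that k plays for k-NN. It trades bias for variance.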

chaudhuri & dasgupta, exp

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)


Review for NeurIPS paper: Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate

Neural Information Processing Systems, Feb-8-2025, 11:40:42 GMT

This was previously done by combining assumptions on the distribution with weighted k-NN. The assumptions cover its smoothness ($\beta$-Hölder condition, $\gamma$-neighbour average smoothness) and a "margin" condition ($\alpha$-margin condition, which upper-bounds the probability of instances whose expected label is close to 1/2). Several methods exist for defining the weights, each with its resulting convergence rate. The paper proposes a new method, called MS-k-NN, which computes several unweighted k-NN estimators for different values of k (denoted $\nu(k)$ in this review for simplicity). For each k, a radius $r(k)$ is defined as the distance from the query to its k-th nearest neighbour. This yields pairs $(r(k), \nu(k))$, and coefficients $b$ are fitted by linear regression so that $\nu(k)$ is estimated by a polynomial in $r(k)$. It is proven that this method attains the optimal convergence rates previously obtained by more cumbersome weighted k-NN methods. Experiments on a few datasets show that MS-k-NN performs similarly to weighted k-NN.
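The procedure the review describes can be sketched as follows. The review says only "a polynomial in r(k)", so several choices here are assumptions: Euclidean distance, a fixed list of k values, and ordinary least squares on plain powers $1, r, \dots, r^{\text{degree}}$. The intercept of the fit is the value extrapolated to $r = 0$. The helpers `solve`, `ms_knn_estimate`, and `ms_knn_classify` are hypothetical names, not the authors' code.

```python
import math


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for j in range(col, p + 1):
                M[r][j] -= f * M[col][j]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):
        z[r] = (M[r][p] - sum(M[r][j] * z[j] for j in range(r + 1, p))) / M[r][r]
    return z


def ms_knn_estimate(x, X, Y, ks, degree):
    """Regress nu(k) on powers of r(k) and return the fitted value at r = 0."""
    neighbours = sorted((math.dist(x, p), y) for p, y in zip(X, Y))
    rows, targets = [], []
    for k in ks:
        r_k = neighbours[k - 1][0]                    # distance to the k-th nearest neighbour
        nu_k = sum(y for _, y in neighbours[:k]) / k  # unweighted k-NN estimate
        rows.append([r_k ** j for j in range(degree + 1)])
        targets.append(nu_k)
    p = degree + 1
    # Ordinary least squares via the normal equations (adequate for a toy sketch).
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    Atb = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(p)]
    return solve(AtA, Atb)[0]  # intercept: the extrapolated "0-NN" estimate


def ms_knn_classify(x, X, Y, ks, degree):
    """Plug-in classification with the extrapolated estimate."""
    return 1 if ms_knn_estimate(x, X, Y, ks, degree) >= 0.5 else 0


X = [(float(i),) for i in range(1, 11)]
Y = [float(i) for i in range(1, 11)]
print(round(ms_knn_estimate((0.0,), X, Y, ks=[2, 4, 6, 8], degree=1), 6))  # 0.5
```

Take ten points at x = 1, …, 10 with labels equal to x. Then ν(k) = (k+1)/2 and r(k) = k lie exactly on a line, so the fitted intercept is 0.5.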

imaginary 0-nearest neighbour, improved convergence rate, neurips paper

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Review for NeurIPS paper: Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate

Neural Information Processing Systems, Feb-8-2025, 11:40:35 GMT

The paper presents a new nonparametric learning method that appears to combine elements of k-nearest neighbors with elements of local regression estimation. It recovers the optimal rates for classification with smooth regression functions under Tsybakov noise, which were previously established for a local polynomial regression method. However, it uses a predictor representation involving far fewer parameters, as in a simple weighted k-NN predictor. The reviewers favor accepting the paper. However, they have some reservations. They would prefer that the paper be presented differently, with more space devoted to the new techniques and more investigation of this method's strengths relative to well-known standard techniques.
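For reference, we assume that the "optimal rates" mentioned above are the minimax rates of Audibert & Tsybakov (2007), which the supplementary material cites. The setting is as follows. The regression function $\eta(x) = \mathbb{P}(Y = 1 \mid X = x)$ is $\beta$-Hölder on $\mathbb{R}^d$, the margin (Tsybakov noise) condition holds with exponent $\alpha$, and the marginal density is bounded away from zero on its support. Let $n$ be the sample size and $g^{*}$ the Bayes classifier. Under these conditions, the excess risk of the best classifiers $\hat{g}_n$ behaves as stated below. The "some conditions" in the paper may differ in detail.

```latex
% Margin (Tsybakov noise) condition with exponent alpha, for a constant C > 0:
\mathbb{P}\bigl(0 < |\eta(X) - \tfrac{1}{2}| \le t\bigr) \le C\, t^{\alpha}, \qquad t > 0,
% Minimax excess-risk rate over beta-Hoelder eta in dimension d:
\mathbb{E}\bigl[R(\hat{g}_n)\bigr] - R(g^{*}) \asymp n^{-\beta(1+\alpha)/(2\beta + d)}.
```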

imaginary 0-nearest neighbour, improved convergence rate, neurips paper

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.71)


Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate

Neural Information Processing Systems, Jan-16-2025, 07:43:01 GMT

The weights and the parameter $k \in \mathbb{N}$ of the weighted k-NN classifier regulate its bias-variance trade-off. This trade-off implicitly affects the convergence rate of the excess risk of the k-NN classifier, and several existing studies have considered selecting the optimal k and weights to obtain faster convergence rates. While k-NN with non-negative weights has been widely developed, it has also been proved that negative weights are essential for eliminating the bias terms and attaining the optimal convergence rate. In this paper, we propose a novel multiscale k-NN (MS-k-NN) that extrapolates unweighted k-NN estimators from several values of $k \ge 1$ to $k \to 0$, thus giving an imaginary 0-NN estimator. Our method implicitly computes optimal real-valued weights that adapt to the query and its neighbouring points. We theoretically prove that MS-k-NN attains the improved rate, which coincides with the existing optimal rate under some conditions.
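An extrapolated estimate that is a least-squares intercept is linear in the labels. This offers one way to see how such a method "implicitly computes real-valued weights". The sketch below recovers those weights under a few assumptions. The estimate is taken to be the intercept of an ordinary least-squares fit of $\nu(k)$ on powers $1, r, \dots, r^{\text{degree}}$ of the k-th neighbour distance $r$, with Euclidean distance. The functions `solve` and `ms_knn_weights` are hypothetical helpers. The sketch shows only that the weights sum to one and can be negative; it does not show that they are optimal.

```python
import math


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for j in range(col, p + 1):
                M[r][j] -= f * M[col][j]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):
        z[r] = (M[r][p] - sum(M[r][j] * z[j] for j in range(r + 1, p))) / M[r][r]
    return z


def ms_knn_weights(x, X, ks, degree):
    """Weights w with sum_i w_i * y_i equal to the extrapolated estimate at x."""
    order = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))
    rows = [[math.dist(x, X[order[k - 1]]) ** j for j in range(degree + 1)] for k in ks]
    p = degree + 1
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    # Intercept = e_1^T (A^T A)^{-1} A^T nu, so nu(k) gets weight g_k = row_k . z,
    # where z solves (A^T A) z = e_1.
    z = solve(AtA, [1.0] + [0.0] * degree)
    g = [sum(rj * zj for rj, zj in zip(row, z)) for row in rows]
    # nu(k) averages the k nearest labels, so each of them receives g_k / k.
    w = [0.0] * len(X)
    for gk, k in zip(g, ks):
        for rank in range(k):
            w[order[rank]] += gk / k
    return w


X = [(float(i),) for i in range(1, 11)]
w = ms_knn_weights((0.0,), X, ks=[2, 4, 6, 8], degree=1)
print([round(wi, 4) for wi in w])
# [0.5625, 0.5625, 0.0625, 0.0625, -0.0625, -0.0625, -0.0625, -0.0625, 0.0, 0.0]
```

The weights sum to one because the regression design contains a constant column. The more distant neighbours receive negative weight, which cancels part of the bias of the plain averages.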

extrapolation, imaginary 0-nearest neighbour, improved convergence rate

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Neural Information Processing Systems, Jan-13-2025, 20:18:29 GMT

Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving sampling problems and non-convex optimization problems that arise in many machine learning applications. In particular, its variance-reduced versions have recently attracted considerable attention. In this paper, we study two such variants, namely Stochastic Variance Reduced Gradient Langevin Dynamics and Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the target distribution in terms of KL divergence under only the assumptions of smoothness and a log-Sobolev inequality, which are weaker than the conditions used in prior work on these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which improves on all previous analyses.
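The recursive variant, Stochastic Recursive Gradient Langevin Dynamics, can be sketched in plain Python. It replaces a fixed snapshot control variate with a gradient estimate that is updated recursively (SARAH-style) from step to step. The following choices are assumptions, not details from the paper: the estimate is reset to the full gradient at the start of each outer loop (some variants use a large mini-batch instead), the target is a one-dimensional $\pi(x) \propto \exp(-F(x))$ with $F = \frac{1}{n}\sum_i f_i$, the temperature is one, the step size is constant, mini-batches are drawn with replacement, and the components are toy quadratics. The function `srg_ld` is hypothetical.

```python
import math
import random


def srg_ld(grads, x0, step, n_epochs, batch_size, inner_len, rng):
    """Sketch of stochastic recursive gradient Langevin dynamics.

    Target pi(x) proportional to exp(-F(x)), F = (1/n) * sum_i f_i; grads[i](x) = f_i'(x).
    Returns (samples, number of component-gradient evaluations).
    """
    n = len(grads)
    x = x0
    samples, n_grad = [], 0
    for _ in range(n_epochs):
        for t in range(inner_len):
            if t == 0:
                # Reset the estimator to the full gradient at the start of each outer loop.
                v = sum(g(x) for g in grads) / n
                n_grad += n
            else:
                # Recursive update: add a mini-batch estimate of F'(x) - F'(x_prev).
                batch = [rng.randrange(n) for _ in range(batch_size)]
                v += sum(grads[i](x) - grads[i](x_prev) for i in batch) / batch_size
                n_grad += 2 * batch_size
            x_prev = x
            # Langevin step: gradient step plus N(0, 2 * step) noise.
            x = x - step * v + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
            samples.append(x)
    return samples, n_grad


# Toy target: f_i(x) = c_i * (x - a_i)^2 / 2, a Gaussian with
# mean sum(c_i * a_i) / sum(c_i) and variance n / sum(c_i).
rng = random.Random(1)
n = 100
c = [0.5 + rng.random() for _ in range(n)]
a = [rng.gauss(0.0, 2.0) for _ in range(n)]
grads = [lambda x, ci=ci, ai=ai: ci * (x - ai) for ci, ai in zip(c, a)]
m = math.isqrt(n)  # batch size = inner loop length = sqrt(n)
samples, n_grad = srg_ld(grads, 0.0, 0.05, 2000, m, m, rng)
print(n_grad)  # 560000
```

Each outer loop costs n gradients for the reset plus 2B for each of the remaining m - 1 steps. With B = m = √n, this stays O(n) per outer loop.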

gradient langevin dynamic, langevin dynamic, stochastic gradient langevin dynamic

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
