AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Review for NeurIPS paper: A Non-Asymptotic Analysis for Stein Variational Gradient Descent

Neural Information Processing SystemsJan-23-2025, 05:24:02 GMT

Additional Feedback: Post rebuttal: Thank you for you response. I keep my score as it is (which is accept) -- but like R2, I really would like to see the finite-N results in the extra page that would be provided if this paper is accepted. The analysis looks sound and nicely done. Some elaboration on what satisfies this inequality is provided before Sec. Therefore, these results are still not precisely about the SVGD, but on its infinite-particle limit.

non-asymptotic analysis, stein variational gradient descent, theorem 7, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Review for NeurIPS paper: A Non-Asymptotic Analysis for Stein Variational Gradient Descent

Neural Information Processing SystemsJan-23-2025, 05:23:55 GMT

The reviewers agree that this paper would make a valuable contribution to NeurIPS. Please have a look at the reviewer comments (especially about moving finite-particle results to the main paper) while working on the final manuscript.

neurips paper, non-asymptotic analysis, stein variational gradient descent

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Review for NeurIPS paper: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Neural Information Processing SystemsJan-23-2025, 02:06:36 GMT

Weaknesses: - Below eq (3), for the upper bound of \delta_t the right-hand side should be 2\sum_s\eta_sa_s instead of 2\sum_s\eta_sa_s\delta_s . It would be interesting to add some discussions or comparison with these references mentioned below: 1. "Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent". In this paper, their work relaxes the smoothness to \alpha -Holder continuity of (sub)gradients, which include the non-smooth loss functions in this paper as \alpha 0 . Their stability analysis also improves the optimal generalization bounds O(1/\sqrt{n}) for multi-pass SGD with T O(n 2) . It seems to me that the main technical novelty appeared in the proof of Lemma 3 which studied \delta_t 2 (as opposed to the study of \delta_t in Hardt et al's paper) using the approximate contraction for the gradient mapping for the non-smooth loss which has already explored in the above paper. Similar ideas have already explored in the above reference in a more general setting.

neurips paper, nonsmooth convex loss, stochastic gradient descent, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Review for NeurIPS paper: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Neural Information Processing SystemsJan-23-2025, 02:06:28 GMT

This paper got high scores: 9,7,6,8, all with high confidence. The major concerns are from Reviewer #3, who asked about the relationship with "Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent" ([*]) and whether the technique of using Poisson sampling in [1] can be used in the current work which uses uniform sampling. While the authors clarified in the rebuttal that their work is stronger than [*] and some of the results in [*] may not be generalized to the case in the current paper, and the technique in [1] can be applied straightforwardly, Reviewer #3 further refuted on the first claim and doubted on the second claim during discussion. The AC confirmed that this paper is concurrent with [*] and deemed that Reviewer #3 may have missed the sketched proof in the "Privacy Analysis" section of the rebuttal. During further discussion, Reviewer #3 acknowledged that the sketched proof made sense and supported acceptance.

nonsmooth convex loss, reviewer, stochastic gradient descent, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Fast and Provable Tensor-Train Format Tensor Completion via Precondtioned Riemannian Gradient Descent

Bian, Fengmiao, Cai, Jian-Feng, Zhang, Xiaoqun, Zhang, Yuanwei

arXiv.org Artificial IntelligenceJan-23-2025

Low-rank tensor completion aims to recover a tensor from partially observed entries, and it is widely applicable in fields such as quantum computing and image processing. Due to the significant advantages of the tensor train (TT) format in handling structured high-order tensors, this paper investigates the low-rank tensor completion problem based on the TT-format. We proposed a preconditioned Riemannian gradient descent algorithm (PRGD) to solve low TT-rank tensor completion and establish its linear convergence. Experimental results on both simulated and real datasets demonstrate the effectiveness of the PRGD algorithm. On the simulated dataset, the PRGD algorithm reduced the computation time by two orders of magnitude compared to existing classical algorithms. In practical applications such as hyperspectral image completion and quantum state tomography, the PRGD algorithm significantly reduced the number of iterations, thereby substantially reducing the computational time.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.13385

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Communication-Efficient Stochastic Distributed Learning

Ren, Xiaoxing, Bastianello, Nicola, Johansson, Karl H., Parisini, Thomas

arXiv.org Artificial IntelligenceJan-23-2025

We address distributed learning problems, both nonconvex and convex, over undirected networks. In particular, we design a novel algorithm based on the distributed Alternating Direction Method of Multipliers (ADMM) to address the challenges of high communication costs, and large datasets. Our design tackles these challenges i) by enabling the agents to perform multiple local training steps between each round of communications; and ii) by allowing the agents to employ stochastic gradients while carrying out local computations. We show that the proposed algorithm converges to a neighborhood of a stationary point, for nonconvex problems, and of an optimal point, for convex problems. We also propose a variant of the algorithm to incorporate variance reduction thus achieving exact convergence. We show that the resulting algorithm indeed converges to a stationary (or optimal) point, and moreover that local training accelerates convergence. We thoroughly compare the proposed algorithms with the state of the art, both theoretically and through numerical results.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.13516

Country:

Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
Asia > Singapore (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Add feedback

Reviews: Splitting Steepest Descent for Growing Neural Architectures

Neural Information Processing SystemsJan-22-2025, 23:45:33 GMT

The paper introduces an algorithm for splitting neurons when training neural networks using gradient descent, in order to progressively grow a neural network architecture for a better fit to the data. Some theoretical connections to wasserstein-infinity gradient steps are presented, as well as multiple experiments for learning interpretable or lightweight neural networks. The proposed method is original and interesting, and the experiments suggest that it is promising for various applications. However, the current proposed method seems far from practical in large scale settings, and the presented theoretical results are quite limited: * computationally, the algorithm as presented in Algorithm 1 requires (i) to check if a local optima is reached on the full training loss (and it seems that gradient updates are performed on the full training loss, which is costly), (ii) eigenvector computation on splitting matrices, (iii) dealing with dynamically growing weight matrices. These aspects seem to make the algorithm impractical compared to simply running SGD on an over-parameterized network.

algorithm, neural architecture, splitting steepest descent, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Review for NeurIPS paper: Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

Neural Information Processing SystemsJan-22-2025, 19:42:41 GMT

Additional Feedback: Post rebuttal: The authors addressed my comments. Therefore, I keep my score as'accept' but not higher as I think the clarity of the writing should be improved. When G is nonsmooth and proximable, using proximal maps lead to much faster convergence compared to using subgradients in the optimization case. It is therefore an important problem to investigate the sampling analogue of this scheme which is the topic of this paper. As mentioned, some previous work has been done on this problem, but this paper presents an approach that is most general (in terms of G being supported on a more general set) to date.

neurips paper, primal dual interpretation, proximal stochastic gradient langevin algorithm, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Review for NeurIPS paper: Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

Neural Information Processing SystemsJan-22-2025, 19:42:33 GMT

This paper presents a new analysis for proximal stochastic gradient Langevin algorithm. All the reviewers recognized the intellectual merits with minor concerns. I would suggest the authors to carefully revise their paper based the reviewers' comments.

neurips paper, primal dual interpretation, proximal stochastic gradient langevin algorithm, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)

Add feedback

Reviews: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

Neural Information Processing SystemsJan-22-2025, 09:42:32 GMT

UPDATE: I've read the other reviews and the rebuttal. I am keeping my score - this is a good paper. The study of Stochastic Gradient Descent in overparametrized setting is a popular and important trend in a recent development of huge-scale optimization for deep-learning. The authors propose a very basic and classical method, consisting from the well-known algorithmical blocks (SGD Armijo-type line search) together with its first theoretical justification under "interpolation assumption". The proof of convergence (for example, Theorem 2) mainly consists from the standard arguments (which are used for the proof of the classical non-stochastic Gradient Method under Lipschitz-continuous gradients).

assumption, interpolation assumption, painless stochastic gradient, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.85)

Add feedback