Escaping Saddle-Point Faster under Interpolation-like Conditions
In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local minimizers much faster. A fundamental aspect of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in the over-parametrized setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer matches the corresponding deterministic rate of $O(1/\epsilon^{2})$. We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under the same interpolation-like conditions and show that its oracle complexity to reach an $\epsilon$-local-minimizer is $O(1/\epsilon^{2.5})$. While this complexity is better than the corresponding complexity of either PSGD or SCRN without interpolation-like assumptions, it does not match the rate of $O(1/\epsilon^{1.5})$ of the deterministic Cubic-Regularized Newton method.
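As a concrete illustration, below is a minimal sketch of the generic perturbed-SGD template: ordinary SGD steps, plus an occasional random perturbation when the stochastic gradient is small, which is what allows the iterates to escape strict saddle points. Here `grad_fn` stands for any routine returning an unbiased stochastic gradient (e.g., a minibatch gradient), and the step size, perturbation radius, and escape interval are illustrative placeholders rather than the theoretically tuned constants or the exact variant analyzed in the paper.

```python
import numpy as np

def perturbed_sgd(grad_fn, x0, lr=1e-2, radius=1e-3, grad_tol=1e-3,
                  escape_interval=100, n_steps=10000, seed=0):
    """Sketch of Perturbed SGD: plain SGD steps, plus a uniform-ball
    perturbation whenever the stochastic gradient is small and no
    perturbation was added recently.  grad_fn(x) is assumed to return
    an unbiased stochastic gradient at x; all constants are illustrative."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last_perturbed = -escape_interval  # allow a perturbation at step 0
    for t in range(n_steps):
        g = grad_fn(x)
        if np.linalg.norm(g) <= grad_tol and t - last_perturbed >= escape_interval:
            # Small gradient: possibly near a saddle point, so inject noise
            # sampled uniformly from a ball of the given radius
            # (Gaussian direction, radius rescaled by U^(1/d)).
            noise = rng.standard_normal(x.shape)
            noise *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(noise)
            x = x + noise
            last_perturbed = t
        x = x - lr * g  # standard stochastic gradient step
    return x
```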
Review for NeurIPS paper: Escaping Saddle-Point Faster under Interpolation-like Conditions
Weaknesses: - The importance of the SGC condition remains unclear. In Line 129, the authors claim that the SGC condition is satisfied in some practical settings, such as the training of deep neural networks, and that the SGC condition should therefore be regarded as an interesting special setting for nonconvex optimization. However, recent work [1,2] showed that the training of deep neural networks can further be regarded as a special case of convex optimization in the Neural Tangent Kernel (NTK) regime, which imposes a condition stronger than SGC. Therefore, the authors may want to clarify the importance of SGC by giving more examples from machine learning. As the authors note, [VBS18] first studied the SGC condition in the nonconvex setting and showed that SGD requires $O(1/\epsilon^{2})$ gradient queries to find first-order stationary points. Meanwhile, note that [AZL18] proposed a generic framework that can turn any algorithm for finding first-order stationary points into an algorithm for finding approximate local minimizers without hurting the convergence rate.
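For reference, the strong growth condition (SGC) discussed in this review is commonly stated as follows, where $F(\cdot,\xi)$ denotes the per-sample loss and $f$ the risk (matching the notation of the conditional-gradient abstract below), and $\rho \ge 1$ is the growth constant; this is the standard form from the SGC literature, not necessarily the exact variant assumed in the paper:

$$\mathbb{E}_{\xi}\big[\|\nabla F(x,\xi)\|^{2}\big] \;\le\; \rho\,\|\nabla f(x)\|^{2} \qquad \text{for all } x.$$

Under interpolation every sample loss is minimized at a common point, so the stochastic gradient noise vanishes exactly where $\nabla f$ vanishes; this is the mechanism behind the improved complexities.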
Improved Complexities for Stochastic Conditional Gradient Methods under Interpolation-like Conditions
Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi
In a machine learning setup, the function F can be interpreted as the loss associated with a sample ξ, and the function f as the risk, defined as the expected loss. Such constrained stochastic optimization problems arise frequently in statistical machine learning applications. The conditional gradient algorithm, also called the Frank-Wolfe algorithm, is an efficient method for solving constrained optimization problems of the form in (1) due to its projection-free nature [Jag13, HJN15, FGM17, LPZZ17, BZK18, RDLS18]. In each step of the conditional gradient method, it is only required to minimize a linear objective over the set Ω. This operation can be implemented efficiently for a variety of sets arising in statistical machine learning, in contrast to projecting onto the set Ω, which is required, for example, by the projected gradient method. Hence, the conditional gradient method has regained popularity in the last decade in the optimization and machine learning community.
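To make the projection-free structure concrete, here is a minimal Frank-Wolfe loop. The $\ell_1$-ball linear minimization oracle (LMO) and the classical $2/(t+2)$ step size are illustrative choices, not the specific stochastic variants analyzed in the paper; `grad_fn` again stands for any (possibly stochastic) gradient routine.

```python
import numpy as np

def l1_ball_lmo(grad, radius=1.0):
    """LMO for the l1 ball: argmin over ||s||_1 <= radius of <grad, s>.
    The minimizer is a signed, scaled coordinate vector, so it costs O(d)."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_fn, x0, lmo=l1_ball_lmo, n_steps=1000):
    """Minimal Frank-Wolfe loop: each iteration calls the LMO instead of a
    projection, then moves toward the returned vertex."""
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        g = grad_fn(x)            # (stochastic) gradient at x
        s = lmo(g)                # linear objective minimized over the set
        gamma = 2.0 / (t + 2.0)   # classical open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```

The example highlights why the LMO matters: for the $\ell_1$ ball it is a single coordinate selection, and for sets such as the nuclear-norm ball it needs only a leading singular-vector pair, whereas a Euclidean projection onto the same set requires a full SVD.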