AITopics | gradient descent method

Recent progress has been made in understanding the statistical generalization performance of gradient descent methods for overparameterized neural networks within the neural tangent kernel (NTK) regime. However, most of the existing work on regression problems is limited to shallow network architectures, leaving a notable gap in the theory of deep neural networks. This paper addresses this gap by presenting a comprehensive generalization analysis for deep ReLU networks trained using gradient descent (GD) and stochastic gradient descent (SGD). Specifically, we establish the first known minimax-optimal rates of excess population risk for both GD and SGD with deep ReLU networks, under the assumption that the network width scales polynomially with respect to the network depth and training sample size. Our results demonstrate that with sufficient width, gradient descent methods for deep ReLU networks can achieve optimal generalization rates on par with kernel methods.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Machine Learning

2606.06764

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Supplementary Material for "Path following algorithms for ℓ2-regularized M-estimation with approximation guarantee"

Neural Information Processing Systems, Apr-24-2026, 04:41:48 GMT

Figure S2: Number of iterations at each grid point for the Newton and gradient descent methods applied to ℓ2-regularized logistic regression over simulated data generated in Example 2. We summarize the results in Figures S1-S3. Figure S1 presents the results for ridge regression. In this case, the number of iterations required by the gradient descent method first increases and then stays flat as t_k grows. The Newton method, however, takes only one iteration at each grid point. Moreover, the level of approximation (i.e., ϵ) seems to have no impact on the number of iterations at each grid point, which is highly desirable.
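The single Newton iteration reported for ridge regression is consistent with the objective being quadratic: the Hessian is constant, so one exact Newton step reaches the minimizer. The following is a minimal Python sketch of that effect, not a reproduction of the supplementary experiments. The toy data, the tolerance, the function names, and the conservative gradient step size 1/trace(H) are all illustrative assumptions; the path-following grid t_k is not modeled.

```python
def ridge_gradient(X, y, w, lam):
    """Gradient of 0.5 * ||Xw - y||^2 + 0.5 * lam * ||w||^2 for two features."""
    residual = [row[0] * w[0] + row[1] * w[1] - target for row, target in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, residual)) + lam * w[j] for j in range(2)]


def ridge_hessian(X, lam):
    """Constant Hessian X^T X + lam * I (2 x 2)."""
    h = [[sum(row[a] * row[b] for row in X) for b in range(2)] for a in range(2)]
    h[0][0] += lam
    h[1][1] += lam
    return h


def solve_2x2(h, g):
    """Solve h @ d = g by Cramer's rule."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
            (h[0][0] * g[1] - h[1][0] * g[0]) / det]


def count_iterations(X, y, lam, method, tol=1e-8, max_iter=100000):
    """Iterate from w = 0 until the largest gradient entry falls below tol."""
    w = [0.0, 0.0]
    h = ridge_hessian(X, lam)
    step = 1.0 / (h[0][0] + h[1][1])  # trace(H) bounds the largest eigenvalue
    for k in range(max_iter):
        g = ridge_gradient(X, y, w, lam)
        if max(abs(gi) for gi in g) < tol:
            return w, k
        # Newton rescales the gradient by the inverse Hessian; GD uses a fixed step.
        d = solve_2x2(h, g) if method == "newton" else [step * gi for gi in g]
        w = [w[0] - d[0], w[1] - d[1]]
    return w, max_iter


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical toy data
y = [1.0, 2.0, 3.0]
w_newton, newton_iters = count_iterations(X, y, lam=1.0, method="newton")
w_gd, gd_iters = count_iterations(X, y, lam=1.0, method="gd")
print(newton_iters, gd_iters)
```

On this toy problem the Newton variant stops after a single step, while the fixed-step gradient variant needs many more iterations to reach the same tolerance; the exact gradient descent count depends on the conditioning of the data and on the step size chosen.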

artificial intelligence, machine learning, tmax, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)

00296c0e10cd24d415c2db63ea2a2c68-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 04:41:45 GMT

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.72)

e60e81c4cbe5171cd654662d9887aec2-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 15:56:51 GMT

algorithm, second-order critical point, spurious second-order critical point, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

e60e81c4cbe5171cd654662d9887aec2-Paper.pdf

Neural Information Processing Systems, Aug-18-2025, 07:06:19 GMT

artificial intelligence, machine learning, second-order critical point, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation

Yang, Shuyuan, Chua, Zonghe

arXiv.org Artificial Intelligence, May-15-2025

Autonomy in Minimally Invasive Robotic Surgery (MIRS) has the potential to reduce surgeon cognitive and task load, thereby increasing procedural efficiency. However, implementing accurate autonomous control can be difficult due to poor end-effector proprioception, a limitation of the robots' cable-driven mechanisms. Although the robot may have joint encoders for the end-effector pose calculation, various non-idealities make the entire kinematics chain inaccurate. Modern vision-based pose estimation methods lack real-time capability or can be hard to train and generalize. In this work, we demonstrate a real-time capable, vision transformer-based pose estimation approach that is trained using end-to-end differentiable kinematics and rendering in simulation. We demonstrate the potential of this method to correct for noisy pose estimates in simulation, with the longer-term goal of verifying the sim-to-real transferability of our approach. The da Vinci Surgical System has been widely applied to different kinds of MIRS procedures in specializations such as urologic [1], gynecologic [2], and cardiothoracic [3] surgery.

artificial intelligence, correction, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2505.08875

Country: North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Theoretical Framework for Tempered Fractional Gradient Descent: Application to Breast Cancer Classification

Naifar, Omar

arXiv.org Artificial Intelligence, Apr-29-2025

This paper introduces Tempered Fractional Gradient Descent (TFGD), a novel optimization framework that synergizes fractional calculus with exponential tempering to enhance gradient-based learning. Traditional gradient descent methods often suffer from oscillatory updates and slow convergence in high-dimensional, noisy landscapes. TFGD addresses these limitations by incorporating a tempered memory mechanism, where historical gradients are weighted by fractional coefficients $|w_j| = \binom{α}{j}$ and exponentially decayed via a tempering parameter $λ$. Theoretical analysis establishes TFGD's convergence guarantees: in convex settings, it achieves an $\mathcal{O}(1/K)$ rate with alignment coefficient $d_{α,λ} = (1 - e^{-λ})^{-α}$, while stochastic variants attain $\mathcal{O}(1/k^α)$ error decay. The algorithm maintains $\mathcal{O}(n)$ time complexity equivalent to SGD, with memory overhead scaling as $\mathcal{O}(d/λ)$ for parameter dimension $d$. Empirical validation on the Breast Cancer Wisconsin dataset demonstrates TFGD's superiority, achieving 98.25\% test accuracy (vs. 92.11\% for SGD) and 2$\times$ faster convergence. The tempered memory mechanism proves particularly effective in medical classification tasks, where feature correlations benefit from stable gradient averaging. These results position TFGD as a robust alternative to conventional optimizers in both theoretical and applied machine learning.
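The tempered memory mechanism can be illustrated with a short Python sketch. This is only one plausible reading of the abstract, not the paper's algorithm: it assumes the weight on the gradient from j steps ago is |binom(α, j)| · exp(-λ j), that the memory is truncated to a fixed window, and that the update subtracts a learning rate times the weighted sum. The function names, the window length, the learning rate, and the quadratic test problem are all hypothetical.

```python
import math


def tempered_fractional_weights(alpha, lam, memory):
    """Weights |binom(alpha, j)| * exp(-lam * j) for j = 0 .. memory - 1."""
    weights = []
    coeff = 1.0  # generalized binomial coefficient binom(alpha, 0)
    for j in range(memory):
        weights.append(abs(coeff) * math.exp(-lam * j))
        coeff *= (alpha - j) / (j + 1)  # recurrence giving binom(alpha, j + 1)
    return weights


def tempered_memory_descent(grad, x0, lr, alpha, lam, memory, n_iter):
    """Descend along a tempered, fractionally weighted sum of recent gradients."""
    weights = tempered_fractional_weights(alpha, lam, memory)
    history = []  # most recent gradient first
    x = x0
    for _ in range(n_iter):
        history.insert(0, grad(x))
        del history[memory:]  # keep only a finite memory window (an assumption)
        x -= lr * sum(w * g for w, g in zip(weights, history))
    return x


# Hypothetical test problem: f(x) = 0.5 * x**2, whose gradient is x.
x_final = tempered_memory_descent(lambda x: x, x0=1.0, lr=0.1,
                                  alpha=0.5, lam=0.5, memory=5, n_iter=500)
print(x_final)
```

With λ = 0 the weights reduce to the untempered fractional coefficients; a larger λ discounts older gradients more strongly, which is presumably why the abstract ties the memory overhead to 1/λ.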

artificial intelligence, gradient, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2504.18849

Country:

North America > United States > Wisconsin (0.26)
Africa > Middle East (0.15)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.63)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Gradient Descent Methods for Regularized Optimization

Nikolovski, Filip, Stojkovska, Irena, Saneva, Katerina Hadzi-Velkova, Hadzi-Velkov, Zoran

arXiv.org Artificial Intelligence, Dec-28-2024

Regularization is a widely recognized technique in mathematical optimization. It can be used to smooth out objective functions, refine the feasible solution set, or prevent overfitting in machine learning models. Due to its simplicity and robustness, the gradient descent (GD) method is one of the primary methods used for numerical optimization of differentiable objective functions. However, GD is not well-suited for solving $\ell^1$ regularized optimization problems since these problems are non-differentiable at zero, causing iteration updates to oscillate or fail to converge. Instead, a more effective version of GD, called proximal gradient descent, employs a technique known as soft-thresholding to shrink the iteration updates toward zero, thus enabling sparsity in the solution. Motivated by the widespread applications of proximal GD in sparse and low-rank recovery across various engineering disciplines, we provide an overview of the GD and proximal GD methods for solving regularized optimization problems. Furthermore, this paper proposes a novel algorithm for the proximal GD method that incorporates a variable step size. Unlike conventional proximal GD, which uses a fixed step size based on the global Lipschitz constant, our method estimates the Lipschitz constant locally at each iteration and uses its reciprocal as the step size. This eliminates the need for a global Lipschitz constant, which can be impractical to compute. Numerical experiments we performed on synthetic and real data sets show notable performance improvement of the proposed method compared to the conventional proximal GD with constant step size, both in terms of number of iterations and in time requirements.
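As a rough illustration of the conventional method described in this abstract, the Python sketch below runs fixed-step proximal GD with soft-thresholding on the objective 0.5 * ||Ax - b||^2 + lam * ||x||_1. It does not implement the paper's variable step size. The function names and the toy problem are hypothetical, and the step size uses the squared Frobenius norm of A as a conservative upper bound on the global Lipschitz constant, which is an assumption made here for simplicity.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def proximal_gd_lasso(A, b, lam, n_iter=500):
    """Fixed-step proximal GD for 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) constant step size.
    lipschitz_bound = sum(a * a for row in A for a in row)
    step = 1.0 / lipschitz_bound
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth part, then soft-thresholding for the l1 part.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# Hypothetical toy problem: with A = I the minimizer is soft_threshold(b, lam).
x_hat = proximal_gd_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
print(x_hat)
```

The second coordinate is driven exactly to zero because its target value lies below the threshold, which is the sparsity effect the abstract attributes to soft-thresholding.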

gradient descent method, regularized optimization

arXiv.org Artificial Intelligence

2412.20115

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.73)

Optimisation challenge for superconducting adiabatic neural network implementing XOR and OR boolean functions

Pashin, D. S., Bastrakova, M. V., Rybin, D. A., Soloviev, I. I., Schegolev, A. E., Klenov, N. V.

arXiv.org Artificial Intelligence, May-6-2024

In this article, we consider designs of simple analog artificial neural networks based on adiabatic Josephson cells with a sigmoid activation function. A new approach based on the gradient descent method is developed to adjust the circuit parameters, allowing efficient signal transmission between the network layers. The proposed solution is demonstrated on a system implementing the XOR and OR logical operations.

neural network, neuron, output neuron, (16 more...)

arXiv.org Artificial Intelligence

2405.03521

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
Asia > Russia (0.05)
South America > Chile (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

PruneSymNet: A Symbolic Neural Network and Pruning Algorithm for Symbolic Regression

Wu, Min, Li, Weijun, Yu, Lina, Li, Wenqiang, Liu, Jingyi, Li, Yanjie, Hao, Meilan

arXiv.org Artificial Intelligence, Jan-25-2024

Symbolic regression aims to derive interpretable symbolic expressions from data in order to better understand and interpret data. In this study, a symbolic network called PruneSymNet is proposed for symbolic regression. This is a novel neural network whose activation function consists of common elementary functions and operators. The whole network is differentiable and can be trained by the gradient descent method. Each subnetwork in the network corresponds to an expression, and our goal is to extract such subnetworks to get the desired symbolic expression. Therefore, a greedy pruning algorithm is proposed to prune the network into a subnetwork while ensuring the accuracy of data fitting. The proposed greedy pruning algorithm preserves the edge with the least loss in each pruning step, but a greedy algorithm often cannot find the optimal solution. To alleviate this problem, we combine beam search with pruning to obtain multiple candidate expressions each time, and finally select the expression with the smallest loss as the final result. The method was tested on a public data set and compared with currently popular algorithms. The results showed that the proposed algorithm had better accuracy.

algorithm, expression, optimal solution, (14 more...)

arXiv.org Artificial Intelligence

2401.15103

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Beijing > Beijing (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks
