AITopics | Gradient Descent

Gradient Descent

Twin-Boot: Uncertainty-Aware Optimization via Online Two-Sample Bootstrapping

arXiv.org Machine Learning, Aug-22-2025

Standard gradient descent methods yield point estimates with no measure of confidence. This limitation is acute in overparameterized and low-data regimes, where models have many parameters relative to available data and can easily overfit. Bootstrapping is a classical statistical framework for uncertainty estimation based on resampling, but naively applying it to deep learning is impractical: it requires training many replicas, produces post-hoc estimates that cannot guide learning, and implicitly assumes comparable optima across runs - an assumption that fails in non-convex landscapes. We introduce Twin-Bootstrap Gradient Descent (Twin-Boot), a resampling-based training procedure that integrates uncertainty estimation into optimization. Two identical models are trained in parallel on independent bootstrap samples, and a periodic mean-reset keeps both trajectories in the same basin so that their divergence reflects local (within-basin) uncertainty. During training, we use this estimate to sample weights in an adaptive, data-driven way, providing regularization that favors flatter solutions. In deep neural networks and complex high-dimensional inverse problems, the approach improves calibration and generalization and yields interpretable uncertainty maps.
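The twin-trajectory idea in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-parameter linear model, the fixed bootstrap samples, the reset interval, and all names (`twin_boot_sketch`, `reset_every`, `spread`) are assumptions made for illustration, and the weight-sampling regularizer described in the abstract is left out.

```python
import random

def bootstrap(data, rng):
    # Resample the dataset with replacement (same size as the original).
    return [rng.choice(data) for _ in data]

def grad(w, sample):
    # Gradient of the mean squared error for the toy model y ~ w * x.
    return sum(2 * (w * x - y) * x for x, y in sample) / len(sample)

def twin_boot_sketch(data, steps=200, lr=0.05, reset_every=20, seed=0):
    rng = random.Random(seed)
    sample_a = bootstrap(data, rng)  # assumption: samples drawn once, then fixed
    sample_b = bootstrap(data, rng)
    w_a = w_b = 0.0
    spread = 0.0
    for step in range(1, steps + 1):
        # Each twin takes a plain gradient step on its own bootstrap sample.
        w_a -= lr * grad(w_a, sample_a)
        w_b -= lr * grad(w_b, sample_b)
        if step % reset_every == 0:
            # The gap between the twins serves as a crude uncertainty estimate.
            spread = abs(w_a - w_b)
            # Periodic mean-reset: pull both twins back to their average.
            w_a = w_b = (w_a + w_b) / 2
    return w_a, spread

# Hypothetical data: y = 2x plus small Gaussian noise.
noise = random.Random(1)
data = [(i / 10, 2.0 * (i / 10) + noise.gauss(0, 0.1)) for i in range(1, 21)]
w, spread = twin_boot_sketch(data)
print(w, spread)
```

In this convex toy problem the reset is not needed to keep the twins in one basin; it is shown only to make the mechanics of the procedure concrete.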

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2508.15019

Country: Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Stabilization of Perturbed Loss Function: Differential Privacy without Gradient Noise

Habib, Salman, Chou, Remi, Kim, Taejoon

arXiv.org Artificial Intelligence, Aug-22-2025

We propose SPOF (Stabilization of Perturbed Loss Function), a differentially private training mechanism intended for multi-user local differential privacy (LDP). SPOF perturbs a stabilized Taylor expanded polynomial approximation of a model's training loss function, where each user's data is privatized by calibrated noise added to the coefficients of the polynomial. Unlike gradient-based mechanisms such as differentially private stochastic gradient descent (DP-SGD), SPOF does not require injecting noise into the gradients of the loss function, which improves both computational efficiency and stability. This formulation naturally supports simultaneous privacy guarantees across all users. Moreover, SPOF exhibits robustness to environmental noise during training, maintaining stable performance even when user inputs are corrupted. We compare SPOF with a multi-user extension of DP-SGD, evaluating both methods in a wireless body area network (WBAN) scenario involving heterogeneous user data and stochastic channel noise from body sensors. Our results show that SPOF achieves, on average, up to 3.5% higher reconstruction accuracy and reduces mean training time by up to 57.2% compared to DP-SGD, demonstrating superior privacy-utility trade-offs in multi-user environments.
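As a rough illustration of perturbing loss coefficients instead of gradients, the sketch below uses a one-parameter squared-error loss, which is already an exact polynomial in the weight, so no Taylor truncation is involved. Everything here is an assumption for illustration: the Gaussian noise, the `noise_scale` value, and the convexity floor are placeholders, carry no privacy guarantee, and are not the paper's calibration or stabilization procedure.

```python
import random

def loss_coefficients(data):
    # mean((w*x - y)^2) expands exactly to a*w^2 + b*w + c.
    n = len(data)
    a = sum(x * x for x, _ in data) / n
    b = -2.0 * sum(x * y for x, y in data) / n
    c = sum(y * y for _, y in data) / n
    return a, b, c

def perturb(coeffs, noise_scale, rng):
    # Add noise to each polynomial coefficient (placeholder noise model).
    return tuple(k + rng.gauss(0.0, noise_scale) for k in coeffs)

def minimize_quadratic(a, b, floor=1e-3):
    # Keep the noisy polynomial convex; this floor is a safeguard invented
    # for this sketch, not the stabilization step described in the abstract.
    a = max(a, floor)
    return -b / (2.0 * a)

def train_on_perturbed_loss(data, noise_scale, seed=0):
    rng = random.Random(seed)
    a, b, _ = perturb(loss_coefficients(data), noise_scale, rng)
    # Training uses only the perturbed polynomial, never per-step gradient noise.
    return minimize_quadratic(a, b)

# Hypothetical data following y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 6)]
print(train_on_perturbed_loss(data, noise_scale=0.0))
print(train_on_perturbed_loss(data, noise_scale=0.05, seed=1))
```

The noise is injected once, into the objective, and the optimizer then runs on a fixed noisy polynomial. DP-SGD, by contrast, adds noise at every gradient step.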

artificial intelligence, machine learning, spof, (17 more...)

arXiv.org Artificial Intelligence

2508.15523

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Cooperative SGD with Dynamic Mixing Matrices

Sarkar, Soumya, Jain, Shweta

arXiv.org Artificial Intelligence, Aug-22-2025

One of the most common methods for training machine learning models today is stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A substantial number of works in the distributed SGD setting assume a fixed topology for the edge devices. These papers also assume that the contribution of nodes to the global model is uniform. However, experiments have shown that such assumptions are suboptimal, and that a non-uniform aggregation strategy coupled with a dynamically shifting topology and client selection can significantly improve the performance of such models. This paper details a unified framework that covers several Local-Update SGD-based distributed algorithms with dynamic topologies and provides improved or matching theoretical guarantees on convergence compared to existing work.
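A minimal sketch of local-update SGD with a mixing matrix that changes from round to round is given below. It is not the paper's framework: the scalar quadratic objectives, the two hand-written matrices `RING` and `PAIRS`, and the schedule that alternates between them are all assumptions chosen to keep the example small.

```python
def local_step(w, target, lr):
    # Gradient step on the node's local objective 0.5 * (w - target)^2.
    return w - lr * (w - target)

def mix(weights, matrix):
    # Each node replaces its weight with a weighted average of all nodes.
    return [sum(m * w for m, w in zip(row, weights)) for row in matrix]

def cooperative_sgd(targets, matrices, rounds=50, local_steps=2, lr=0.1):
    weights = [0.0] * len(targets)
    for r in range(rounds):
        for _ in range(local_steps):
            weights = [local_step(w, t, lr) for w, t in zip(weights, targets)]
        # Dynamic topology: the mixing matrix depends on the round index.
        weights = mix(weights, matrices[r % len(matrices)])
    return weights

# Two hypothetical doubly stochastic mixing matrices for four nodes.
RING = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
PAIRS = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]

targets = [1.0, 2.0, 3.0, 4.0]
final = cooperative_sgd(targets, [RING, PAIRS])
print(final)
```

Because both matrices are doubly stochastic, mixing leaves the network average unchanged, so the average drifts toward the mean of the local targets (2.5 here) while individual nodes stay close to it. Non-uniform aggregation would correspond to rows with unequal weights.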

artificial intelligence, machine learning, scenario, (17 more...)

arXiv.org Artificial Intelligence

2508.14565

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Hybrid Least Squares/Gradient Descent Methods for DeepONets

Choi, Jun, Lee, Chang-Ock, Moon, Minam

arXiv.org Artificial Intelligence, Aug-22-2025

We propose an efficient hybrid least squares/gradient descent method to accelerate DeepONet training. Since the output of DeepONet can be viewed as linear with respect to the last layer parameters of the branch network, these parameters can be optimized using a least squares (LS) solve, and the remaining hidden layer parameters are updated by gradient descent. However, building the LS system for all possible combinations of branch and trunk inputs yields a prohibitively large linear problem that is infeasible to solve directly. To address this issue, our method decomposes the large LS system into two smaller, more manageable subproblems (one for the branch network and one for the trunk network) and solves them separately. This method is generalized to a broader type of $L^2$ loss with a regularization term for the last layer parameters, including the case of unsupervised learning with physics-informed loss.
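The alternation between a least squares solve for the linear output parameters and gradient descent for the hidden parameters can be sketched on a tiny model. This is not a DeepONet and does not reproduce the branch/trunk decomposition. The model `c * tanh(a * x)`, the starting value of `a`, and the step size are assumptions made for illustration.

```python
import math

def features(a, xs):
    # Hidden-layer output for each input; `a` is the only hidden parameter.
    return [math.tanh(a * x) for x in xs]

def least_squares_last_layer(phi, ys):
    # Closed-form minimizer over c of sum((c * phi_i - y_i)^2).
    return sum(p * y for p, y in zip(phi, ys)) / sum(p * p for p in phi)

def mse(a, c, xs, ys):
    return sum((c * math.tanh(a * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def hybrid_train(xs, ys, a=0.3, steps=300, lr=0.02):
    for _ in range(steps):
        phi = features(a, xs)
        # LS step: the output weight is solved exactly for the current features.
        c = least_squares_last_layer(phi, ys)
        # GD step: update the hidden parameter with the output weight held fixed.
        grad_a = sum(
            2 * (c * p - y) * c * (1 - p * p) * x
            for p, x, y in zip(phi, xs, ys)
        ) / len(xs)
        a -= lr * grad_a
    c = least_squares_last_layer(features(a, xs), ys)
    return a, c

# Hypothetical target generated by the same model family.
xs = [i / 5 for i in range(-10, 11)]
ys = [1.5 * math.tanh(0.8 * x) for x in xs]
a, c = hybrid_train(xs, ys)
print(a, c, mse(a, c, xs, ys))
```

Solving for the output weight in closed form removes one parameter from the gradient descent loop. This is the property the abstract exploits at much larger scale.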

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.15394

Country: Asia > South Korea (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)

c48fe446e651cd49fb58a6833e015103-Paper-Conference.pdf

Neural Information Processing Systems, Aug-21-2025, 04:48:02 GMT

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Oil & Gas (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

907a9fb75a408f6c3a2ae1bf84c39e44-Paper-Conference.pdf

Neural Information Processing Systems, Aug-21-2025, 01:22:28 GMT

artificial intelligence, experiment, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Italy (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models

Zhang, Liyi, Snell, Jake, Griffiths, Thomas L.

arXiv.org Machine Learning, Aug-21-2025

Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, it is often unclear how well the fine-tuned LLM will generalize, i.e., how well it will perform on unseen datasets. Methods have been proposed to improve generalization by optimizing with in-context prompts, or by using meta-learning to fine-tune LLMs. However, these methods are expensive in memory and computation, requiring either long-context prompts or saving copies of parameters and using second-order gradient updates. To address these challenges, we propose Amortized Bayesian Meta-Learning for LoRA (ABMLL). This method builds on amortized Bayesian meta-learning for smaller models, adapting this approach to LLMs while maintaining its computational efficiency. We reframe task-specific and global parameters in the context of LoRA and use a set of new hyperparameters to balance reconstruction accuracy and the fidelity of task-specific parameters to the global ones. ABMLL provides effective generalization and scales to large models such as Llama3-8B. Furthermore, as a result of using a Bayesian framework, ABMLL provides improved uncertainty quantification. We test ABMLL on Unified-QA and CrossFit datasets and find that it outperforms existing methods on these benchmarks in terms of both accuracy and expected calibration error.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2508.14285

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)
(2 more...)

On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks

Su, Junwei, Wu, Chuan

arXiv.org Artificial Intelligence, Aug-21-2025

This paper studies the interplay between learning algorithms and graph structure for graph neural networks (GNNs). Existing theoretical studies on the learning dynamics of GNNs primarily focus on the convergence rates of learning algorithms under the interpolation regime (noise-free) and offer only a crude connection between these dynamics and the actual graph structure (e.g., maximum degree). This paper aims to bridge this gap by investigating the excess risk (generalization performance) of learning algorithms in GNNs within the generalization regime (with noise). Specifically, we extend the conventional settings from the learning theory literature to the context of GNNs and examine how graph structure influences the performance of learning algorithms such as stochastic gradient descent (SGD) and Ridge regression. Our study makes several key contributions toward understanding the interplay between graph structure and learning in GNNs. First, we derive the excess risk profiles of SGD and Ridge regression in GNNs and connect these profiles to the graph structure through spectral graph theory. With this established framework, we further explore how different graph structures (regular vs. power-law) impact the performance of these algorithms through comparative analysis. Additionally, we extend our analysis to multi-layer linear GNNs, revealing an increasing non-isotropic effect on the excess risk profile, thereby offering new insights into the over-smoothing issue in GNNs from the perspective of learning algorithms. Our empirical results align with our theoretical predictions, collectively showcasing a coupling relation among graph structure, GNNs and learning algorithms, and providing insights on GNN algorithm design and selection in practice.

artificial intelligence, graph structure, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.14338

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Inter-Class Relational Loss for Small Object Detection: A Case Study on License Plates

Ning, Dian, Han, Dong Seog

arXiv.org Artificial Intelligence, Aug-21-2025

In one-stage multi-object detection tasks, various intersection over union (IoU)-based solutions aim at smooth and stable convergence near the targets during training. However, IoU-based losses fail to correctly update the gradient of small objects due to an extremely flat gradient. During the update of multiple objects, the learning of small objects' gradients suffers more because of insufficient gradient updates. Therefore, we propose an inter-class relational loss to efficiently update the gradient of small objects without sacrificing the learning efficiency of other objects, based on the simple fact that an object has a spatial relationship to another object (e.g., a car plate is attached to a car in a similar position). When the predicted car plate's bounding box is not within its car, a loss penalty is added to guide the learning, which is inversely proportional to the overlapped area of the car's and the predicted car plate's bounding boxes. By leveraging the spatial relationship at the inter-class level, the loss guides small object predictions using larger objects and enhances latent information in deeper feature maps. In this paper, we present twofold contributions using license plate detection as a case study: (1) a new small vehicle multi-license plate dataset (SVMLP), featuring diverse real-world scenarios with high-quality annotations; and (2) a novel inter-class relational (ICR) loss function designed to promote effective detection performance. We highlight that the proposed ICR loss penalty can be easily added to existing IoU-based losses to enhance performance. These contributions improve the standard mean Average Precision (mAP) metric, achieving gains of 10.3% and 1.6% in mAP$^{\text{test}}_{50}$ for YOLOv12-T and UAV-DETR, respectively, without any additional hyperparameter tuning. Code and dataset will be available soon.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.14343

Country:

Asia > China (0.46)
South America > Brazil (0.28)
Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Double Quantization for Communication-Efficient Distributed Optimization

Yue Yu, Jiaxiang Wu, Longbo Huang

Neural Information Processing Systems, Aug-20-2025, 08:06:02 GMT

Modern distributed training of machine learning models often suffers from high communication overhead for synchronizing stochastic gradients and model parameters.

algorithm, gradient, neural information processing system, (12 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)
