AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Breaking Through Barren Plateaus: Reinforcement Learning Initializations for Deep Variational Quantum Circuits

Peng, Yifeng, Li, Xinyi, Zhang, Zhemin, Chen, Samuel Yen-Chi, Liang, Zhiding, Wang, Ying

arXiv.org Artificial IntelligenceAug-27-2025

Variational Quantum Algorithms (VQAs) have gained prominence as a viable framework for exploiting near-term quantum devices in applications ranging from optimization and chemistry simulation to machine learning. However, the effectiveness of VQAs is often constrained by the so-called barren plateau problem, wherein gradients diminish exponentially as system size or circuit depth increases, thereby hindering training. In this work, we propose a reinforcement learning (RL)-based initialization strategy to alleviate the barren plateau issue by reshaping the initial parameter landscape to avoid regions prone to vanishing gradients. In particular, we explore several RL algorithms (Deterministic Policy Gradient, Soft Actor-Critic, and Proximal Policy Optimization, etc.) to generate the circuit parameters (treated as actions) that minimize the VQAs cost function before standard gradient-based optimization. By pre-training with RL in this manner, subsequent optimization using methods such as gradient descent or Adam proceeds from a more favorable initial state. Extensive numerical experiments under various noise conditions and tasks consistently demonstrate that the RL-based initialization method significantly enhances both convergence speed and final solution quality. Moreover, comparisons among different RL algorithms highlight that multiple approaches can achieve comparable performance gains, underscoring the flexibility and robustness of our method. These findings shed light on a promising avenue for integrating machine learning techniques into quantum algorithm design, offering insights into how RL-driven parameter initialization can accelerate the scalability and practical deployment of VQAs. Opening up a promising path for the research community in machine learning for quantum, especially barren plateau problems in VQAs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2508.18514

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

A Proportional-Integral Controller-Incorporated SGD Algorithm for High Efficient Latent Factor Analysis

Li, Jinli, Long, Shiyu, Han, Minglian

arXiv.org Artificial IntelligenceAug-26-2025

--In industrial big data scenarios, high-dimensional sparse matrices (HDI) are widely used to characterize high-order interaction relationships among massive nodes. The stochastic gradient descent-based latent factor analysis (SGD-LFA) method can effectively extract deep feature information embedded in HDI matrices. However, existing SGD-LFA methods exhibit significant limitations: their parameter update process relies solely on the instantaneous gradient information of current samples, failing to incorporate accumulated experiential knowledge from historical iterations or account for intrinsic correlations between samples, resulting in slow convergence speed and suboptimal generalization performance. Thus, this paper proposes a PILF model by developing a PI-accelerated SGD algorithm by integrating correlated instances and refining learning errors through proportional-integral (PI) control mechanism that current and historical information; Comparative experiments demonstrate the superior representation capability of the PILF model on HDI matrices. ATA serves as the foundation for industrial application development.

artificial intelligence, data mining, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2508.17609

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Stochastic Gradient Descent with Strategic Querying

Jiang, Nanfei, Wai, Hoi-To, Alizadeh, Mahnoosh

arXiv.org Artificial IntelligenceAug-26-2025

This paper considers a finite-sum optimization problem under first-order queries and investigates the benefits of strategic querying on stochastic gradient-based methods compared to uniform querying strategy. We first introduce Oracle Gradient Querying (OGQ), an idealized algorithm that selects one user's gradient yielding the largest possible expected improvement (EI) at each step. However, OGQ assumes oracle access to the gradients of all users to make such a selection, which is impractical in real-world scenarios. To address this limitation, we propose Strategic Gradient Querying (SGQ), a practical algorithm that has better transient-state performance than SGD while making only one query per iteration. For smooth objective functions satisfying the Polyak-Lojasiewicz condition, we show that under the assumption of EI heterogeneity, OGQ enhances transient-state performance and reduces steady-state variance, while SGQ improves transient-state performance over SGD. Our numerical experiments validate our theoretical findings.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.17144

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches

Lu, Yishun, Armour, Wesley

arXiv.org Artificial IntelligenceAug-26-2025

Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, limiting the ability of first-order methods to escape sharp or suboptimal minima and reach the global minimum. Meanwhile, second-order methods like the natural gradient with Kronecker-Factored Approximate Curvature (KFAC) often require excessively high damping to remain stable at large batch sizes. This high damping effectively "washes out" the curvature information that gives these methods their advantage, reducing their performance to that of simple gradient descent. In this paper, we introduce Fisher-Orthogonal Projection (FOP), a novel technique that restores the effectiveness of the second-order method at very large batch sizes, enabling scalable training with improved generalization and faster convergence. FOP constructs a variance-aware update direction by leveraging gradients from two sub-batches, enhancing the average gradient with a component of the gradient difference that is orthogonal to the average under the Fisher-metric. Through extensive benchmarks, we show that FOP accelerates convergence by 1 .2-1. 3 over KFAC and 1 .

artificial intelligence, batch size, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.13898

Genre: Research Report (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation

Alpay, Faruk, Alakkad, Hamdi

arXiv.org Machine LearningAug-25-2025

We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization. Our main contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully explicit constants and a rigorous separation between gradient-descent and saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with $\ell$-Lipschitz gradient and $ρ$-Lipschitz Hessian, we prove that PSD finds an $(ε,\sqrt{ρε})$-approximate second-order stationary point with high probability using at most $O(\ellΔ_f/ε^2)$ gradient evaluations for the descent phase plus $O((\ell/\sqrt{ρε})\log(d/δ))$ evaluations per escape episode, with at most $O(\ellΔ_f/ε^2)$ episodes needed. We validate our theoretical predictions through extensive experiments across both synthetic functions and practical machine learning tasks, confirming the logarithmic dimension dependence and the predicted per-episode function decrease. We also provide complete algorithmic specifications including a finite-difference variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch sizing.

artificial intelligence, machine learning, saddle point, (13 more...)

arXiv.org Machine Learning

2508.1654

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Add feedback

Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning

Loizou, Andreas, Tsoumakos, Dimitrios

arXiv.org Artificial IntelligenceAug-25-2025

As the volume and diversity of available datasets continue to increase, assessing data quality has become crucial for reliable and efficient Machine Learning analytics. A modern, game-theoretic approach for evaluating data quality is the notion of Data Shapley which quantifies the value of individual data points within a dataset. State-of-the-art methods to scale the NP-hard Shapley computation also face severe challenges when applied to large-scale datasets, limiting their practical use. In this work, we present a Data Shapley approach to identify a dataset's high-quality data tuples, Chunked Data Shapley (C-DaSh). C-DaSh scalably divides the dataset into manageable chunks and estimates the contribution of each chunk using optimized subset selection and single-iteration stochastic gradient descent. This approach drastically reduces computation time while preserving high quality results. We empirically benchmark our method on diverse real-world classification and regression tasks, demonstrating that C-DaSh outperforms existing Shapley approximations in both computational efficiency (achieving speedups between 80x - 2300x) and accuracy in detecting low-quality data regions. Our method enables practical measurement of dataset quality on large tabular datasets, supporting both classification and regression pipelines.

data quality, data tuple, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3746252.3761305

2508.16255

Country: Europe > Greece (0.14)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Add feedback

Efficient Convex Relaxations for Streaming PCA

Raman Arora, Teodor Vanislavov Marinov

Neural Information Processing SystemsAug-22-2025, 02:23:17 GMT

These algorithms have been shown to outperform Oja's algorithm, empirically, in terms of the iteration complexity, and to have runtime comparable with Oja's.

algorithm, complexity, matrix, (13 more...)

Neural Information Processing Systems

Country: