Goto

Collaborating Authors

 Gradient Descent


Breaking Through Barren Plateaus: Reinforcement Learning Initializations for Deep Variational Quantum Circuits

arXiv.org Artificial Intelligence

Variational Quantum Algorithms (VQAs) have gained prominence as a viable framework for exploiting near-term quantum devices in applications ranging from optimization and chemistry simulation to machine learning. However, the effectiveness of VQAs is often constrained by the so-called barren plateau problem, wherein gradients diminish exponentially as system size or circuit depth increases, thereby hindering training. In this work, we propose a reinforcement learning (RL)-based initialization strategy to alleviate the barren plateau issue by reshaping the initial parameter landscape to avoid regions prone to vanishing gradients. In particular, we explore several RL algorithms (Deterministic Policy Gradient, Soft Actor-Critic, and Proximal Policy Optimization, etc.) to generate the circuit parameters (treated as actions) that minimize the VQAs cost function before standard gradient-based optimization. By pre-training with RL in this manner, subsequent optimization using methods such as gradient descent or Adam proceeds from a more favorable initial state. Extensive numerical experiments under various noise conditions and tasks consistently demonstrate that the RL-based initialization method significantly enhances both convergence speed and final solution quality. Moreover, comparisons among different RL algorithms highlight that multiple approaches can achieve comparable performance gains, underscoring the flexibility and robustness of our method. These findings shed light on a promising avenue for integrating machine learning techniques into quantum algorithm design, offering insights into how RL-driven parameter initialization can accelerate the scalability and practical deployment of VQAs. Opening up a promising path for the research community in machine learning for quantum, especially barren plateau problems in VQAs.


A Proportional-Integral Controller-Incorporated SGD Algorithm for High Efficient Latent Factor Analysis

arXiv.org Artificial Intelligence

--In industrial big data scenarios, high-dimensional sparse matrices (HDI) are widely used to characterize high-order interaction relationships among massive nodes. The stochastic gradient descent-based latent factor analysis (SGD-LFA) method can effectively extract deep feature information embedded in HDI matrices. However, existing SGD-LFA methods exhibit significant limitations: their parameter update process relies solely on the instantaneous gradient information of current samples, failing to incorporate accumulated experiential knowledge from historical iterations or account for intrinsic correlations between samples, resulting in slow convergence speed and suboptimal generalization performance. Thus, this paper proposes a PILF model by developing a PI-accelerated SGD algorithm by integrating correlated instances and refining learning errors through proportional-integral (PI) control mechanism that current and historical information; Comparative experiments demonstrate the superior representation capability of the PILF model on HDI matrices. ATA serves as the foundation for industrial application development.


Stochastic Gradient Descent with Strategic Querying

arXiv.org Artificial Intelligence

This paper considers a finite-sum optimization problem under first-order queries and investigates the benefits of strategic querying on stochastic gradient-based methods compared to uniform querying strategy. We first introduce Oracle Gradient Querying (OGQ), an idealized algorithm that selects one user's gradient yielding the largest possible expected improvement (EI) at each step. However, OGQ assumes oracle access to the gradients of all users to make such a selection, which is impractical in real-world scenarios. To address this limitation, we propose Strategic Gradient Querying (SGQ), a practical algorithm that has better transient-state performance than SGD while making only one query per iteration. For smooth objective functions satisfying the Polyak-Lojasiewicz condition, we show that under the assumption of EI heterogeneity, OGQ enhances transient-state performance and reduces steady-state variance, while SGQ improves transient-state performance over SGD. Our numerical experiments validate our theoretical findings.


Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches

arXiv.org Artificial Intelligence

Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, limiting the ability of first-order methods to escape sharp or suboptimal minima and reach the global minimum. Meanwhile, second-order methods like the natural gradient with Kronecker-Factored Approximate Curvature (KFAC) often require excessively high damping to remain stable at large batch sizes. This high damping effectively "washes out" the curvature information that gives these methods their advantage, reducing their performance to that of simple gradient descent. In this paper, we introduce Fisher-Orthogonal Projection (FOP), a novel technique that restores the effectiveness of the second-order method at very large batch sizes, enabling scalable training with improved generalization and faster convergence. FOP constructs a variance-aware update direction by leveraging gradients from two sub-batches, enhancing the average gradient with a component of the gradient difference that is orthogonal to the average under the Fisher-metric. Through extensive benchmarks, we show that FOP accelerates convergence by 1 .2-1. 3 over KFAC and 1 .


Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation

arXiv.org Machine Learning

We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization. Our main contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully explicit constants and a rigorous separation between gradient-descent and saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with $\ell$-Lipschitz gradient and $ฯ$-Lipschitz Hessian, we prove that PSD finds an $(ฮต,\sqrt{ฯฮต})$-approximate second-order stationary point with high probability using at most $O(\ellฮ”_f/ฮต^2)$ gradient evaluations for the descent phase plus $O((\ell/\sqrt{ฯฮต})\log(d/ฮด))$ evaluations per escape episode, with at most $O(\ellฮ”_f/ฮต^2)$ episodes needed. We validate our theoretical predictions through extensive experiments across both synthetic functions and practical machine learning tasks, confirming the logarithmic dimension dependence and the predicted per-episode function decrease. We also provide complete algorithmic specifications including a finite-difference variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch sizing.


Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning

arXiv.org Artificial Intelligence

As the volume and diversity of available datasets continue to increase, assessing data quality has become crucial for reliable and efficient Machine Learning analytics. A modern, game-theoretic approach for evaluating data quality is the notion of Data Shapley which quantifies the value of individual data points within a dataset. State-of-the-art methods to scale the NP-hard Shapley computation also face severe challenges when applied to large-scale datasets, limiting their practical use. In this work, we present a Data Shapley approach to identify a dataset's high-quality data tuples, Chunked Data Shapley (C-DaSh). C-DaSh scalably divides the dataset into manageable chunks and estimates the contribution of each chunk using optimized subset selection and single-iteration stochastic gradient descent. This approach drastically reduces computation time while preserving high quality results. We empirically benchmark our method on diverse real-world classification and regression tasks, demonstrating that C-DaSh outperforms existing Shapley approximations in both computational efficiency (achieving speedups between 80x - 2300x) and accuracy in detecting low-quality data regions. Our method enables practical measurement of dataset quality on large tabular datasets, supporting both classification and regression pipelines.