AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Faster variational quantum algorithms with quantum kernel-based surrogate models

Smith, Alistair W. R., Paige, A. J., Kim, M. S.

arXiv.org Artificial IntelligenceAug-14-2023

We present a new optimization method for small-to-intermediate scale variational algorithms on noisy near-term quantum processors which uses a Gaussian process surrogate model equipped with a classically-evaluated quantum kernel. Variational algorithms are typically optimized using gradient-based approaches however these are difficult to implement on current noisy devices, requiring large numbers of objective function evaluations. Our scheme shifts this computational burden onto the classical optimizer component of these hybrid algorithms, greatly reducing the number of queries to the quantum processor. We focus on the variational quantum eigensolver (VQE) algorithm and demonstrate numerically that such surrogate models are particularly well suited to the algorithm's objective function. Next, we apply these models to both noiseless and noisy VQE simulations and show that they exhibit better performance than widely-used classical kernels in terms of final accuracy and convergence speed. Compared to the typically-used stochastic gradient-descent approach for VQAs, our quantum kernel-based approach is found to consistently achieve significantly higher accuracy while requiring less than an order of magnitude fewer quantum circuit evaluations. We analyse the performance of the quantum kernel-based models in terms of the kernels' induced feature spaces and explicitly construct their feature maps. Finally, we describe a scheme for approximating the best-performing quantum kernel using a classically-efficient tensor network representation of its input state and so provide a pathway for scaling these methods to larger systems.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1088/2058-9565/aceb87

2211.01134

Country:

North America > Canada > Alberta (0.14)
South America > Ecuador > Pichincha Province > Quito (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization

Li, Chris Junchi, Yuan, Angela, Gidel, Gauthier, Gu, Quanquan, Jordan, Michael I.

arXiv.org Artificial IntelligenceAug-14-2023

The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual component and optimistic gradient on the coupling component. Equipped with proper restarting, we show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings, including bilinearly coupled strongly convex-strongly concave minimax optimization (bi-SC-SC), bilinearly coupled convex-strongly concave minimax optimization (bi-C-SC), and bilinear games. We also extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings. AG-OG is the first single-call algorithm with optimal convergence rates in both deterministic and stochastic settings for bilinearly coupled minimax optimization problems.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.1755

Country:

Asia > Middle East > Jordan (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Law of Balance and Stationary Distribution of Stochastic Gradient Descent

Ziyin, Liu, Li, Hongchao, Ueda, Masahito

arXiv.org Artificial IntelligenceAug-12-2023

The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we prove that the minibatch noise of SGD regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry. Because the difference between a simple diffusion process and SGD dynamics is the most significant when symmetries are present, our theory implies that the loss function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width. The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models.

artificial intelligence, machine learning, stationary distribution, (14 more...)

arXiv.org Artificial Intelligence

2308.06671

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

An Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization

Chen, Lesi, Ye, Haishan, Luo, Luo

arXiv.org Artificial IntelligenceAug-12-2023

This paper studies the stochastic optimization for decentralized nonconvex-strongly-concave (NC-SC) minimax problems over a multi-agent network. We propose an efficient algorithm, called the Decentralized Recursive-gradient descEnt Ascent Method (DREAM), which achieves the best-known theoretical guarantee for finding the $\epsilon$-stationary point of the primal function. The proposed method requires $\mathcal{O}(\min (\kappa^3\epsilon^{-3},\sqrt{N} \kappa^2 \epsilon^{-2} ))$ stochastic first-order oracle (SFO) calls and $\tilde{\mathcal{O}}(\kappa^2 \epsilon^{-2})$ communication rounds to find an $\epsilon$-stationary point, where $\kappa$ is the condition number. DREAM achieves the best-known complexity for both the online and offline setups.

artificial intelligence, machine learning, optimization, (16 more...)

arXiv.org Artificial Intelligence

2212.02387

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Efficient Variational Inference for Large Skew-t Copulas with Application to Intraday Equity Returns

Deng, Lin, Smith, Michael Stanley, Maneesoonthorn, Worapree

arXiv.org Artificial IntelligenceAug-10-2023

Large skew-t factor copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a conditionally Gaussian generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A fast stochastic gradient ascent algorithm is used to solve the variational optimization. The new methodology is used to estimate copula models for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. We show that intraday predictive densities from the skew-t copula are more accurate than from some other copula models, while portfolio selection strategies based on the estimated pairwise tail dependencies improve performance relative to the benchmark index.

artificial intelligence, copula, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2308.05564

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Banking & Finance > Trading (1.00)
Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Byzantine-Robust Decentralized Stochastic Optimization with Stochastic Gradient Noise-Independent Learning Error

Peng, Jie, Li, Weiyu, Ling, Qing

arXiv.org Artificial IntelligenceAug-9-2023

This paper studies Byzantine-robust stochastic optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD). The performance of such a method is affected by an unknown number of Byzantine agents, which conduct adversarially during the optimization process. To the best of our knowledge, there is no existing work that simultaneously achieves a linear convergence speed and a small learning error. We observe that the learning error is largely dependent on the intrinsic stochastic gradient noise. Motivated by this observation, we introduce two variance reduction methods, stochastic average gradient algorithm (SAGA) and loopless stochastic variance-reduced gradient (LSVRG), to Byzantine-robust decentralized stochastic optimization for eliminating the negative effect of the stochastic gradient noise. The two resulting methods, BRAVO-SAGA and BRAVO-LSVRG, enjoy both linear convergence speeds and stochastic gradient noise-independent learning errors. Such learning errors are optimal for a class of methods based on total variation (TV)-norm regularization and stochastic subgradient update. We conduct extensive numerical experiments to demonstrate their effectiveness under various Byzantine attacks.

agent, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2308.05292

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report (0.70)
Overview (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

An Exact Kernel Equivalence for Finite Classification Models

Bell, Brian, Geyer, Michael, Glickenstein, David, Fernandez, Amanda, Moore, Juston

arXiv.org Artificial IntelligenceAug-9-2023

We explore the equivalence between neural networks and kernel methods by deriving the first exact representation of any finite-size parametric classification model trained with gradient descent as a kernel machine. We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK and other non-exact path kernel formulations. We experimentally demonstrate that the kernel can be computed for realistic networks up to machine precision. We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks, particularly the way in which they generalize.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2308.00824

Country:

North America > United States > Arizona (0.04)
North America > United States > Texas > Bexar County > San Antonio (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

Neumayer, Sebastian, Chizat, Lénaïc, Unser, Michael

arXiv.org Artificial IntelligenceAug-9-2023

In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized from zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with nonzero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal-transport theory, we show that, despite the non-convexity of the 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional-optimization problem and investigate its main properties. In particular, we show that, as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly, even beyond these extreme points.

artificial intelligence, machine learning, neural information processing system, (16 more...)

arXiv.org Artificial Intelligence

2303.17805

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures

Madireddy, Sandeep, Yanguas-Gil, Angel, Balaprakash, Prasanna

arXiv.org Artificial IntelligenceAug-8-2023

The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning. Online continual learning addresses the scenario where a system has to learn and process data that are continuously streamed, often without restrictions in terms of the distribution of data within and across tasks and without clearly identified task boundaries Mai et al. (2021); Chen et al. (2020); Aljundi et al. (2019a). Online continual learning algorithms seek to mitigate catastrophic forgetting at both the data-instance and task level Chen et al. (2020). In some cases, however, such as on-chip learning at the edge, additional considerations such as resource limitations in the hardware, data privacy, or data security are also important for online continual learning. A key challenge of online continual learning is that it runs counter to the optimal conditions required for optimization using stochastic gradient descent (SGD) Parisi et al. (2019), which struggles with non-stationary data streams Lindsey & Litwin-Kumar (2020). On the contrary, biological systems excel at online continual learning. Inspired by the structure and functionality of the mammal brain, several approaches have adopted replay strategies to counteract catastrophic forgetting during non-stationary tasks.

artificial intelligence, learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2308.04539

Country:

North America > United States > Illinois > Cook County > Lemont (0.04)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization

Kolb, Chris, Müller, Christian L., Bischl, Bernd, Rügamer, David

arXiv.org Artificial IntelligenceAug-8-2023

This paper presents a framework for smooth optimization of objectives with $\ell_q$ and $\ell_{p,q}$ regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly non-convex problems typically relies on specialized optimization routines. In contrast, the method studied here is compatible with off-the-shelf (stochastic) gradient descent that is ubiquitous in deep learning, thereby enabling differentiable sparse regularization without approximations. The proposed optimization transfer comprises an overparametrization of selected model parameters followed by a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization induces non-smooth and non-convex regularization in the original parametrization. We show that the resulting surrogate problem not only has an identical global optimum but also exactly preserves the local minima. This is particularly useful in non-convex regularization, where finding global solutions is NP-hard and local minima often generalize well. We provide an integrative overview that consolidates various literature strands on sparsity-inducing parametrizations in a general setting and meaningfully extend existing approaches. The feasibility of our approach is evaluated through numerical experiments, demonstrating its effectiveness by matching or outperforming common implementations of convex and non-convex regularizers.

artificial intelligence, machine learning, regularization, (17 more...)

arXiv.org Artificial Intelligence

2307.03571

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback