AITopics

2212.07469

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Giegrich, Michael, Reisinger, Christoph, Zhang, Yufei

Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems

arXiv.org Artificial IntelligenceOct-19-2023

We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.

lemma 3, proposition 2, proposition 3, (16 more...)

2211.00617

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Deora, Puneesh, Ghaderi, Rouzbeh, Taheri, Hossein, Thrampoulidis, Christos

On the Optimization and Generalization of Multi-head Attention

arXiv.org Machine LearningOct-19-2023

The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.

artificial intelligence, machine learning, natural language, (19 more...)

2310.1268

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

arXiv.org Machine LearningOct-19-2023

STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models

Karimi, Belhal, Xie, Jianwen, Li, Ping

We propose in this paper, STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose in this paper, a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.

artificial intelligence, machine learning, proceedings, (16 more...)

2310.12667

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.14)
(28 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Machine LearningOct-18-2023

Online Estimation with Rolling Validation: Adaptive Nonparametric Estimation with Stream Data

Zhang, Tianyu, Lei, Jing

Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. An important example includes variants of stochastic gradient descent. These algorithms often take one sample point at a time and instantly update the parameter estimate of interest. In this work we consider model selection and hyperparameter tuning for such online algorithms. We propose a weighted rolling-validation procedure, an online variant of leave-one-out cross-validation, that costs minimal extra computation for many typical stochastic gradient descent estimators. Similar to batch cross-validation, it can boost base estimators to achieve a better, adaptive convergence rate. Our theoretical analysis is straightforward, relying mainly on some general statistical stability assumptions. The simulation study underscores the significance of diverging weights in rolling validation in practice and demonstrates its sensitivity even when there is only a slim difference between candidate estimators.

artificial intelligence, estimator, machine learning, (17 more...)

2310.1214

Country: Europe > Germany > Lower Saxony > Gottingen (0.04)

Genre:

Overview (0.67)
Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Ozgul, Guneykan, Li, Xiantao, Mahdavi, Mehrdad, Wang, Chunhao

Stochastic Quantum Sampling for Non-Logconcave Distributions and Estimating Partition Functions

arXiv.org Artificial IntelligenceOct-17-2023

We present quantum algorithms for sampling from non-logconcave probability distributions in the form of $\pi(x) \propto \exp(-\beta f(x))$. Here, $f$ can be written as a finite sum $f(x):= \frac{1}{N}\sum_{k=1}^N f_k(x)$. Our approach is based on quantum simulated annealing on slowly varying Markov chains derived from unadjusted Langevin algorithms, removing the necessity for function evaluations which can be computationally expensive for large data sets in mixture modeling and multi-stable systems. We also incorporate a stochastic gradient oracle that implements the quantum walk operators inexactly by only using mini-batch gradients. As a result, our stochastic gradient based algorithm only accesses small subsets of data points in implementing the quantum walk. One challenge of quantizing the resulting Markov chains is that they do not satisfy the detailed balance condition in general. Consequently, the mixing time of the algorithm cannot be expressed in terms of the spectral gap of the transition density, making the quantum algorithms nontrivial to analyze. To overcome these challenges, we first build a hypothetical Markov chain that is reversible, and also converges to the target distribution. Then, we quantified the distance between our algorithm's output and the target distribution by using this hypothetical chain as a bridge to establish the total complexity. Our quantum algorithms exhibit polynomial speedups in terms of both dimension and precision dependencies when compared to the best-known classical algorithms.

non-logconcave distribution, partition function, stochastic quantum sampling

2310.11445

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.73)

arXiv.org Artificial IntelligenceOct-17-2023

FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations

Wang, Hui-Po, Chen, Dingfan, Kerkouche, Raouf, Fritz, Mario

This work proposes FedLAP-DP, a novel privacy-preserving approach for federated learning. Unlike previous linear point-wise gradient-sharing schemes, such as FedAvg, our formulation enables a type of global optimization by leveraging synthetic samples received from clients. We additionally introduce an approach to measure effective approximation regions reflecting the quality of the approximation. Therefore, the server can recover an approximation of the global loss landscape and optimize the model globally. Moreover, motivated by the emerging privacy concerns, we demonstrate that our approach seamlessly works with record-level differential privacy (DP), granting theoretical privacy guarantees for every data record on the clients. Extensive results validate the efficacy of our formulation on various datasets with highly skewed distributions. Our method consistently improves over the baselines, especially considering highly skewed distributions and noisy gradients due to DP. The source code and setup will be released upon publication. Federated Learning (FL) (McMahan et al., 2017) is a distributed learning framework that allows participants to train a model collaboratively without sharing their data. Predominantly, existing works (McMahan et al., 2017; Karimireddy et al., 2020; Li et al., 2020) achieve this by training local models on clients' private datasets and sharing only the gradients with the central server. Despite extensive research over the past few years, these prevalent gradient-based methods still suffer from several challenges (Kairouz et al., 2021), such as data heterogeneity, potential risks of privacy breaches, and high communication costs.

federated learning, gradient, optimization, (14 more...)

2302.01068

Country:

North America > United States > Virginia (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Machine LearningOct-17-2023

Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty Quantification

Lam, Henry, Wang, Zitong

Stochastic gradient descent (SGD) or stochastic approximation has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the obtained solutions from SGD has only been recently studied, yet is important due to the growing need for uncertainty quantification. We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions. One uses multiple, but few, SGDs in parallel via resampling with replacement from the data, and another operates this in an online fashion. Our methods can be regarded as enhancements of established bootstrap schemes to substantially reduce the computation effort in terms of resampling requirements, while at the same time bypassing the intricate mixing conditions in existing batching methods. We achieve these via a recent so-called cheap bootstrap idea and Berry-Esseen-type bound for SGD.

artificial intelligence, confidence interval, machine learning, (16 more...)

2310.11065

Country:

North America (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Artificial IntelligenceOct-16-2023

Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

Suh, H. J. Terry, Chou, Glen, Dai, Hongkai, Yang, Lujie, Gupta, Abhishek, Tedrake, Russ

Gradient-based methods enable efficient search capabilities in high dimensions. However, in order to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we require a more careful consideration of how uncertainty estimation interplays with first-order methods that attempt to minimize them. We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties: (i) it allows gradient-based methods that attempt to minimize uncertainty to drive iterates to data as smoothing is annealed, and (ii) it facilitates analysis of model bias with Lipschitz constants. As distance to data can be expensive to compute online, we consider settings where we need amortize this computation. Instead of learning the distance however, we propose to learn its gradients directly as an oracle for first-order optimizers. We show these gradients can be efficiently learned with score-matching techniques by leveraging the equivalence between distance to data and data likelihood. Using this insight, we propose Score-Guided Planning (SGP), a planning algorithm for offline RL that utilizes score-matching to enable first-order planning in high-dimensional problems, where zeroth-order methods were unable to scale, and ensembles were unable to overcome local minima. Website: https://sites.google.com/view/score-guided-planning/home

ensemble, optimization, trajectory, (15 more...)

2306.14079

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre:

Research Report (1.00)
Instructional Material (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Leluc, Rémi, Portier, François

Asymptotic Analysis of Conditioned Stochastic Gradient Descent

arXiv.org Artificial IntelligenceOct-15-2023

In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called Conditioned SGD, based on a preconditioning of the gradient direction. Using a discrete-time approach with martingale tools, we establish under mild assumptions the weak convergence of the rescaled sequence of iterates for a broad class of conditioning matrices including stochastic first-order and second-order methods. Almost sure convergence results, which may be of independent interest, are also presented. Interestingly, the asymptotic normality result consists in a stochastic equicontinuity property so when the conditioning matrix is an estimate of the inverse Hessian, the algorithm is asymptotically optimal.

algorithm, convergence, machine learning research, (15 more...)

2006.02745

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)