AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

Wang, Weichen, Han, Jiequn, Yang, Zhuoran, Wang, Zhaoran

arXiv.org Machine LearningAug-16-2020

Reinforcement learning is a powerful tool to learn the optimal policy of possibly multiple agents by interacting with the environment. As the number of agents grow to be very large, the system can be approximated by a mean-field problem. Therefore, it has motivated new research directions for mean-field control (MFC) and mean-field game (MFG). In this paper, we study the policy gradient method for the linear-quadratic mean-field control and game, where we assume each agent has identical linear state transitions and quadratic cost functions. While most of the recent works on policy gradient for MFC and MFG are based on discrete-time models, we focus on the continuous-time models where some analyzing techniques can be interesting to the readers. For both MFC and MFG, we provide policy gradient update and show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation. For MFG, we also provide sufficient conditions for the existence and uniqueness of the Nash equilibrium.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2008.06845

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Byzantine-Resilient High-Dimensional Federated Learning

Data, Deepesh, Diggavi, Suhas

arXiv.org Machine LearningAug-16-2020

We study stochastic gradient descent (SGD) with local iterations in the presence of malicious/Byzantine clients, motivated by the federated learning. The clients, instead of communicating with the central server in every iteration, maintain their local models, which they update by taking several SGD iterations based on their own datasets and then communicate the net update with the server, thereby achieving communication-efficiency. Furthermore, only a subset of clients communicate with the server, and this subset may be different at different synchronization times. The Byzantine clients may collaborate and send arbitrary vectors to the server to disrupt the learning process. To combat the adversary, we employ an efficient high-dimensional robust mean estimation algorithm from Steinhardt et al.~\cite[ITCS 2018]{Resilience_SCV18} at the server to filter-out corrupt vectors; and to analyze the outlier-filtering procedure, we develop a novel matrix concentration result that may be of independent interest. We provide convergence analyses for strongly-convex and non-convex smooth objectives in the heterogeneous data setting, where different clients may have different local datasets, and we do not make any probabilistic assumptions on data generation. We believe that ours is the first Byzantine-resilient algorithm and analysis with local iterations. We derive our convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets). We also extend our results to the case when clients compute full-batch gradients.

artificial intelligence, gradient, machine learning, (18 more...)

arXiv.org Machine Learning

2006.13041

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Virginia (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Add feedback

StatAssist & GradBoost: A Study on Optimal INT8 Quantization-aware Training from Scratch

Kim, Taehoon, Yoo, Youngjoon, Yang, Jihoon

arXiv.org Machine LearningAug-16-2020

This paper studies the scratch training of quantization-aware training (QAT), which has been applied to the lossless conversion of lower-bit, especially for INT8 quantization. Due to its training instability, QAT have required a full-precision (FP) pre-trained weight for fine-tuning and the performance is bound to the original FP model with floating-point computations. Here, we propose critical but straightforward optimization methods which enable the scratch training: floating-point statistic assisting (StatAssist) and stochastic-gradient boosting (GradBoost). We discovered that, first, the scratch QAT get comparable and often surpasses the performance of the floating-point counterpart without any help of the pre-trained model, especially when the model becomes complicated.We also show that our method can even train the minimax generation loss, which is very unstable and hence difficult to apply QAT fine-tuning. From extent experiments, we show that our method successfully enables QAT to train various deep models from scratch: classification, object detection, semantic segmentation, and style transfer, with comparable or often better performance than their FP baselines.

artificial intelligence, frostconv, machine learning, (20 more...)

arXiv.org Machine Learning

2006.09679

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

New Ways for Optimizing Gradient Descent

#artificialintelligenceAug-15-2020, 11:45:17 GMT

The new era of machine learning and artificial intelligence is the Deep learning era. It not only has immeasurable accuracy but also a huge hunger for data. Employing neural nets, functions with more exceeding complexity can be mapped on given data points. But there are a few very precise things which make the experience with neural networks more incredible and perceiving. Let us assume that we have trained a huge neural network.

artificial intelligence, machine learning, neural network, (6 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)

Add feedback

Obtaining Adjustable Regularization for Free via Iterate Averaging

Wu, Jingfeng, Braverman, Vladimir, Yang, Lin F.

arXiv.org Machine LearningAug-15-2020

Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regression problems, then by averaging the SGD iterates properly, we obtain a regularized solution. It left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter. Our approaches can be used for accelerated and preconditioned optimization methods as well. We further show that the same methods work empirically on more general optimization objectives including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.

artificial intelligence, machine learning, obtaining adjustable regularization, (15 more...)

arXiv.org Machine Learning

2008.06736

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Three Variants of Differential Privacy: Lossless Conversion and Applications

Asoodeh, Shahab, Liao, Jiachun, Calmon, Flavio P., Kosut, Oliver, Sankar, Lalitha

arXiv.org Artificial IntelligenceAug-14-2020

We consider three different variants of differential privacy (DP), namely approximate DP, R\'enyi DP (RDP), and hypothesis test DP. In the first part, we develop a machinery for optimally relating approximate DP to RDP based on the joint range of two $f$-divergences that underlie the approximate DP and RDP. In particular, this enables us to derive the optimal approximate DP parameters of a mechanism that satisfies a given level of RDP. As an application, we apply our result to the moments accountant framework for characterizing privacy guarantees of noisy stochastic gradient descent (SGD). When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget. In the second part, we establish a relationship between RDP and hypothesis test DP which allows us to translate the RDP constraint into a tradeoff between type I and type II error probabilities of a certain binary hypothesis test. We then demonstrate that for noisy SGD our result leads to tighter privacy guarantees compared to the recently proposed $f$-DP framework for some range of parameters.

artificial intelligence, machine learning, mechanism, (17 more...)

arXiv.org Artificial Intelligence

2008.06529

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(5 more...)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Dimension Independence in Unconstrained Private ERM via Adaptive Preconditioning

Kairouz, Peter, Ribero, Mónica, Rush, Keith, Thakurta, Abhradeep

arXiv.org Machine LearningAug-14-2020

In this paper we revisit the problem of private empirical risk minimziation (ERM) with differential privacy. We show that for unconstrained convex empirical risk minimization if the observed gradients of the objective function along the path of private gradient descent lie in a low-dimensional subspace (smaller than the ambient dimensionality of $p$), then using noisy adaptive preconditioning (a.k.a., noisy Adaptive Gradient Descent (AdaGrad)) we obtain a regret composed of two terms: a constant multiplicative factor of the original AdaGrad regret and an additional regret due to noise. In particular, we show that if the gradients lie in a constant rank subspace, then one can achieve an excess empirical risk of $ \tilde{O}(1/\epsilon n)$, compared to the worst-case achievable bound of $\tilde{O}(\sqrt{p}/\epsilon n)$. While previous works show dimension independent excess empirical risk bounds for the restrictive setting of convex generalized linear problems optimized over unconstrained subspaces, our results operate with general convex functions in unconstrained minimization. Along the way, we do a perturbation analysis of noisy AdaGrad, which may be of independent interest.

artificial intelligence, lemma 3, machine learning, (17 more...)

arXiv.org Machine Learning

2008.0657

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data

Gu, Bin, Dang, Zhiyuan, Li, Xiang, Huang, Heng

arXiv.org Machine LearningAug-14-2020

In a lot of real-world data mining and machine learning applications, data are provided by multiple providers and each maintains private records of different feature sets about common entities. It is challenging to train these vertically partitioned data effectively and efficiently while keeping data privacy for traditional data mining and machine learning algorithms. In this paper, we focus on nonlinear learning with kernels, and propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data. Specifically, we use random features to approximate the kernel mapping function and use doubly stochastic gradients to update the solutions, which are all computed federatedly without the disclosure of data. Importantly, we prove that FDSKL has a sublinear convergence rate, and can guarantee the data security under the semi-honest assumption. Extensive experimental results on a variety of benchmark datasets show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels, while retaining the similar generalization performance.

artificial intelligence, fdskl, machine learning, (14 more...)

arXiv.org Machine Learning

2008.06197

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Bayesian Neural Network via Stochastic Gradient Descent

Sagar, Abhinav

arXiv.org Machine LearningAug-12-2020

The goal of bayesian approach used in variational inference is to minimize the KL divergence between variational distribution and unknown posterior distribution. This is done by maximizing the Evidence Lower Bound (ELBO). A neural network is used to parametrize these distributions using Stochastic Gradient Descent. This work extends the work done by others by deriving the variational inference models. We show how SGD can be applied on bayesian neural networks by gradient estimation techniques. For validation, we have tested our model on 5 UCI datasets and the metrics chosen for evaluation are Root Mean Square Error (RMSE) error and negative log likelihood. Our work considerably beats the previous state of the art approaches for regression using bayesian neural networks.

artificial intelligence, machine learning, variational inference, (7 more...)

arXiv.org Machine Learning

2006.08453

Country: Asia > India > Tamil Nadu > Vellore (0.04)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Approximation and convergence of GANs training: an SDE approach

Cao, Haoyang, Guo, Xin

arXiv.org Machine LearningAug-12-2020

Generative adversarial networks (GANs) have enjoyed tremendous empirical successes, and research interest in the theoretical understanding of GANs training process is rapidly growing, especially for its evolution and convergence analysis. This paper establishes approximations, with precise error bound analysis, for the training of GANs under stochastic gradient algorithms (SGAs). The approximations are in the form of coupled stochastic differential equations (SDEs). The analysis of the SDEs and the associated invariant measures yields conditions for the convergence of GANs training. Further analysis of the invariant measure for the coupled SDEs gives rise to a fluctuation-dissipation relations (FDRs) for GANs, revealing the trade-off of the loss landscape between the generator and the discriminator and providing guidance for learning rate scheduling.

artificial intelligence, gan training, machine learning, (14 more...)

arXiv.org Machine Learning

2006.02047

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback