AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Deep Switch Networks for Generating Discrete Data and Language

arXiv.org Machine LearningMar-14-2019

Multilayer switch networks are proposed as artificial generators of high-dimensional discrete data (e.g., binary vectors, categorical data, natural language, network log files, and discrete-valued time series). Unlike deconvolution networks which generate continuous-valued data and which consist of upsampling filters and reverse pooling layers, multilayer switch networks are composed of adaptive switches which model conditional distributions of discrete random variables. An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. To learn network parameters, stochastic gradient descent is applied to the objective. This direct optimization is stable until convergence, and does not involve back-propagation over separate encoder and decoder networks, or adversarial training of dueling networks. While training remains tractable for moderately sized networks, Markov-chain Monte Carlo (MCMC) approximations of gradients are derived for deep networks which contain latent variables. The statistical framework is evaluated on synthetic data, high-dimensional binary data of handwritten digits, and web-crawled natural language data. Aspects of the model's framework such as interpretability, computational complexity, and generalization ability are discussed.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1903.06135

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Richness of Deep Echo State Network Dynamics

Gallicchio, Claudio, Micheli, Alessio

arXiv.org Artificial IntelligenceMar-12-2019

Reservoir Computing (RC) is a popular methodology for the efficient design of Recurrent Neural Networks (RNNs). Recently, the advantages of the RC approach have been extended to the context of multi-layered RNNs, with the introduction of the Deep Echo State Network (DeepESN) model. In this paper, we study the quality of state dynamics in progressively higher layers of DeepESNs, using tools from the areas of information theory and numerical analysis. Our experimental results on RC benchmark datasets reveal the fundamental role played by the strength of inter-reservoir connections to increasingly enrich the representations developed in higher layers. Our analysis also gives interesting insights into the possibility of effective exploitation of training algorithms based on stochastic gradient descent in the RC field.

deep learning, state network, upstream oil & gas, (22 more...)

arXiv.org Artificial Intelligence

1903.05174

Country: Europe > Italy (0.14)

Genre: Research Report (0.83)

Industry: Energy > Oil & Gas > Upstream (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Theory III: Dynamics and Generalization in Deep Networks

Banburski, Andrzej, Liao, Qianli, Miranda, Brando, Rosasco, Lorenzo, Liang, Bob, Hidary, Jack, Poggio, Tomaso

arXiv.org Artificial IntelligenceMar-12-2019

We review recent observations on the dynamical systems induced by gradient descent methods used for training deep networks and summarize properties of the solutions they converge to. Recent results illuminate the absence of overfitting in the special case of linear networks for binary classification. They prove that minimization of loss functions such as the logistic, the cross-entropy and the exponential loss yields asymptotic convergence to the maximum margin solution for linearly separable datasets, independently of the initial conditions. Here we discuss the case of nonlinear DNNs near zero minima of the empirical loss, under exponential-type and square losses, for several variations of the basic gradient descent algorithm, including a new NMGD (norm minimizing gradient descent) version that converges to the minimum norm fixed points of the gradient descent iteration. Our main results are: 1) gradient descent algorithms with weight normalization constraint achieve generalization; 2) the fundamental reason for the effectiveness of existing weight normalization and batch normalization techniques is that they are approximate implementations of maximizing the margin under unit norm constraint; 3) without unit norm constraints some level of generalization can still be obtained for not-too-deep networks because the balance of the weights across different layers, if present at initialization, is maintained by the gradient flow. In the perspective of these theoretical results, we discuss experimental evidence around the apparent absence of overfitting, that is the observation that the expected classification error does not get worse when increasing the number of parameters. Our explanation focuses on the implicit normalization enforced by algorithms such as batch normalization. In particular, the control of the norm of the weights is related to Halpern iterations for minimum norm solutions.

artificial intelligence, machine learning, normalization, (17 more...)

arXiv.org Artificial Intelligence

1903.04991

Country: North America > United States > New York (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling

Peng, Xinyu, Li, Li, Wang, Fei-Yue

arXiv.org Machine LearningMar-11-2019

Machine learning, especially deep neural networks, has been rapidly developed in fields including computer vision, speech recognition and reinforcement learning. Although Mini-batch SGD is one of the most popular stochastic optimization methods in training deep networks, it shows a slow convergence rate due to the large noise in gradient approximation. In this paper, we attempt to remedy this problem by building more efficient batch selection method based on typicality sampling, which reduces the error of gradient estimation in conventional Minibatch SGD. We analyze the convergence rate of the resulting typical batch SGD algorithm and compare convergence properties between Minibatch SGD and the algorithm. Experimental results demonstrate that our batch selection scheme works well and more complex Minibatch SGD variants can benefit from the proposed batch selection strategy.

artificial intelligence, machine learning, minibatch sgd, (14 more...)

arXiv.org Machine Learning

1903.04192

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

Zhang, Jiawei

arXiv.org Artificial IntelligenceMar-11-2019

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to train. Nowadays, most of the deep learning model training still relies on the back propagation algorithm actually. In back propagation, the model variables will be updated iteratively until convergence with gradient descent based optimization algorithms. Besides the conventional vanilla gradient descent algorithm, many gradient descent variants have also been proposed in recent years to improve the learning performance, including Momentum, Adagrad, Adam, Gadam, etc., which will all be introduced in this paper respectively.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1903.03614

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report (0.40)
Instructional Material > Course Syllabus & Notes (0.30)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mean Field Analysis of Deep Neural Networks

Sirignano, Justin, Spiliopoulos, Konstantinos

arXiv.org Machine LearningMar-11-2019

We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously establish the limiting behavior of the multilayer neural network output. The limit procedure is valid for any number of hidden layers and it naturally also describes the limiting behavior of the training loss. The ideas that we explore are to (a) sequentially take the limits of each hidden layer and (b) characterizing the evolution of parameters in terms of their initialization. The limit satisfies a system of integro-differential equations.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

1903.0444

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Add feedback

It's Only Natural: An Excessively Deep Dive Into Natural Gradient Optimization

#artificialintelligenceMar-10-2019, 15:43:40 GMT

I'm going to tell a story: one you've almost certainly heard before, but with a different emphasis than you're used to. To a first (order) approximation, all modern deep learning models are trained using gradient descent. At each step of gradient descent, your parameter values begin at some starting point, and you move them in the direction of greatest loss reduction. You do this by taking the derivative of your loss with respect to your whole vector of parameters, otherwise called the Jacobian. However, this is just the first derivative of your loss, and it doesn't tell you anything about curvature, or, how quickly your first derivative is changing.

artificial intelligence, deep learning, machine learning, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Learning Quantum Graphical Models using Constrained Gradient Descent on the Stiefel Manifold

Adhikary, Sandesh, Srinivasan, Siddarth, Boots, Byron

arXiv.org Machine LearningMar-8-2019

Quantum graphical models (QGMs) extend the classical framework for reasoning about uncertainty by incorporating the quantum mechanical view of probability. Prior work on QGMs has focused on hidden quantum Markov models (HQMMs), which can be formulated using quantum analogues of the sum rule and Bayes rule used in classical graphical models. Despite the focus on developing the QGM framework, there has been little progress in learning these models from data. The existing state-of-the-art approach randomly initializes parameters and iteratively finds unitary transformations that increase the likelihood of the data. While this algorithm demonstrated theoretical strengths of HQMMs over HMMs, it is slow and can only handle a small number of hidden states. In this paper, we tackle the learning problem by solving a constrained optimization problem on the Stiefel manifold using a well-known retraction-based algorithm. We demonstrate that this approach is not only faster and yields better solutions on several datasets, but also scales to larger models that were prohibitively slow to train via the earlier method.

artificial intelligence, kraus operator, machine learning, (12 more...)

arXiv.org Machine Learning

1903.0373

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Introducing TensorFlow Privacy: Learning with Differential Privacy for Training Data

#artificialintelligenceMar-7-2019, 02:57:05 GMT

Today, we're excited to announce TensorFlow Privacy (GitHub), an open source library that makes it easier not only for developers to train machine-learning models with privacy, but also for researchers to advance the state of the art in machine learning with strong privacy guarantees. Modern machine learning is increasingly applied to create amazing new technologies and user experiences, many of which involve training machines to learn responsibly from sensitive data, such as personal photos or email. Ideally, the parameters of trained machine-learning models should encode general patterns rather than facts about specific training examples. To ensure this, and to give strong privacy guarantees when the training data is sensitive, it is possible to use techniques based on the theory of differential privacy. In particular, when training on users' data, those techniques offer strong mathematical guarantees that models do not learn or remember the details about any specific user.

artificial intelligence, machine learning, tensorflow privacy, (15 more...)

#artificialintelligence

Industry: Information Technology > Security & Privacy (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Add feedback

On Transformations in Stochastic Gradient MCMC

Yokoi, Soma, Otsuka, Takuma, Sato, Issei

arXiv.org Machine LearningMar-7-2019

Stochastic gradient Langevin dynamics (SGLD) is a widely used sampler for the posterior inference with a large scale dataset. Although SGLD is designed for unbounded random variables, many practical models incorporate variables with boundaries such as non-negative ones or those in a finite interval. Existing modifications of SGLD for handling bounded random variables resort to heuristics without a formal guarantee of sampling from the true stationary distribution. In this paper, we reformulate the SGLD algorithm incorporating a deterministic transformation with rigorous theories. Our method transforms unbounded samples obtained by SGLD into the domain of interest. We demonstrate transformed SGLD in both artificial problem settings and real-world applications of Bayesian non-negative matrix factorization and binary neural networks.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1903.0275

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback