AITopics | Gradient Descent

Gradient Descent

Keep it simple! How to understand Gradient Descent algorithm

@machinelearnbot, May-1-2017, 08:35:13 GMT

When I first started learning about machine learning algorithms, gaining an intuition for what the algorithms were doing turned out to be quite a task. Not only was it difficult to understand all the mathematical theory and notation, it was also plain boring. When I turned to online tutorials for answers, in most cases I again found only equations or high-level explanations that never went through the details. It was then that one of my data science colleagues introduced me to the idea of working out an algorithm in an Excel sheet. And that worked wonders for me.

algorithm, artificial intelligence, machine learning, (13 more...)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Online Natural Gradient as a Kalman Filter

Ollivier, Yann

arXiv.org Machine Learning, Apr-27-2017

We establish a full relationship between Kalman filtering and Amari's natural gradient in statistical learning. Namely, using an online natural gradient descent on the data log-likelihood to evaluate the parameter of a probabilistic model from a series of observations is exactly equivalent to using an extended Kalman filter to estimate the parameter (assumed to have constant dynamics). In the recurrent (state space, non-i.i.d.) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models. This exact algebraic correspondence provides relevant settings for natural gradient hyperparameters such as learning rates or initialization and regularization of the Fisher information matrix. The Appendix contains a reminder on exponential families.

artificial intelligence, kalman filter, machine learning, (16 more...)

1703.00209

Country: North America (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

Veeriah, Vivek, Zhang, Shangtong, Sutton, Richard S.

arXiv.org Machine Learning, Apr-27-2017

Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. Learning the representations directly from the incoming data stream reduces the human labour involved in designing a learning system. More importantly, this allows a learning system to scale to difficult tasks. In this paper, we introduce a new incremental learning algorithm called crossprop, which learns the incoming weights of hidden units based on the meta-gradient descent approach that was previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our experiments show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.

artificial intelligence, machine learning, representation, (15 more...)

1612.02879

Country: North America > Canada > Alberta (0.28)

Genre: Research Report (0.64)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Accelerating Stochastic Gradient Descent

Jain, Prateek, Kakade, Sham M., Kidambi, Rahul, Netrapalli, Praneeth, Sidford, Aaron

arXiv.org Machine Learning, Apr-26-2017

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.

artificial intelligence, covariance, machine learning, (15 more...)

1704.08227

Country: North America > United States > California (0.27)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization

Huang, Feihu, Chen, Songcan

arXiv.org Machine Learning, Apr-25-2017

In this paper, we study the stochastic gradient descent (SGD) method for nonconvex nonsmooth optimization and propose an accelerated SGD method that combines the variance reduction technique with Nesterov's extrapolation technique. Moreover, based on the local error bound condition, we establish the linear convergence of our method to a stationary point of the nonconvex problem. In particular, we prove that not only does the generated sequence converge linearly to a stationary point of the problem, but the corresponding sequence of objective values is also linearly convergent. Finally, some numerical experiments demonstrate the effectiveness of our method. To the best of our knowledge, this is the first proof that the accelerated SGD method converges linearly to a local minimum of the nonconvex optimization problem.

artificial intelligence, machine learning, sequence, (15 more...)

1704.07953

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

Li, Zhi, Shi, Wei, Yan, Ming

arXiv.org Machine Learning, Apr-25-2017

This paper considers the problem of decentralized optimization with a composite objective containing smooth and non-smooth terms. To solve the problem, a proximal-gradient scheme is studied. Specifically, the smooth and non-smooth terms are handled by a gradient update and a proximal update, respectively. The studied algorithm is closely related to a previous decentralized optimization algorithm, PG-EXTRA [37], but has a few advantages. First, in our new scheme, agents use uncoordinated step-sizes, and the stable upper bounds on the step-sizes are independent of the network topology. The step-sizes depend on the local objective functions, and they can be as large as those of gradient descent. Second, for the special case without non-smooth terms, linear convergence can be achieved under the strong convexity assumption. The dependence of the convergence rate on the objective functions is separated from its dependence on the network, and the convergence rate of our new scheme is as good as one of the two convergence rates that match the typical rates for general gradient descent and consensus averaging. We also provide some numerical experiments to demonstrate the efficacy of the introduced algorithms and validate our theoretical discoveries.
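
The split described in this abstract (a gradient update for the smooth term, a proximal update for the non-smooth term) can be illustrated with a minimal single-agent sketch. This is not the paper's decentralized algorithm and not PG-EXTRA; it is plain centralized proximal gradient under assumed choices: the smooth term is 0.5 * ||x - b||^2, the non-smooth term is lam * ||x||_1, and the names `soft_threshold`, `proximal_gradient`, `b`, `lam`, and `step` are hypothetical.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |x| for a scalar v (assumed L1 penalty)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient(grad_smooth, x0, step, lam, iters=200):
    """Alternate a gradient step on the smooth term with a proximal step
    on the non-smooth term lam * ||x||_1."""
    x = list(x0)
    for _ in range(iters):
        g = grad_smooth(x)
        # Gradient update on the smooth part, then proximal update on the L1 part.
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, g)]
    return x


# Illustrative problem (an assumption, not taken from the paper):
# minimize 0.5 * ||x - b||^2 + lam * ||x||_1
b = [3.0, -0.2, 0.5]


def grad(x):
    # Gradient of the smooth term 0.5 * ||x - b||^2.
    return [xi - bi for xi, bi in zip(x, b)]


x_hat = proximal_gradient(grad, [0.0, 0.0, 0.0], step=0.5, lam=1.0)
print(x_hat)
```

For this toy objective the minimizer is known in closed form (soft-thresholding `b` by `lam`), so entries of `b` with magnitude below `lam` are driven to exactly zero, which is the behavior the proximal step is meant to produce.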

artificial intelligence, machine learning, optimization, (16 more...)

1704.07807

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

A Neural Network model with Bidirectional Whitening

Fujimoto, Yuki, Ohira, Toru

arXiv.org Machine Learning, Apr-24-2017

We present a new model and algorithm that performs efficient natural gradient descent for multilayer perceptrons. Natural gradient descent was originally proposed from the point of view of information geometry, and it performs steepest descent updates on manifolds in a Riemannian space. In particular, we extend the approach taken by the "Whitened neural networks" model. We apply the whitening process not only in the feed-forward direction, as in the original model, but also in the back-propagation phase. Its efficacy is shown by applying this "Bidirectional whitened neural networks" model to handwritten character recognition data (MNIST).

artificial intelligence, machine learning, natural gradient descent, (14 more...)

1704.07147

Country: Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

Perishability of Data: Dynamic Pricing under Varying-Coefficient Models

Javanmard, Adel

arXiv.org Machine Learning, Apr-24-2017

We consider a firm that sells a large number of products to its customers in an online fashion. Each product is described by a high-dimensional feature vector, and the market value of a product is assumed to be linear in the values of its features. Parameters of the valuation model are unknown and can change over time. The firm sequentially observes a product's features and can use the historical sales data (binary sale/no sale feedback) to set the price of the current product, with the objective of maximizing the collected revenue. We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on projected stochastic gradient descent (PSGD) and characterize its regret in terms of time $T$, feature dimension $d$, and the temporal variability in the model parameters, $\delta_t$. We consider two settings. In the first one, feature vectors are chosen antagonistically by nature, and we prove that the regret of the PSGD pricing policy is of order $O(\sqrt{T} + \sum_{t=1}^T \sqrt{t}\delta_t)$. In the second setting (referred to as the stochastic features model), the feature vectors are drawn independently from an unknown distribution. We show that in this case, the regret of the PSGD pricing policy is of order $O(d^2 \log T + \sum_{t=1}^T t\delta_t/d)$.
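
For readers unfamiliar with projected stochastic gradient descent, the generic update (a noisy gradient step followed by projection back onto a feasible set) can be sketched as follows. This is not the paper's pricing policy: the feasible set (an L2 ball), the toy quadratic objective, the 1/sqrt(t) step-size schedule, and the names `project_onto_ball`, `projected_sgd`, `noisy_gradient`, and `TARGET` are all assumptions made for illustration.

```python
import math
import random


def project_onto_ball(x, radius):
    """Euclidean projection of x onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [xi * scale for xi in x]


def projected_sgd(sample_gradient, x0, radius, iters, rng):
    """Noisy gradient step followed by projection, with an assumed
    1/sqrt(t) step-size schedule."""
    x = project_onto_ball(x0, radius)
    for t in range(1, iters + 1):
        step = 1.0 / math.sqrt(t)
        g = sample_gradient(x, rng)
        x = project_onto_ball([xi - step * gi for xi, gi in zip(x, g)], radius)
    return x


# Toy objective (an assumption): minimize E[0.5 * ||x - z||^2] over the unit
# ball, where z is TARGET plus Gaussian noise.
TARGET = [3.0, 4.0]


def noisy_gradient(x, rng):
    # Stochastic gradient x - z, with z a noisy observation of TARGET.
    return [xi - (ti + rng.gauss(0.0, 0.1)) for xi, ti in zip(x, TARGET)]


x_hat = projected_sgd(noisy_gradient, [0.0, 0.0], radius=1.0, iters=2000,
                      rng=random.Random(0))
print(x_hat)
```

Because `TARGET` lies outside the unit ball, the constrained minimizer of this toy problem is `TARGET` rescaled to unit norm, so the iterates should settle near the boundary point in the direction of `TARGET`.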

artificial intelligence, machine learning, pricing policy, (16 more...)

1701.03537

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Marketing (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Finding Approximate Local Minima Faster than Gradient Descent

Agarwal, Naman, Allen-Zhu, Zeyuan, Bullins, Brian, Hazan, Elad, Ma, Tengyu

arXiv.org Machine Learning, Apr-24-2017

We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples. The time complexity of our algorithm to find an approximate local minimum is even faster than that of gradient descent to find a critical point. Our algorithm applies to a general class of optimization problems including training a neural network and other non-convex objectives arising in machine learning.

artificial intelligence, assumption, machine learning, (18 more...)

1611.01146

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Learn under the hood of Gradient Descent algorithm using excel

@machinelearnbot, Apr-21-2017, 17:59:28 GMT

algorithm, artificial intelligence, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)
