AITopics | Gradient Descent

Gradient Descent

The Power of Normalization: Faster Evasion of Saddle Points

arXiv.org Machine Learning, Nov-15-2016

A commonly used heuristic in non-convex optimization is Normalized Gradient Descent (NGD), a variant of gradient descent in which only the direction of the gradient is taken into account and its magnitude is ignored. We analyze this heuristic and show that with carefully chosen parameters and noise injection, this method can provably evade saddle points. We establish the convergence of NGD to a local minimum, and demonstrate rates which improve upon the fastest known first-order algorithm due to Ge et al. (2015). The effectiveness of our method is demonstrated via an application to the problem of online tensor decomposition, a task for which saddle point evasion is known to result in convergence to global minima.
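
The idea of stepping along the gradient direction only, with injected noise, can be sketched in a few lines. This is a minimal illustration and not the paper's algorithm: the step size, noise scale, iteration count, and the test function `grad_f` are all assumptions chosen for the example, and the noise is added at every step rather than on any particular schedule.

```python
import math
import random

def normalized_gradient_descent(grad, x0, step=0.01, noise=0.002, iters=2000, seed=0):
    """Noisy normalized gradient descent: use the gradient's direction, not its size."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        # Unit-length descent direction; zero vector if the gradient vanishes.
        direction = [gi / norm for gi in g] if norm > 0 else [0.0] * len(x)
        # Injected noise: a random direction on the unit sphere, scaled by `noise`.
        z = [rng.gauss(0.0, 1.0) for _ in x]
        z_norm = math.sqrt(sum(zi * zi for zi in z)) or 1.0
        x = [xi - step * di + noise * zi / z_norm
             for xi, di, zi in zip(x, direction, z)]
    return x

# Hypothetical test function f(x, y) = x**4/4 - x**2/2 + y**2/2,
# which has a saddle point at the origin and minima at (+1, 0) and (-1, 0).
def grad_f(p):
    x, y = p
    return [x ** 3 - x, y]

# Start exactly on the saddle point, where plain gradient descent would stall.
x_final = normalized_gradient_descent(grad_f, [0.0, 0.0])
```

Because every step has the same length regardless of how small the gradient is, the iterate does not slow down near the saddle; the noise supplies the initial push off it. With a fixed step the iterate ends up hovering near a minimum rather than settling on it exactly.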

artificial intelligence, equation, machine learning, (16 more...)

arXiv.org Machine Learning

1611.04831

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Which is your favorite Machine Learning Algorithm?

#artificialintelligence, Nov-14-2016, 09:35:52 GMT

Developed back in the 50s by Rosenblatt and colleagues, this extremely simple algorithm can be viewed as the foundation for some of the most successful classifiers today, including support vector machines and logistic regression, solved using stochastic gradient descent. The convergence proof for the Perceptron algorithm is one of the most elegant pieces of math I've seen in ML. Most useful: Boosting, especially boosted decision trees. This intuitive approach allows you to build highly accurate ML models by combining many simple ones. Boosting is one of the most practical methods in ML: it is widely used in industry, can handle a wide variety of data types, and can be implemented at scale.
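
For readers who have not seen it, the Perceptron update mentioned above fits in a short sketch. The function names and the toy dataset below are made up for illustration; labels are assumed to be +1 or -1.

```python
def train_perceptron(samples, labels, epochs=100):
    """Classic perceptron rule: on each mistake, move the weights toward the example."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: training has converged
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy linearly separable data (logical AND), chosen only for illustration.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

On linearly separable data such as this, the loop stops after a finite number of mistakes, which is the content of the convergence proof the post praises.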

algorithm, artificial intelligence, machine learning, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)

Gradient Descent For Machine Learning - Machine Learning Mastery

#artificialintelligence, Nov-14-2016, 07:50:36 GMT

In this post you discovered gradient descent for machine learning.

algorithm, artificial intelligence, machine learning, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines

Korenkevych, Dmytro, Xue, Yanbo, Bian, Zhengbing, Chudak, Fabian, Macready, William G., Rolfe, Jason, Andriyash, Evgeny

arXiv.org Machine Learning, Nov-14-2016

Quantum annealing (QA) is a hardware-based heuristic optimization and sampling method applicable to discrete undirected graphical models. While similar to simulated annealing, QA relies on quantum, rather than thermal, effects to explore complex search spaces. For many classes of problems, QA is known to offer computational advantages over simulated annealing. Here we report on the ability of recent QA hardware to accelerate training of fully visible Boltzmann machines. We characterize the sampling distribution of QA hardware, and show that in many cases, the quantum distributions differ significantly from classical Boltzmann distributions. In spite of this difference, training (which seeks to match data and model statistics) using standard classical gradient updates is still effective. We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD). Using $k=50$ Gibbs steps, we show that for problems with high-energy barriers between modes, QA-based seeds can improve upon chains with CD and PCD initializations. For these hard problems, QA gradient estimates are more accurate, and allow for faster learning. Furthermore, and interestingly, even the case of raw QA samples (that is, $k=0$) achieved similar improvements. We argue that this relates to the fact that we are training a quantum rather than classical Boltzmann distribution in this case. The learned parameters give rise to hardware QA distributions closely approximating classical Boltzmann distributions that are hard to train with CD/PCD.
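
The phrase "match data and model statistics" refers to the standard log-likelihood gradient of a classical Boltzmann machine. As a reminder, and assuming the usual energy convention with couplings $W_{ij}$ and biases $b_i$ (the symbols are not taken from the abstract), the classical updates take this form:

```latex
E(s) = -\sum_{i<j} W_{ij}\, s_i s_j - \sum_i b_i\, s_i,
\qquad
p(s) = \frac{e^{-E(s)}}{Z}

\frac{\partial}{\partial W_{ij}} \, \frac{1}{N} \sum_{n=1}^{N} \log p\big(s^{(n)}\big)
  = \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}},
\qquad
\frac{\partial}{\partial b_i} \, \frac{1}{N} \sum_{n=1}^{N} \log p\big(s^{(n)}\big)
  = \langle s_i \rangle_{\text{data}} - \langle s_i \rangle_{\text{model}}
```

The data averages are computed directly from the training set. The model averages are the expensive part, and they are what CD, PCD, or QA-seeded Markov chains are used to estimate.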

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

1611.04528

Country:

North America > Canada (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Practical Secure Aggregation for Federated Learning on User-Held Data

Bonawitz, Keith, Ivanov, Vladimir, Kreuter, Ben, Marcedone, Antonio, McMahan, H. Brendan, Patel, Sarvar, Ramage, Daniel, Segal, Aaron, Seth, Karn

arXiv.org Machine Learning, Nov-14-2016

Secure Aggregation protocols allow a collection of mutually distrustful parties, each holding a private value, to collaboratively compute the sum of those values without revealing the values themselves. We consider training a deep neural network in the Federated Learning model, using distributed stochastic gradient descent across user-held training data on mobile devices, wherein Secure Aggregation protects each user's model gradient. We design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 of users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73x communication expansion for $2^{10}$ users and $2^{20}$-dimensional vectors, and 1.98x expansion for $2^{14}$ users and $2^{24}$-dimensional vectors.
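
One common way to obtain a sum without exposing the summands is pairwise additive masking, sketched below. This is a toy illustration of the cancelling-mask idea only, not the protocol from the paper: a single seeded generator stands in for secrets that each pair of users would really have to agree on privately, and there is no handling of users who drop out. The names `mask_inputs` and `aggregate` are hypothetical.

```python
import random

MODULUS = 2 ** 16  # arithmetic on 16-bit values, matching the abstract's example size

def mask_inputs(values, seed=0):
    """Return one masked value per user; the pairwise masks cancel in the sum."""
    rng = random.Random(seed)  # assumption: stand-in for pairwise shared secrets
    n = len(values)
    masked = [v % MODULUS for v in values]
    for u in range(n):
        for v in range(u + 1, n):
            m = rng.randrange(MODULUS)
            masked[u] = (masked[u] + m) % MODULUS  # user u adds the shared mask
            masked[v] = (masked[v] - m) % MODULUS  # user v subtracts the same mask
    return masked

def aggregate(masked):
    """What the server computes: it sees only masked values."""
    return sum(masked) % MODULUS

private_values = [12, 7, 30, 1]
masked_values = mask_inputs(private_values)
total = aggregate(masked_values)
```

Each mask is added once and subtracted once, so `total` equals the true sum modulo `MODULUS`. The same cancellation applies coordinate by coordinate to gradient vectors.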

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1611.04482

Country: North America > United States (0.68)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

An overview of gradient descent optimization algorithms

@machinelearnbot, Nov-9-2016, 06:20:19 GMT

Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize neural networks. At the same time, every state-of-the-art Deep Learning library contains implementations of various algorithms to optimize gradient descent. These algorithms, however, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This blog post aims at providing you with intuitions towards the behaviour of different algorithms for optimizing gradient descent that will help you put them to use. We are first going to look at the different variants of gradient descent.
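
The basic variants differ only in how many examples feed each update. The sketch below, which assumes a toy least-squares problem and hand-picked hyperparameters, covers all three: `batch_size=len(xs)` gives batch gradient descent, `batch_size=1` gives stochastic gradient descent, and anything in between is mini-batch gradient descent.

```python
import random

def minibatch_gd(xs, ys, lr=0.05, batch_size=2, epochs=200, seed=0):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the examples in a new order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of the squared error, averaged over the current batch.
            gw = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

# Toy data generated from y = 2x + 1, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = minibatch_gd(xs, ys)
```

Smaller batches give cheaper but noisier updates; larger batches give smoother updates at a higher cost per step.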

algorithm, artificial intelligence, machine learning, (6 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

IAB Reveals Winners of Data Rockstar Awards

#artificialintelligence, Nov-8-2016, 19:00:18 GMT

IAB (Interactive Advertising Bureau) and its Data Center of Excellence today announced the winners of the inaugural IAB Data Rockstar Awards, celebrating top industry leaders and practitioners who have demonstrated achievement in data science or technology. The top finalists were selected by the IAB Data Center of Excellence Board of Directors and were evaluated based on demonstrated excellence, creativity or forward-thinking approaches to solving problems in data science, as well as the impact their contributions have made to their company or industry. Chalasani developed a highly efficient, distributed, extreme-scale, single-pass online logistic regression learning system in Scala/Spark, using variants of Stochastic Gradient Descent, capable of handling hundreds of millions of sparse features and billions of training observations. His system incorporates a number of state-of-the-art techniques that do not exist together in any other machine learning system, including adaptive feature-scaling, adaptive gradients, feature-interactions and feature-hashing. Chalasani's work is central to MediaMath's vision for every addressable interaction between a marketer and a consumer to be driven by Machine Learning optimization against all available, relevant data at that moment, to maximize long-term marketer business outcomes.

artificial intelligence, data rockstar award, machine learning, (17 more...)

#artificialintelligence

Country:

North America > United States > New York (0.06)
North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)

Genre:

Research Report (0.92)
Personal > Honors > Award (0.52)

Industry: Information Technology > Services (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

tdeboissiere/DeepLearningImplementations

@machinelearnbot, Nov-8-2016, 16:35:10 GMT

This is a Keras implementation of Improving Stochastic Gradient Descent With Feedback. Check this page for the authors' original implementation of Eve. Or copy the Eve class to keras/optimizers.py and use it like any other optimizer.

artificial intelligence, machine learning, tdeboissiere deeplearningimplementation, (1 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)

Recursive Decomposition for Nonconvex Optimization

Friesen, Abram L., Domingos, Pedro

arXiv.org Machine Learning, Nov-8-2016

Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent sub-functions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding.

artificial intelligence, machine learning, rdis, (18 more...)

arXiv.org Machine Learning

1611.02755

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

SGD with Variance Reduction beyond Empirical Risk Minimization

Achab, Massil, Guilloux, Agathe, Gaïffas, Stéphane, Bacry, Emmanuel

arXiv.org Machine Learning, Nov-8-2016

We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions, whose gradients depend on numerically expensive expectations. Our main motivation is the acceleration of the optimization of the regularized Cox partial-likelihood (the core model used in survival analysis), but our algorithm can be used in different settings as well. The proposed algorithm is doubly stochastic in the sense that gradient steps are done using stochastic gradient descent (SGD) with variance reduction, where the inner expectations are approximated by a Monte-Carlo Markov-Chain (MCMC) algorithm. We derive conditions on the MCMC number of iterations guaranteeing convergence, and obtain a linear rate of convergence under strong convexity and a sublinear rate without this assumption. We illustrate the fact that our algorithm improves the state-of-the-art solver for regularized Cox partial-likelihood on several datasets from survival analysis.
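
To make "SGD with variance reduction" concrete, here is a sketch of plain SVRG on a finite average of smooth convex functions. It is a simpler stand-in, not the paper's algorithm: there is no proximal step and no MCMC approximation of inner expectations, and the scalar problem, step size, and loop lengths are assumptions chosen for the example.

```python
import random

def svrg(grad_i, n, w0, lr=0.05, outer=30, inner=None, seed=0):
    """SVRG for minimizing (1/n) * sum_i f_i(w); scalar w for brevity."""
    rng = random.Random(seed)
    inner = inner or 2 * n
    w_snap = w0
    for _ in range(outer):
        # Full gradient at the snapshot point, computed once per outer loop.
        full = sum(grad_i(i, w_snap) for i in range(n)) / n
        w = w_snap
        for _ in range(inner):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased for the full gradient at w,
            # with variance that shrinks as w and w_snap approach the minimizer.
            g = grad_i(i, w) - grad_i(i, w_snap) + full
            w -= lr * g
        w_snap = w
    return w_snap

# Hypothetical problem: f_i(w) = 0.5 * c_i * (w - a_i)**2, whose average is
# minimized at sum(c_i * a_i) / sum(c_i) = 20 / 6.
a = [1.0, 2.0, 3.0, 6.0]
c = [1.0, 2.0, 1.0, 2.0]
w_star = svrg(lambda i, w: c[i] * (w - a[i]), len(a), w0=0.0)
```

Unlike plain SGD, this scheme can use a constant step size and still converge linearly on strongly convex problems, which is the kind of rate the abstract reports under strong convexity.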

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1510.04822

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
