AITopics | Gradient Descent

Gradient Descent

Implicit Regularization of Accelerated Methods in Hilbert Spaces

arXiv.org Machine Learning Jun-18-2019

We study learning properties of accelerated gradient descent methods for linear least-squares in Hilbert spaces. We analyze the implicit regularization properties of Nesterov acceleration and a variant of heavy-ball in terms of corresponding learning error bounds. Our results show that acceleration can provide faster bias decay than gradient descent, but also suffers from more unstable behavior. As a result, acceleration cannot in general be expected to improve learning accuracy with respect to gradient descent, but rather to achieve the same accuracy with reduced computation. Our theoretical results are validated by numerical simulations. Our analysis is based on studying suitable polynomials induced by the accelerated dynamics and combining spectral techniques with concentration inequalities.
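To make the "faster bias decay" claim concrete, here is a minimal sketch (not the paper's experiments) comparing plain gradient descent with a Nesterov-style iteration on a tiny, noise-free, ill-conditioned least-squares problem. The design matrix, the targets, the step size 1/L and the momentum schedule (k-1)/(k+2) are all assumptions chosen for illustration. Because the data are noise-free, the sketch shows only the bias side of the trade-off, not the instability the abstract mentions.

```python
# Toy least-squares problem: minimize 0.5 * ||X w - y||^2 (assumed data).
X = [[1.0, 0.0],
     [0.0, 0.05]]          # X^T X has eigenvalues 1 and 0.0025 (ill-conditioned)
y = [1.0, 0.05]            # generated by w_true = (1, 1), no noise

def residual(w):
    return [sum(xij * wj for xij, wj in zip(row, w)) - yi
            for row, yi in zip(X, y)]

def loss(w):
    return 0.5 * sum(r * r for r in residual(w))

def grad(w):
    # Gradient of the loss: X^T (X w - y).
    r = residual(w)
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(w))]

def gradient_descent(step, iters):
    w = [0.0, 0.0]
    for _ in range(iters):
        w = [wi - step * gi for wi, gi in zip(w, grad(w))]
    return w

def nesterov(step, iters):
    w = [0.0, 0.0]          # current iterate
    v = [0.0, 0.0]          # extrapolated (look-ahead) point
    for k in range(1, iters + 1):
        w_next = [vi - step * gi for vi, gi in zip(v, grad(v))]
        beta = (k - 1) / (k + 2)   # assumed momentum schedule
        v = [wn + beta * (wn - wo) for wn, wo in zip(w_next, w)]
        w = w_next
    return w

STEP, ITERS = 1.0, 200      # step = 1/L, with L the largest eigenvalue of X^T X
loss_gd = loss(gradient_descent(STEP, ITERS))
loss_nag = loss(nesterov(STEP, ITERS))
print("gradient descent loss:", loss_gd)
print("Nesterov loss:        ", loss_nag)
```

With the same step size and iteration budget, the accelerated iterate ends at a lower training loss on this problem, which is the "same accuracy with reduced computation" reading of the abstract in miniature.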

artificial intelligence, gradient descent, machine learning, (16 more...)

arXiv.org Machine Learning

1905.13

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)

Escaping from saddle points on Riemannian manifolds

Sun, Yue, Flammarion, Nicolas, Fazel, Maryam

arXiv.org Machine Learning Jun-17-2019

We consider minimizing a nonconvex, smooth function $f$ on a Riemannian manifold $\mathcal{M}$. We show that a perturbed version of Riemannian gradient descent algorithm converges to a second-order stationary point (and hence is able to escape saddle points on the manifold). The rate of convergence depends as $1/\epsilon^2$ on the accuracy $\epsilon$, which matches a rate known only for unconstrained smooth minimization. The convergence rate depends polylogarithmically on the manifold dimension $d$, hence is almost dimension-free. The rate also has a polynomial dependence on the parameters describing the curvature of the manifold and the smoothness of the function. While the unconstrained problem (Euclidean setting) is well-studied, our result is the first to prove such a rate for nonconvex, manifold-constrained problems.
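The idea of perturbing Riemannian gradient descent can be illustrated on the simplest manifold, the unit sphere. The sketch below is a simplified stand-in, not the algorithm analyzed in the paper: the objective (a diagonal quadratic form), the normalization retraction, the step size, the perturbation radius and the rule "perturb whenever the gradient is tiny" are all assumptions made for the example.

```python
import math
import random

A_DIAG = (1.0, 2.0, 3.0)   # f(x) = sum_i A_DIAG[i] * x[i]^2 on the unit sphere

def f(x):
    return sum(a * xi * xi for a, xi in zip(A_DIAG, x))

def riemannian_grad(x):
    # Project the Euclidean gradient 2*A*x onto the tangent space at x.
    eg = [2.0 * a * xi for a, xi in zip(A_DIAG, x)]
    radial = sum(g * xi for g, xi in zip(eg, x))
    return [g - radial * xi for g, xi in zip(eg, x)]

def retract(x, v):
    # Step along tangent vector v, then renormalize back onto the sphere.
    moved = [xi + vi for xi, vi in zip(x, v)]
    n = math.sqrt(sum(m * m for m in moved))
    return [m / n for m in moved]

def descend(x0, steps=500, lr=0.1, radius=0.0, tol=1e-6, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = riemannian_grad(x)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if radius > 0.0 and gnorm < tol:
            # Near a stationary point: add small random tangent noise.
            noise = [rng.gauss(0.0, radius) for _ in x]
            radial = sum(ni * xi for ni, xi in zip(noise, x))
            x = retract(x, [ni - radial * xi for ni, xi in zip(noise, x)])
        else:
            x = retract(x, [-lr * gi for gi in g])
    return x

saddle = [0.0, 1.0, 0.0]                  # eigenvector of the middle eigenvalue
x_plain = descend(saddle)                 # no perturbation
x_pert = descend(saddle, radius=1e-3)     # with perturbation
print("plain:    ", f(x_plain))
print("perturbed:", f(x_pert))
```

Started exactly at the saddle, the unperturbed iteration has zero Riemannian gradient and never moves, while the perturbed one drifts off along the descent direction and settles near the minimum value of f on the sphere (the smallest entry of `A_DIAG`).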

artificial intelligence, exp, machine learning, (17 more...)

arXiv.org Machine Learning

1906.07355

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

DeepView: View Synthesis with Learned Gradient Descent

#artificialintelligence Jun-16-2019, 23:44:11 GMT

DeepView, a new view synthesis method presented at CVPR 2019.

artificial intelligence, machine learning, social media, (3 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

A Survey of Optimization Methods from a Machine Learning Perspective

Sun, Shiliang, Cao, Zehui, Zhu, Han, Zhao, Jing

arXiv.org Machine Learning Jun-16-2019

Machine learning is developing rapidly, has made many theoretical breakthroughs, and is widely applied in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth in the amount of data and the increase in model complexity, optimization methods in machine learning face more and more challenges. Much work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic retrospective and summary of optimization methods from the perspective of machine learning is of great significance and can offer guidance for developments in both optimization and machine learning research. In this paper, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Next, we summarize the applications and developments of optimization methods in some popular machine learning fields. Finally, we explore and give some challenges and open problems for optimization in machine learning.

algorithm, gradient, optimization method, (13 more...)

arXiv.org Machine Learning

1906.06821

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
(6 more...)

Genre:

Research Report (0.81)
Instructional Material (0.67)

Industry:

Education (0.68)
Transportation (0.46)
Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)
(3 more...)

Learning to learn via Self-Critique

Antoniou, Antreas, Storkey, Amos

arXiv.org Machine Learning Jun-16-2019

In few-shot learning, a machine learning system learns from a small set of labelled examples relating to a specific task, such that it can generalize to new examples of the same task. Given the limited availability of labelled examples in such tasks, we wish to make use of all the information we can. Usually a model learns task-specific information from a small training-set (support-set) to predict on an unlabelled validation set (target-set). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods. Making use of the target-set examples via transductive learning requires approaches beyond the current methods; at inference time, the target-set contains only unlabelled input data-points, and so discriminative learning cannot be used. In this paper, we propose a framework called Self-Critique and Adapt or SCA, which learns to learn a label-free loss function, parameterized as a neural network. A base-model learns on a support-set using existing methods (e.g. stochastic gradient descent combined with the cross-entropy loss), and then is updated for the incoming target-task using the learnt loss function. This label-free loss function is itself optimized such that the learnt model achieves higher generalization performance. Experiments demonstrate that SCA offers substantially reduced error-rates compared to baselines which only adapt on the support-set, and results in state of the art benchmark performance on Mini-ImageNet and Caltech-UCSD Birds 200.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

1905.10295

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Reinforcement Learning Driven Heuristic Optimization

Cai, Qingpeng, Hang, Will, Mirhoseini, Azalia, Tucker, George, Wang, Jingtao, Wei, Wei

arXiv.org Machine Learning Jun-15-2019

Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to reach a reasonable solution from a cold-start. In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using reinforcement learning (RL), named RLHO. We augment the ability of heuristic algorithms to greedily improve upon an existing initial solution generated by RL, and demonstrate novel results where RL is able to leverage the performance of heuristics as a learning signal to generate better initialization. We apply this framework to Proximal Policy Optimization (PPO) and Simulated Annealing (SA). We conduct a series of experiments on the well-known NP-complete bin packing problem, and show that the RLHO method outperforms our baselines. We show that on the bin packing problem, RL can learn to help heuristics perform even better, allowing us to combine the best parts of both approaches.
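The heuristic half of this pipeline can be sketched as simulated annealing that refines a supplied initial bin-packing solution. This is an illustrative sketch only: the RL policy is not implemented, and the trivial one-item-per-bin assignment below is a hypothetical placeholder for whatever initial solution a learned policy would produce. The item sizes, capacity, energy function and linear cooling schedule are likewise assumptions.

```python
import math
import random

SIZES = [4, 8, 1, 4, 2, 1, 5, 3]   # assumed item sizes
CAPACITY = 10                      # assumed bin capacity

def loads(assign):
    # Total size placed in each used bin; assign[i] is the bin label of item i.
    out = {}
    for item, b in enumerate(assign):
        out[b] = out.get(b, 0) + SIZES[item]
    return out

def bins_used(assign):
    return len(set(assign))

def feasible(assign):
    return all(load <= CAPACITY for load in loads(assign).values())

def energy(assign):
    # Number of bins, minus a bonus (always below 1) for uneven fill levels,
    # so a lower energy never corresponds to more bins.
    fill = loads(assign).values()
    return len(fill) - sum((l / CAPACITY) ** 2 for l in fill) / len(SIZES)

def anneal(init_assign, iters=5000, t0=1.0, seed=0):
    rng = random.Random(seed)
    n = len(SIZES)
    cur, cur_e = list(init_assign), energy(init_assign)
    best, best_e = list(cur), cur_e
    for k in range(iters):
        temp = t0 * (1.0 - k / iters) + 1e-9     # linear cooling
        cand = list(cur)
        cand[rng.randrange(n)] = rng.randrange(n)  # move one item to a bin label
        if not feasible(cand):
            continue
        cand_e = energy(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if cand_e <= cur_e or rng.random() < math.exp((cur_e - cand_e) / temp):
            cur, cur_e = cand, cand_e
            if cur_e < best_e:
                best, best_e = list(cur), cur_e
    return best

init = list(range(len(SIZES)))     # placeholder for an RL-generated solution
best = anneal(init)
print("bins before:", bins_used(init), "bins after:", bins_used(best))
```

In the framework described above, the quality of `best` would also be fed back to the policy as a learning signal; that feedback loop is outside the scope of this sketch.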

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Machine Learning

1906.06639

Country:

North America > United States > California > Santa Clara County > Mountain View (0.14)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.05)
Asia > China > Beijing > Beijing (0.05)
(4 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Robust or Private? Adversarial Training Makes Models More Vulnerable to Privacy Attacks

Mejia, Felipe A., Gamble, Paul, Hampel-Arias, Zigfried, Lomnitz, Michael, Lopatina, Nina, Tindall, Lucas, Barrios, Maria Alejandra

arXiv.org Machine Learning Jun-14-2019

Adversarial training was introduced as a way to improve the robustness of deep learning models to adversarial attacks. This training method improves robustness against adversarial attacks, but increases the model's vulnerability to privacy attacks. In this work we demonstrate how model inversion attacks, which extract training data directly from the model and were previously thought to be intractable, become feasible when attacking a robustly trained model. The input space for a traditionally trained model is dominated by adversarial examples - data points that strongly activate a certain class but lack semantic meaning - which makes it difficult to successfully conduct model inversion attacks. We demonstrate this effect using the CIFAR-10 dataset under three different model inversion attacks: a vanilla gradient descent method, a gradient-based method at different scales, and a generative adversarial network based attack.

artificial intelligence, machine learning, model inversion attack, (16 more...)

arXiv.org Machine Learning

1906.06449

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Meta-heuristic for non-homogeneous peak density spaces and implementation on 2 real-world parameter learning/tuning applications

Moattari, Mojtaba, Roshandel, Emad, Kamyab, Shima, Azimifar, Zohreh

arXiv.org Artificial Intelligence Jun-13-2019

The observer effect in physics (or psychology) refers to bias in measurement (or perception) caused by the interference of the instrument (or prior knowledge). Based on these concepts, a new meta-heuristic algorithm is proposed for controlling memory usage per locality without resorting to Tabu-like cut-off approaches. In this paper, first, variations of the observer effect are explained in different branches of science, from physics to psychology. Then, a meta-heuristic algorithm is proposed based on observer effect concepts, and the metrics used are explained. The performance of the derived optimizer is compared on, first, non-homogeneous-peaks-density functions and, second, homogeneous-peaks-density functions, to verify the algorithm's advantage in the first setting. Finally, the performance of the novel algorithm is analyzed on two real-world engineering applications, Electroencephalogram feature learning and Distributed Generator parameter tuning, each of which is characterized by nonlinearity and complex multi-modal peak distributions. The effect of version improvement is also assessed. The performance analysis against other optimizers in the same context suggests that the proposed algorithm is useful both on its own and in hybrid Gradient Descent settings where the problem's search space is non-homogeneous in terms of local peak density.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1906.05516

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Iran > Fars Province > Shiraz (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Power Industry (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Kernel and Deep Regimes in Overparametrized Models

Woodworth, Blake, Gunasekar, Suriya, Lee, Jason, Soudry, Daniel, Srebro, Nathan

arXiv.org Machine Learning Jun-13-2019

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach, we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "deep" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a simple two-layer model that already exhibits an interesting and meaningful transition between the kernel and deep regimes, and we demonstrate the transition for more complex matrix factorization models.

artificial intelligence, machine learning, regime, (15 more...)

arXiv.org Machine Learning

1906.05827

Country:

North America > United States > California (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Lyu, Kaifeng, Li, Jian

arXiv.org Machine Learning Jun-13-2019

Recent works on implicit regularization have shown that gradient descent converges to the max-margin direction for logistic regression with one-layer or multi-layer linear networks. In this paper, we generalize this result to homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study the gradient flow (gradient descent with infinitesimal step size) optimizing the logistic loss or cross-entropy loss of any homogeneous model (possibly non-smooth), and show that if the training loss decreases below a certain threshold, then we can define a smoothed version of the normalized margin which increases over time. We also formulate a natural constrained optimization problem related to margin maximization, and prove that both the normalized margin and its smoothed version converge to the objective value at a KKT point of the optimization problem. Furthermore, we extend the above results to a large family of loss functions. We conduct several experiments to justify our theoretical finding on MNIST and CIFAR-10 datasets. For gradient descent with constant learning rate, we observe that the normalized margin indeed keeps increasing after the dataset is fitted, but the speed is very slow. However, if we schedule the learning rate more carefully, we can observe a more rapid growth of the normalized margin. Finally, as margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
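The simplest instance of this behavior, the linear logistic regression case cited in the first sentence, can be checked numerically. The sketch below is an illustration under assumptions, not the paper's experiments: it uses a two-point separable dataset, a bias-free linear model, a constant learning rate and zero initialization, and records the normalized margin min_i y_i <w, x_i> / ||w|| after every gradient step.

```python
import math

# Toy linearly separable data (assumed): one point per class.
X = [(4.0, 0.0), (0.0, -1.0)]
Y = [1.0, -1.0]

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def normalized_margin(w):
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1]) for x, y in zip(X, Y)) / norm

def train(steps, lr=0.5):
    w = [0.0, 0.0]
    margins = []
    for _ in range(steps):
        g = [0.0, 0.0]
        for x, y in zip(X, Y):
            s = y * (w[0] * x[0] + w[1] * x[1])
            # d/dw log(1 + exp(-s)) = -sigmoid(-s) * y * x, averaged over the data.
            coeff = -y * sigmoid(-s) / len(X)
            g[0] += coeff * x[0]
            g[1] += coeff * x[1]
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
        margins.append(normalized_margin(w))
    return w, margins

w, margins = train(5000)
print("normalized margin after first step:", margins[0])
print("normalized margin after last step: ", margins[-1])
```

For this dataset the max-margin direction is proportional to (0.25, 1), with normalized margin 1/sqrt(1.0625), roughly 0.970. The recorded margin starts well below that value and creeps toward it only slowly, consistent with the abstract's remark that growth under a constant learning rate is very slow.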

artificial intelligence, machine learning, normalized margin, (17 more...)

arXiv.org Machine Learning

1906.0589

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
