AITopics | l-bfg method

In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices.
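To illustrate why the Kronecker structure is useful, the sketch below checks that a product with A ⊗ B can be computed from the two small factors alone, using the identity (A ⊗ B) vec(X) = vec(A X B^T) for a row-major vec. The helper names and the example matrices are hypothetical and chosen only for illustration; this is a minimal sketch of the general identity, not the authors' implementation.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def kron(A, B):
    """Dense Kronecker product: entry ((i, j), (k, l)) is A[i][k] * B[j][l]."""
    return [[A[i][k] * B[j][l]
             for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]


def vec(X):
    """Row-major flattening of a matrix."""
    return [x for row in X for x in row]


def kron_matvec_dense(A, B, X):
    """Form the large (m*n) x (m*n) matrix explicitly, then multiply."""
    K = kron(A, B)
    v = vec(X)
    return [sum(K[r][c] * v[c] for c in range(len(v))) for r in range(len(K))]


def kron_matvec_factored(A, B, X):
    """Same product using only the small factors: vec(A X B^T)."""
    return vec(matmul(matmul(A, X), transpose(B)))


# Hypothetical 2x2 and 3x3 factors and a 2x3 "layer-shaped" matrix.
A = [[2, 1], [1, 3]]
B = [[1, 0, 2], [0, 1, 0], [2, 0, 5]]
X = [[1, 2, 3], [4, 5, 6]]

dense = kron_matvec_dense(A, B, X)
factored = kron_matvec_factored(A, B, X)
print(dense == factored)
```

When both factors are invertible, the same reasoning applies to the inverse, since (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, so a layer block can be inverted by inverting only the two small matrices.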

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

A Multi-Batch L-BFGS Method for Machine Learning

Neural Information Processing Systems, Mar-12-2024, 14:29:50 GMT

The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.
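The instability described above can be made concrete with a small sketch. It assumes that the curvature pair is kept consistent by evaluating both gradients on the samples shared by two consecutive batches; the abstract does not spell out the mechanism, so treat that as an assumption. The least-squares loss, the data, and the function names are hypothetical and serve only as an example.

```python
def batch_gradient(w, data, indices):
    """Gradient of 0.5 * mean((a.w - b)^2) over the given sample indices."""
    g = [0.0] * len(w)
    for i in indices:
        a, b = data[i]
        residual = sum(aj * wj for aj, wj in zip(a, w)) - b
        for j, aj in enumerate(a):
            g[j] += residual * aj / len(indices)
    return g


def overlap_curvature_pair(w_old, w_new, data, batch_old, batch_new):
    """Build (s, y) with both gradients evaluated on the shared samples.

    Assumption: consecutive batches are sampled so that they overlap.
    """
    overlap = sorted(set(batch_old) & set(batch_new))
    if not overlap:
        raise ValueError("consecutive batches must share samples")
    s = [wn - wo for wn, wo in zip(w_new, w_old)]
    g_old = batch_gradient(w_old, data, overlap)
    g_new = batch_gradient(w_new, data, overlap)
    y = [gn - go for gn, go in zip(g_new, g_old)]
    return s, y


# Hypothetical data: (feature vector, target) pairs.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 2.5),
        ([2.0, -1.0], 0.0), ([1.0, 2.0], 5.5)]
batch_old = [0, 1, 2, 3]
batch_new = [2, 3, 4]   # shares samples 2 and 3 with batch_old
w_old = [0.0, 0.0]
w_new = [0.5, 1.0]

s, y = overlap_curvature_pair(w_old, w_new, data, batch_old, batch_new)
curvature = sum(si * yi for si, yi in zip(s, y))
print(curvature > 0)
```

Because both gradients use the same samples, y equals the overlap Hessian applied to s for this quadratic loss, so the product s·y cannot be negative. Differencing gradients taken on two different batches gives no such guarantee.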

artificial intelligence, iteration, machine learning, (18 more...)

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

A Regularized Limited Memory BFGS method for Large-Scale Unconstrained Optimization and its Efficient Implementations

Tankaria, Hardik, Sugimoto, Shinji, Yamashita, Nobuo

arXiv.org Machine Learning, Jan-12-2021

The limited memory BFGS (L-BFGS) method is one of the popular methods for solving large-scale unconstrained optimization problems. Since the standard L-BFGS method uses a line search to guarantee its global convergence, it sometimes requires a large number of function evaluations. To overcome this difficulty, we propose a new L-BFGS method with a certain regularization technique. We show its global convergence under the usual assumptions. In order to make the method more robust and efficient, we also extend it with several techniques, such as a nonmonotone technique and simultaneous use of the Wolfe line search. Finally, we present some numerical results for test problems in CUTEst, which show that the proposed method is robust in terms of the number of problems solved.
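For reference, the standard L-BFGS search direction that this abstract builds on can be sketched with the usual two-loop recursion. The code below shows only that standard recursion. The paper's regularization technique is not described in the abstract and is not reproduced here, and the function and variable names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lbfgs_direction(grad, pairs):
    """Return -H * grad via the two-loop recursion.

    pairs is a list of (s, y) curvature pairs, oldest first, each with
    dot(s, y) > 0. With no pairs this reduces to steepest descent.
    """
    q = list(grad)
    history = []
    # First loop: newest pair to oldest.
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha))
    # Initial scaling from the most recent pair (a common default choice).
    if pairs:
        s_last, y_last = pairs[-1]
        gamma = dot(s_last, y_last) / dot(y_last, y_last)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Hypothetical curvature pairs, oldest first.
pairs = [([1.0, 0.0], [2.0, 1.0]), ([0.0, 1.0], [1.0, 3.0])]
direction = lbfgs_direction([1.0, 3.0], pairs)
print(direction)
```

A quick sanity check on any such implementation is the secant condition: feeding the most recent y in as the gradient should return the negative of the matching s, as happens with the input used here.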

function evaluation, l-bfg method, line search, (14 more...)

arXiv:2101.04413

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Mathematics of Computing (0.68)

An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training

Zocco, Federico, McLoone, Seán

arXiv.org Artificial Intelligence, Dec-14-2020

Abstract: Motivated by the potential for parallel implementation of batch-based algorithms and the accelerated convergence achievable with approximated second-order information, a limited memory version of the BFGS algorithm has been receiving increasing attention in recent years for large neural network training problems. As the shape of the cost function is generally not quadratic and only becomes approximately quadratic in the vicinity of a minimum, the use of second-order information by L-BFGS can be unreliable during the initial phase of training, i.e. when far from a minimum. Therefore, to control the influence of second-order information as training progresses, we propose a multi-batch L-BFGS algorithm, namely MB-AM, that gradually increases its trust in the curvature information by implementing a progressive storage and use of curvature data through a development-based increase (dev-increase) scheme. Using six discriminative modelling benchmark problems, we show empirically that MB-AM has slightly faster convergence and, on average, achieves better solutions than the standard multi-batch L-BFGS algorithm when training MLP and CNN models.

Keywords: Deep learning, L-BFGS, variable memory, quasi-Newton methods, neural networks

1. INTRODUCTION

In the last twenty years significant advances have been made towards making artificial neural networks able to compete with their biological counterparts (Dodge and ...

... currently an active area of research due to the accelerated convergence achievable with curvature information and the ability to exploit parallelism with large batch sizes to achieve efficient algorithm implementations (Berahas et al. (2016); Yousefian et al. (2017)).

algorithm, artificial intelligence, machine learning, (14 more...)

doi: 10.1016/j.ifacol.2020.12.1996

arXiv:2012.07434

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > Northern Ireland > County Down > Belfast (0.04)
Europe > United Kingdom > Northern Ireland > County Antrim > Belfast (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

A Multi-Batch L-BFGS Method for Machine Learning

Berahas, Albert S., Nocedal, Jorge, Takáč, Martin

Neural Information Processing Systems, Dec-31-2016

The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.

gradient, iteration, l-bfg method, (15 more...)

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

A Multi-Batch L-BFGS Method for Machine Learning

Berahas, Albert S., Nocedal, Jorge, Takáč, Martin

arXiv.org Machine Learning, Oct-23-2016

The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv:1605.06049

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)


Practical Quasi-Newton Methods for Training Deep Neural Networks
