
Large-scale L-BFGS using MapReduce

Weizhu Chen, Zhenghao Wang, Jingren Zhou

Neural Information Processing Systems

L-BFGS has been applied as an effective parameter estimation method for various machine learning algorithms since the 1980s. With an increasing demand to deal with massive instances and variables, it is important to scale up and parallelize L-BFGS effectively in a distributed system. In this paper, we study the problem of parallelizing the L-BFGS algorithm in large clusters of tens of thousands of shared-nothing commodity machines. First, we show that a naive implementation of L-BFGS using MapReduce requires either a significant amount of memory or a large number of map-reduce steps with a negative performance impact. Second, we propose a new L-BFGS algorithm, called Vector-free L-BFGS, which avoids the expensive dot product operations in the two-loop recursion and greatly improves computation efficiency with a high degree of parallelism. The algorithm scales very well and enables a variety of machine learning algorithms to handle a massive number of variables over large datasets. We prove the mathematical equivalence of the new Vector-free L-BFGS and demonstrate its excellent performance and scalability using real-world machine learning problems with billions of variables in production clusters.
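For context, the classic two-loop recursion the abstract refers to can be sketched as below. This is the standard textbook formulation (not the paper's vector-free variant); every loop iteration performs dot products over full-length vectors, which is exactly the cost the paper's reformulation avoids.

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """Classic L-BFGS two-loop recursion.

    Computes the search direction -H_k @ grad from the m most recent
    curvature pairs s_i = x_{i+1} - x_i, y_i = g_{i+1} - g_i. Both
    loops require dot products over full-length vectors, which
    Vector-free L-BFGS reformulates for distributed execution.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest curvature pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial Hessian scaling: gamma = (s_k . y_k) / (y_k . y_k).
    s, y = s_list[-1], y_list[-1]
    r = (np.dot(s, y) / np.dot(y, y)) * q
    # Second loop: oldest curvature pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r  # descent direction
```

On a quadratic with identity Hessian (where y = s), the recursion recovers the steepest-descent/Newton direction -grad.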



Practical Quasi-Newton Methods for Training Deep Neural Networks

Neural Information Processing Systems

In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices.
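The practical payoff of the Kronecker-product structure is that a per-layer Hessian block never has to be materialized. A minimal sketch of the underlying vec identity (assumed generic shapes, not the paper's exact factor construction):

```python
import numpy as np

# For a layer whose Hessian block of size (m*n x m*n) is approximated
# as kron(A, B) with A (n x n) and B (m x m), a Hessian-vector product
# uses the identity (with column-major vec):
#   kron(A, B) @ vec(X) == vec(B @ X @ A.T)

def kron_matvec(A, B, v):
    """Apply kron(A, B) to v without forming the Kronecker product."""
    n, m = A.shape[0], B.shape[0]
    X = v.reshape(m, n, order="F")        # invert column-major vec
    return (B @ X @ A.T).reshape(-1, order="F")
```

This reduces the cost of one block matvec from O(m^2 n^2) to O(mn(m + n)), which is what makes block-diagonal Kronecker-factored approximations tractable for large layers.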




c4ede56bbd98819ae6112b20ac6bf145-AuthorFeedback.pdf

Neural Information Processing Systems

Author Response for: "Inverting Gradients - How easy is it to break privacy in federated learning?" We thank all reviewers for their valuable feedback and interest in this attack. Some questions arose about the theoretical analysis for fully connected layers. Finally, knowledge of the feature representation already enables attacks such as that of Melis et al. This non-uniformity is a significant result for the privacy of gradient batches. Fig. 4 of [35] looks better because the attack scenario there is easier.


PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting

Ma, Hongwei, Gao, Junbin, Tran, Minh-Ngoc

arXiv.org Artificial Intelligence

Accurately forecasting commodity demand remains a critical challenge due to volatile market dynamics, nonlinear dependencies, and the need for economically consistent predictions. This paper introduces PREIG (Physics-informed and Reinforcement-driven Interpretable GRU), a novel deep learning framework tailored for commodity demand forecasting. A domain constraint is enforced through a customized loss function that penalizes violations of the physical rule, ensuring that model predictions remain interpretable and aligned with economic theory. To further enhance predictive performance and stability, PREIG incorporates a hybrid optimization strategy that couples NAdam and L-BFGS with Population-Based Training (POP), a reinforcement-learning-inspired mechanism that dynamically tunes hyperparameters via evolutionary exploration and exploitation. Experiments across multiple commodity datasets demonstrate that PREIG significantly outperforms traditional econometric models (ARIMA, GARCH) and deep learning baselines (BPNN, RNN) in both RMSE and MAPE. Compared with a plain GRU, PREIG maintains good explainability while still performing well in prediction. By bridging domain knowledge, optimization theory, and deep learning, PREIG provides a robust, interpretable, and scalable solution for high-dimensional nonlinear time series forecasting in economics.
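The penalty-based loss described in the abstract can be illustrated as below. This is a hedged sketch only: the abstract does not spell out the exact economic rule, so the example assumes the law of demand (demand should not increase with price) and a hypothetical penalty weight `lam`; the paper's actual formulation may differ.

```python
import numpy as np

def physics_penalized_loss(y_pred, y_true, d_demand_d_price, lam=1.0):
    """Illustrative physics-informed loss in the spirit of PREIG.

    Assumes the enforced rule is the law of demand: the model's
    demand sensitivity to price should be non-positive. Positive
    sensitivities are treated as rule violations and penalized.
    """
    mse = np.mean((y_pred - y_true) ** 2)
    # Penalize only positive price sensitivities (rule violations).
    violation = np.mean(np.maximum(d_demand_d_price, 0.0))
    return mse + lam * violation
```

With this structure, a prediction that fits the data but implies demand rising with price still incurs a nonzero loss, steering training toward economically consistent solutions.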