bundle method
Bundle Network: a Machine Learning-Based Bundle Method
Demelas, Francesca, Roux, Joseph Le, Frangioni, Antonio, Lacroix, Mathieu, Traversi, Emiliano, Calvo, Roberto Wolfler
This paper presents Bundle Network, a learning-based algorithm inspired by the Bundle Method for convex non-smooth minimization problems. Unlike classical approaches that rely on heuristic tuning of a regularization parameter, our method automatically learns to adjust it from data. Furthermore, we replace the iterative resolution of the optimization problem that provides the search direction (traditionally computed as a convex combination of gradients at visited points) with a recurrent neural model equipped with an attention mechanism. By leveraging the unrolled graph of computation, our Bundle Network can be trained end-to-end via automatic differentiation. Experiments on Lagrangian dual relaxations of the Multi-Commodity Network Design and Generalized Assignment problems demonstrate that our approach consistently outperforms traditional methods relying on grid search for parameter tuning, while generalizing effectively across datasets.
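For orientation, here is a minimal sketch of the classical proximal bundle step that the abstract contrasts with, not the learned Bundle Network itself. It assumes a fixed regularization parameter `t` and approximates the dual master problem with Frank-Wolfe iterations, so the search direction is explicitly a convex combination `theta` of stored subgradients. All function names, the descent parameter `m_l`, and the test function are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_dual_master(G, alpha, t, iters=500):
    """Approximately minimize (t/2)*||sum_i theta_i g_i||^2 + alpha.theta
    over the unit simplex, using Frank-Wolfe with exact line search."""
    m, n = len(G), len(G[0])
    theta = [1.0 / m] * m
    for _ in range(iters):
        # Aggregated subgradient: convex combination of bundle subgradients.
        d = [sum(theta[i] * G[i][k] for i in range(m)) for k in range(n)]
        grad = [t * dot(G[i], d) + alpha[i] for i in range(m)]
        j = min(range(m), key=lambda i: grad[i])
        slope = grad[j] - dot(grad, theta)  # directional derivative toward e_j
        if slope > -1e-12:
            break
        diff = [G[j][k] - d[k] for k in range(n)]
        curv = t * dot(diff, diff)
        gamma = 1.0 if curv <= 0.0 else min(1.0, -slope / curv)
        theta = [(1.0 - gamma) * th for th in theta]
        theta[j] += gamma
    return theta


def proximal_bundle(f, subgrad, x0, t=1.0, m_l=0.1, tol=1e-8, max_iter=100):
    """Proximal bundle method with a fixed regularization parameter t."""
    center, f_center = list(x0), f(x0)
    bundle = [(list(x0), f_center, subgrad(x0))]  # (point, value, subgradient)
    for _ in range(max_iter):
        G = [g for _, _, g in bundle]
        # Linearization errors of each cut at the current stability center.
        alpha = [f_center - fi - dot(g, [c - a for c, a in zip(center, xi)])
                 for xi, fi, g in bundle]
        theta = solve_dual_master(G, alpha, t)
        d = [sum(th * g[k] for th, g in zip(theta, G))
             for k in range(len(center))]
        cand = [c - t * dk for c, dk in zip(center, d)]
        model = max(fi + dot(g, [a - b for a, b in zip(cand, xi)])
                    for xi, fi, g in bundle)
        predicted = f_center - model  # decrease promised by the model
        if predicted < tol:
            break
        f_cand = f(cand)
        bundle.append((cand, f_cand, subgrad(cand)))
        if f_center - f_cand >= m_l * predicted:
            center, f_center = cand, f_cand  # serious step
        # otherwise: null step, only the bundle is enriched
    return center, f_center


# Illustrative non-smooth test function f(x) = max(2x, -x), minimized at x = 0.
def f_example(x):
    return max(2.0 * x[0], -x[0])


def subgrad_example(x):
    return [2.0] if x[0] >= 0.0 else [-1.0]


x_best, f_best = proximal_bundle(f_example, subgrad_example, [4.0])
```

In this sketch `t` is the quantity that classical implementations tune heuristically. The abstract's proposal is to learn that adjustment, and to replace the `solve_dual_master` call with a recurrent attention model.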
Polynomial Precision Dependence Solutions to Alignment Research Center Matrix Completion Problems
The motivation for these problems is to enable efficient computation of heuristic estimators to formally evaluate and reason about different quantities of deep neural networks in the interest of AI alignment [3]. Our solutions involve reframing the matrix completion problems as semidefinite programs (SDPs) and using recent advances in spectral bundle methods for fast, efficient, and scalable SDP solving. Proving that this task is at least as hard as dense matrix multiplication or positive semidefinite testing would count as a resolution. Question 2 (fast "approximate squaring"): Given A R The core idea is to formulate both questions as semidefinite programs (SDPs) and use a spectral bundle method [1, 5, 9-11] to implicitly solve the SDP or obtain a certificate of infeasibility. In the case where the SDP is infeasible, our method computes an upper bound quantifying the degree to which the SDP is infeasible.
Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching
Angell, Rico, McCallum, Andrew
While semidefinite programming (SDP) has traditionally been limited to moderate-sized problems, recent algorithms augmented with matrix sketching techniques have enabled solving larger SDPs. However, these methods achieve scalability at the cost of an increase in the number of necessary iterations, resulting in slower convergence as the problem size grows. Furthermore, they require iteration-dependent parameter schedules that prohibit effective utilization of warm-start initializations important in practical applications with incrementally-arriving data or mixed-integer programming. We present SpecBM, a provably correct, fast and scalable algorithm for solving massive SDPs that can leverage a warm-start initialization to further accelerate convergence. Our proposed algorithm is a spectral bundle method for solving general SDPs containing both equality and inequality constraints. Moreover, when augmented with an optional matrix sketching technique, our algorithm achieves the dramatically improved scalability of previous work while sustaining convergence speed. We empirically demonstrate the effectiveness of our method, both with and without warm-starting, across multiple applications with large instances. For example, on a problem with 600 million decision variables, SpecBM achieved a solution of standard accuracy in less than 7 minutes, where the previous state-of-the-art scalable SDP solver requires more than 16 hours. Our method solves an SDP with more than 10^13 decision variables on a single machine with 16 cores and no more than 128GB RAM; the previous state-of-the-art method had not achieved an accurate solution after 72 hours on the same instance. We make our implementation in pure JAX publicly available.
Bundle Methods for Machine Learning
We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in $O(1/\epsilon)$ steps to precision $\epsilon$ for general convex problems and in $O(\log(1/\epsilon))$ steps for continuously differentiable problems. We demonstrate in experiments the performance of our approach.
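As a sketch of the cutting-plane construction that bundle methods for regularized risk minimization typically rely on, the objective and its piecewise-linear lower bound can be written as below. The notation ($J$, $\Omega$, $R_{\mathrm{emp}}$, $\lambda$, $a_i$, $b_i$) is assumed here for illustration and does not appear in the abstract.

$$
J(w) = \lambda\,\Omega(w) + R_{\mathrm{emp}}(w), \qquad
R_{\mathrm{emp}}(w) \;\ge\; \max_{1 \le i \le t} \big( \langle a_i, w \rangle + b_i \big),
$$

where $a_i \in \partial R_{\mathrm{emp}}(w_{i-1})$ and $b_i = R_{\mathrm{emp}}(w_{i-1}) - \langle a_i, w_{i-1} \rangle$. The lower bound holds by convexity of $R_{\mathrm{emp}}$. Under these assumptions the next iterate minimizes the regularized model:

$$
w_t = \arg\min_{w} \; \lambda\,\Omega(w) + \max_{1 \le i \le t} \big( \langle a_i, w \rangle + b_i \big).
$$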
Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization
For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable and the number of cutting planes invoked by traditional remedies is difficult to bound, leading to convergence guarantees that are sublinear relative to the cumulative number of gradient evaluations. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a "max-of-smooth" model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
Bundle Method Sketching for Low Rank Semidefinite Programming
Ding, Lijun, Grimmer, Benjamin
In this paper, we show that the bundle method can be applied to solve semidefinite programming problems with a low rank solution without ever constructing a full matrix. To accomplish this, we use recent results from randomly sketching matrix optimization problems and from the analysis of bundle methods. Under strong duality and strict complementarity of SDP, we achieve $\tilde{O}(\frac{1}{\epsilon})$ convergence rates for both the primal and the dual sequences, and the proposed algorithm outputs an $O(\sqrt{\epsilon})$ approximate solution $\hat{X}$ (measured by distances) with a low rank representation within at most $\tilde{O}(\frac{1}{\epsilon})$ iterations.
Algorithms for solving optimization problems arising from deep neural net models: nonsmooth problems
Kungurtsev, Vyacheslav, Pevny, Tomas
Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the empirical risk is, however, highly nonconvex. This alone presents a challenge to application and development of appropriate optimization algorithms for solving the problem. However, in addition, there are a number of interesting problems for which the objective function is nonsmooth and nonseparable. In this paper, we summarize the primary challenges involved, the state of the art, and present some numerical results on an interesting and representative class of problems.
MAP inference via Block-Coordinate Frank-Wolfe Algorithm
Swoboda, Paul, Kolmogorov, Vladimir
We present a new proximal bundle method for Maximum-A-Posteriori (MAP) inference in structured energy minimization problems. The method optimizes a Lagrangean relaxation of the original energy minimization problem using a multi-plane block-coordinate Frank-Wolfe method that takes advantage of the specific structure of the Lagrangean decomposition. We show empirically that our method outperforms state-of-the-art Lagrangean decomposition based algorithms on some challenging Markov Random Field, multi-label discrete tomography and graph matching problems.
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
In this paper, we consider the problem of minimizing the average of a large number of nonsmooth and convex functions. Such problems often arise in machine learning as empirical risk minimization, but are computationally very challenging. We develop and analyze a new algorithm that achieves a robust linear convergence rate, and both its time complexity and gradient complexity are superior to those of state-of-the-art nonsmooth algorithms and subgradient-based schemes. Besides, our algorithm works without any extra error bound conditions on the objective function or the common strong convexity condition. We show that our algorithm has wide applications in optimization and machine learning problems, and demonstrate experimentally that it performs well on a large-scale ranking problem.
Effect of Bundle Method in Distributed Lagrangian Relaxation Protocol
Hanada, Kenta (Kobe University) | Hirayama, Katsutoshi (Kobe University) | Okimoto, Tenda (Kobe University)
The Generalized Mutual Assignment Problem (GMAP) is a maximization problem in distributed environments, where multiple agents select goods under resource constraints. Distributed Lagrangian Relaxation Protocols (DisLRP) are peer-to-peer communication protocols for solving GMAP instances. In DisLRPs, agents seek a good quality upper bound on the optimal value by solving the Lagrangian dual problem, which is a convex minimization problem. Existing DisLRPs exploit a subgradient method to explore a better upper bound by updating the Lagrange multipliers (prices) of goods. While the computational complexity of the subgradient method is very low, it cannot detect the fact that an upper bound converges to the minimum. Moreover, solution oscillation sometimes occurs, which is critical for its performance. In this paper, we present a new DisLRP with a Bundle Method and refer to it as Bundle DisLRP (BDisLRP). The bundle method, which is also called the stabilized cutting planes method, has recently attracted much attention as a way to solve Lagrangian dual problems in centralized environments. We show that this method can also work in distributed environments. We experimentally compared BDisLRP with Adaptive DisLRP (ADisLRP), which is a previous protocol that exploits the subgradient method, to demonstrate that BDisLRP converged faster with better quality upper bounds than ADisLRP.
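The subgradient baseline described above can be sketched on a toy instance. This is a centralized illustration under stated assumptions, not the peer-to-peer protocol from the paper. It assumes each good must go to exactly one agent, that each agent faces a knapsack-style resource constraint, and that this assignment constraint is the one relaxed with prices. The instance data, function names, and the diminishing step rule `step0 / k` are all hypothetical.

```python
from itertools import product


def best_bundle(profit_row, weight_row, capacity, prices):
    """Agent subproblem: brute-force 0/1 knapsack on price-reduced profits."""
    n = len(profit_row)
    best_val, best_pick = 0.0, (0,) * n
    for pick in product((0, 1), repeat=n):
        if sum(w * x for w, x in zip(weight_row, pick)) <= capacity:
            val = sum((p - mu) * x
                      for p, mu, x in zip(profit_row, prices, pick))
            if val > best_val:
                best_val, best_pick = val, pick
    return best_val, best_pick


def dual_value(profit, weight, capacity, prices):
    """Lagrangian dual function L(prices) and one subgradient.

    L is an upper bound on the optimal assignment profit for any prices.
    """
    total = sum(prices)
    counts = [0] * len(prices)
    for a in range(len(profit)):
        val, pick = best_bundle(profit[a], weight[a], capacity[a], prices)
        total += val
        counts = [c + x for c, x in zip(counts, pick)]
    # Subgradient component j: 1 minus the number of agents selecting good j.
    subgrad = [1 - c for c in counts]
    return total, subgrad


def subgradient_method(profit, weight, capacity, steps=50, step0=1.0):
    """Minimize the dual with a diminishing-step subgradient method."""
    prices = [0.0] * len(profit[0])
    best = float("inf")
    history = []
    for k in range(1, steps + 1):
        bound, g = dual_value(profit, weight, capacity, prices)
        best = min(best, bound)
        history.append(bound)
        # A good selected by several agents gets a higher price; an
        # unselected good gets a lower one.
        prices = [mu - (step0 / k) * gj for mu, gj in zip(prices, g)]
    return best, history


# Hypothetical instance: 2 agents, 3 goods.
profit = [[6, 4, 5], [7, 6, 3]]
weight = [[2, 2, 2], [2, 2, 2]]
capacity = [4, 2]
best_bound, history = subgradient_method(profit, weight, capacity)
```

The loop has no stopping test beyond the iteration budget, which reflects the limitation noted in the abstract: the subgradient method alone gives no certificate that the bound has reached its minimum. A bundle method would instead keep the collected `(prices, bound, subgrad)` triples and build a stabilized cutting-plane model from them.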