primal solution
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Schnaus, Dominik, Araslanov, Nikita, Cremers, Daniel
The platonic representation hypothesis suggests that vision and language embeddings become more homogeneous as model and dataset sizes increase. In particular, pairwise distances within each modality become more similar. This suggests that as foundation models mature, it may become possible to match vision and language embeddings in a fully unsupervised fashion, i.e. without parallel data. We present the first feasibility study, and investigate conformity of existing vision and language foundation models in the context of unsupervised, or "blind", matching. First, we formulate unsupervised matching as a quadratic assignment problem and introduce a novel heuristic that outperforms previous solvers. We also develop a technique to find optimal matching problems, for which a non-trivial match is very likely. Second, we conduct an extensive study deploying a range of vision and language models on four datasets. Our analysis reveals that for many problem instances, vision and language representations can be indeed matched without supervision. This finding opens up the exciting possibility of embedding semantic knowledge into other modalities virtually annotation-free. As a proof of concept, we showcase an unsupervised classifier, which achieves non-trivial classification accuracy without any image-text annotation.
Convergence Rate Analysis of MAP Coordinate Minimization Algorithms
Finding maximum a posteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efficiently is thus an important practical problem. In recent years, several authors have proposed message passing updates corresponding to coordinate descent in the dual LP. However, these are generally not guaranteed to converge to a global optimum.
Learning Constrained Optimization with Deep Augmented Lagrangian Methods
Kotary, James, Fioretto, Ferdinando
Learning to Optimize (LtO) is a problem setting in which a machine learning (ML) model is trained to emulate a constrained optimization solver. Learning to produce optimal and feasible solutions subject to complex constraints is a difficult task, but is often made possible by restricting the input space to a limited distribution of related problems. Most LtO methods focus on directly learning solutions to the primal problem, and applying correction schemes or loss function penalties to encourage feasibility. This paper proposes an alternative approach, in which the ML model is trained instead to predict dual solution estimates directly, from which primal estimates are constructed to form dual-feasible solution pairs. This enables an end-to-end training scheme is which the dual objective is maximized as a loss function, and solution estimates iterate toward primal feasibility, emulating a Dual Ascent method. First it is shown that the poor convergence properties of classical Dual Ascent are reflected in poor convergence of the proposed training scheme. Then, by incorporating techniques from practical Augmented Lagrangian methods, we show how the training scheme can be improved to learn highly accurate constrained optimization solvers, for both convex and nonconvex problems.
Learning Primal Heuristics for Mixed Integer Programs
Shen, Yunzhuang, Sun, Yuan, Eberhard, Andrew, Li, Xiaodong
This paper proposes a novel primal heuristic for Mixed Integer Programs, by employing machine learning techniques. Mixed Integer Programming is a general technique for formulating combinatorial optimization problems. Inside a solver, primal heuristics play a critical role in finding good feasible solutions that enable one to tighten the duality gap from the outset of the Branch-and-Bound algorithm (B&B), greatly improving its performance by pruning the B&B tree aggressively. In this paper, we investigate whether effective primal heuristics can be automatically learned via machine learning. We propose a new method to represent an optimization problem as a graph, and train a Graph Convolutional Network on solved problem instances with known optimal solutions. This in turn can predict the values of decision variables in the optimal solution for an unseen problem instance of a similar type. The prediction of variable solutions is then leveraged by a novel configuration of the B&B method, Probabilistic Branching with guided Depth-first Search (PB-DFS) approach, aiming to find (near-)optimal solutions quickly. The experimental results show that this new heuristic can find better primal solutions at a much earlier stage of the solving process, compared to other state-of-the-art primal heuristics.
Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization
Hazimeh, Hussein, Mazumder, Rahul, Saab, Ali
We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and $\ell_{2}$ norms (a.k.a. $\ell_0 \ell_2$ regularization). Recent work presents strong evidence that the resulting $\ell_0$-based estimators can outperform popular sparse learning methods, under many important high-dimensional settings. However, exact computation of $\ell_0$-based estimators remains a major challenge. Indeed, state-of-the-art mixed integer programming (MIP) methods for $\ell_0 \ell_2$-regularized regression face difficulties in solving many statistically interesting instances when the number of features $p \sim 10^4$. In this work, we present a new exact MIP framework for $\ell_0\ell_2$-regularized regression that can scale to $p \sim 10^7$, achieving over $3600$x speed-ups compared to the fastest exact methods. Unlike recent work, which relies on modern MIP solvers, we design a specialized nonlinear BnB framework, by critically exploiting the problem structure. A key distinguishing component in our algorithm lies in efficiently solving the node relaxations using specialized first-order methods, based on coordinate descent (CD). Our CD-based method effectively leverages information across the BnB nodes, through using warm starts, active sets, and gradient screening. In addition, we design a novel method for obtaining dual bounds from primal solutions, which certifiably works in high dimensions. Experiments on synthetic and real high-dimensional datasets demonstrate that our method is not only significantly faster than the state of the art, but can also deliver certifiably optimal solutions to statistically challenging instances that cannot be handled with existing methods. We open source the implementation through our toolkit L0BnB.
Matheuristics to optimize maintenance scheduling and refueling of nuclear power plants
Dupin, Nicolas, Talbi, El-Ghazali
Scheduling the maintenances of nuclear power plants is a complex optimization problem, formulated in 2-stage stochastic programming for the EURO/ROADEF 2010 challenge. The first level optimizes the maintenance dates and refueling decisions. The second level optimizes the production to fulfill the power demands and to ensure feasibility and costs of the first stage decisions. This paper solves a deterministic version of the problem, studying Mixed Integer Programming (MIP) formulations and matheuristics. Relaxing only two sets of constraints of the ROADEF challenge, a MIP formulation can be written using only binary variables for the maintenance dates. The MIP formulations are used to design constructive matheuristics and a Variable Neighborhood Descent (VND) local search. These matheuristics produce very high quality solutions. Some intermediate results explains results of the Challenge: the relaxation of constraints CT6 are justified and neighborhood analyses with MIP-VND justifies the choice of neighborhoods to implement for the problem. Lastly, an extension with stability costs for monthly reoptimization is considered, with efficient bi-objective matheuristics.
Submodular relaxation for inference in Markov random fields
The problem of inference in a Markov random field (MRF) arises in many applied domains, e.g. in machine learning, computer vision, natural language processing, etc. In this paper we focus on one important type of inference: maximum a posteriori (MAP) inference, often referred to as MRF energy minimization. Inference of this type is a combinatorial optimization problem, i.e. an optimization problem with the finite domain. The most studied case of MRF energy minimization is the situation when the energy can be represented as a sum of terms (potentials) that depend on only one or two variables each (unary and pairwise potentials). In this setting the energy is said to be defined by a graph where the nodes correspond to the variables and the edges to the pairwise potentials. Minimization of energies defined on graphs in known to be NPhard in general [8] but can be done exactly in polynomial time in a number of special cases, e.g. if the graph defining the energy is acyclic [36] or if the energy is submodular in standard [28] or multi-label sense [10]. One way to go beyond pairwise potentials is to add higher-order summands to the energy. For example, Kohli et al. [23] and Ladickรฝ et al. [32] use high-order potentials based on superpixels (image regions) for semantic image segmentation; Delong et al. [11] use label cost potentials for geometric model fitting tasks. To be tractable, high-order potentials need to have a compact representation.
Convergence Rate Analysis of MAP Coordinate Minimization Algorithms
Meshi, Ofer, Globerson, Amir, Jaakkola, Tommi S.
Finding maximum aposteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efficiently is thus an important practical problem. In recent years, several authors have proposed message passing updates corresponding to coordinate descent in the dual LP. However,these are generally not guaranteed to converge to a global optimum. One approach to remedy this is to smooth the LP, and perform coordinate descent on the smoothed dual. However, little is known about the convergence rate of this procedure. Here we perform a thorough rate analysis of such schemes and derive primal and dual convergence rates. We also provide a simple dual to primal mapping that yields feasible primal solutions with a guaranteed rate of convergence. Empirical evaluation supports our theoretical claims and shows that the method is highly competitive with state of the art approaches that yield global optima.
A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing
Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP). This tutorial gives an overview of the technique. We describe example algorithms, describe formal guarantees for the method, and describe practical issues in implementing the algorithms. While our examples are predominantly drawn from the NLP literature, the material should be of general relevance to inference problems in machine learning. A central theme of this tutorial is that Lagrangian relaxation is naturally applied in conjunction with a broad class of combinatorial algorithms, allowing inference in models that go significantly beyond previous work on Lagrangian relaxation for inference in graphical models.
Lagrangian Relaxation Techniques for Scalable Spatial Conservation Planning
Kumar, Akshat (University of Massachusetts Amherst) | Wu, Xiaojian (University of Massachusetts) | Zilberstein, Shlomo (University of Massachusetts)
We address the problem of spatial conservation planning in which the goal is to maximize the expected spread of cascades of an endangered species by strategically purchasing land parcels within a given budget. This problem can be solved by standard integer programming methods using the sample average approximation (SAA) scheme. Our main contribution lies in exploiting the separable structure present in this problem and using Lagrangian relaxation techniques to gain scalability over the flat representation. We also generalize the approach to allow the application of the SAA scheme to a range of stochastic optimization problems. Our iterative approach is highly efficient in terms of space requirements and it provides an upper bound over the optimal solution at each iteration. We apply our approach to the Red-cockaded Woodpecker conservation problem. The results show that it can find the optimal solution significantly faster---sometimes by an order-of-magnitude---than using the flat representation for a range of budget sizes.