Goto

Collaborating Authors

 Optimization


Neural Graph Matching Network: Learning Lawler's Quadratic Assignment Problem with Extension to Hypergraph and Multiple-graph Matching

arXiv.org Machine Learning

Graph matching involves combinatorial optimization based on edge-to-edge affinity matrix, which can be generally formulated as Lawler's Quadratic Assignment Problem (QAP). This paper presents a QAP network directly learning with the affinity matrix (equivalently the association graph) whereby the matching problem is translated into a vertex classification task. The association graph is learned by an embedding network for vertex classification, followed by Sinkhorn normalization and a cross-entropy loss for end-to-end learning. We further improve the embedding model on association graph by introducing Sinkhorn based constraint, and dummy nodes to deal with outliers. To our best knowledge, this is the first network to directly learn with the general Lawler's QAP. In contrast, state-of-the-art deep matching methods focus on the learning of node and edge features in two graphs respectively. We also show how to extend our network to hypergraph matching, and matching of multiple graphs. Experimental results on both synthetic graphs and real-world images show our method outperforms. For pure QAP tasks on synthetic data and QAPLIB, our method can surpass spectral matching and RRWM, especially on challenging problems.


The Convex Information Bottleneck Lagrangian

arXiv.org Machine Learning

The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations T of some random variable X for the task of predicting Y. It is defined as a constrained optimization problem which maximizes the information the representation has about the task, I(T;Y), while ensuring that a minimum level of compression r is achieved; i.e., I(X;T) <= r. For practical reasons the problem is usually solved by maximizing the IB Lagrangian for many values of the Lagrange multiplier, therefore drawing the IB curve (i.e., the curve of maximal I(T;Y) for a given I(X;T)) and selecting the representation of desired predictability and compression. It is known when Y is a deterministic function of X, the IB curve cannot be explored, and other Lagrangians have been proposed to tackle this problem; e.g., the squared IB Lagrangian. In this paper, we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; and (ii) prove that if these Lagrangians are used, there is a (and we know the) one-to-one mapping between the Lagrange multiplier and the desired compression rate r for known IB curve shapes, hence, freeing us from the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we can solve the original constrained problem with a single optimization.


Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling

arXiv.org Machine Learning

Principal component analysis (PCA) is one of the most widely used dimension reduction and multivariate statistical techniques. From a probabilistic perspective, PCA seeks a low-dimensional representation of data in the presence of independent identical Gaussian noise. Probabilistic PCA (PPCA) and its variants have been extensively studied for decades. Most of them assume the underlying noise follows a certain independent identical distribution. However, the noise in the real world is usually complicated and structured. To address this challenge, some non-linear variants of PPCA have been proposed. But those methods are generally difficult to interpret. To this end, we propose a powerful and intuitive PCA method (MN-PCA) through modeling the graphical noise by the matrix normal distribution, which enables us to explore the structure of noise in both the feature space and the sample space. MN-PCA obtains a low-rank representation of data and the structure of noise simultaneously. And it can be explained as approximating data over the generalized Mahalanobis distance. We develop two algorithms to solve this model: one maximizes the regularized likelihood, the other exploits the Wasserstein distance, which is more robust. Extensive experiments on various data demonstrate their effectiveness.


Deep Ordinal Classification with Inequality Constraints

arXiv.org Machine Learning

This study investigates a new constrained-optimization formulation for deep ordinal classification. We impose uni-modality of the label distribution implicitly via a set of inequality constraints over pairs of adjacent labels. To tackle the ensuing challenging optimization problem, we solve a sequence of unconstrained losses based on a powerful extension of the log-barrier method. This accommodates standard SGD for deep networks, and avoids computationally expensive Lagrangian dual steps and projections, while outperforming substantially penalty methods. Our non-parametric model is more flexible than the existing deep ordinal classification techniques: it does not restrict the learned representation to a specific parametric model, allowing the training to explore larger spaces of solutions and removing the need for ad hoc choices, while scaling up to large numbers of labels. It can be used in conjunction with any standard classification loss and any deep architecture. We also propose a new performance metric for ordinal classification, as a proxy to measure a distribution uni-modality, referred to as the Sides Order Index (SOI). We report comprehensive evaluations and comparisons to state-of-the-art methods on benchmark public datasets for several ordinal classification tasks, showing the merits of our approach in terms of label consistency and scalability. A public reproducible PyTorch implementation is provided (https://github.com/sbelharbi/Deep-Ordinal-Classification-with-Inequality-Constraints).


Differentiable Convex Optimization Layers

#artificialintelligence

In this tutorial we introduce our library for creating differentiable optimization layers in PyTorch and TensorFlow. Optimization layers add domain-specific knowledge or learnable hard constraints to machine learning models. In this tutorial we introduce our new library cvxpylayers for easily creating differentiable new convex optimization layers. This lets you express your layer with the CVXPY domain specific language as usual and then export the CVXPY object to an efficient batched and differentiable layer with a single line of code. This project turns every convex optimization problem expressed in CVXPY into a differentiable layer.


Linear Programming

#artificialintelligence

As Linear Programming is a valuable way of displaying real-world data in a mathematical way, it is commonly used in manufacturing and the service industry. For example, many large distribution companies will use linear programming in the analysis of their supply chain operations, similar to the toy example above. Additionally, linear programming can be used outside the warehouse in the optimization of delivery routes. Companies like Amazon and FedEx use linear programming to find the shortest and most efficient delivery routes. Linear programming is also used in machine learning applications where a neural network is trained to fit model of a function in order to label input data and predict unknown future values.


Three Dimensional Route Planning for Multiple Unmanned Aerial Vehicles using Salp Swarm Algorithm

arXiv.org Artificial Intelligence

Route planning for multiple Unmanned Aerial Vehicles (UAVs) is a series of translation and rotational steps from a given start location to the destination goal location. The goal of the route planning problem is to determine the most optimal route avoiding any collisions with the obstacles present in the environment. Route planning is an NP-hard optimization problem. In this paper, a newly proposed Salp Swarm Algorithm (SSA) is used, and its performance is compared with deterministic and other Nature-Inspired Algorithms (NIAs). The results illustrate that SSA outperforms all the other meta-heuristic algorithms in route planning for multiple UAVs in a 3D environment. The proposed approach improves the average cost and overall time by 1.25% and 6.035% respectively when compared to recently reported data. Route planning is involved in many real-life applications like robot navigation, self-driving car, autonomous UAV for search and rescue operations in dangerous ground-zero situations, civilian surveillance, military combat and even commercial services like package delivery by drones.


Projective Quadratic Regression for Online Learning

arXiv.org Machine Learning

This paper considers online convex optimization (OCO) problems - the paramount framework for online learning algorithm design. The loss function of learning task in OCO setting is based on streaming data so that OCO is a powerful tool to model large scale applications such as online recommender systems. Meanwhile, real-world data are usually of extreme high-dimensional due to modern feature engineering techniques so that the quadratic regression is impractical. Factorization Machine as well as its variants are efficient models for capturing feature interactions with low-rank matrix model but they can't fulfill the OCO setting due to their non-convexity. In this paper, We propose a projective quadratic regression (PQR) model. First, it can capture the import second-order feature information. Second, it is a convex model, so the requirements of OCO are fulfilled and the global optimal solution can be achieved. Moreover, existing modern online optimization methods such as Online Gradient Descent (OGD) or Follow-The-Regularized-Leader (FTRL) can be applied directly. In addition, by choosing a proper hyper-parameter, we show that it has the same order of space and time complexity as the linear model and thus can handle high-dimensional data. Experimental results demonstrate the performance of the proposed PQR model in terms of accuracy and efficiency by comparing with the state-of-the-art methods.


Fast Polynomial Kernel Classification for Massive Data

arXiv.org Machine Learning

In the era of big data, it is highly desired to develop efficient machine learning algorithms to tackle massive data challenges such as storage bottleneck, algorithmic scalability, and interpretability. In this paper, we develop a novel efficient classification algorithm, called fast polynomial kernel classification (FPC), to conquer the scalability and storage challenges. Our main tools are a suitable selected feature mapping based on polynomial kernels and an alternating direction method of multipliers (ADMM) algorithm for a related non-smooth convex optimization problem. Fast learning rates as well as feasibility verifications including the convergence of ADMM and the selection of center points are established to justify theoretical behaviors of FPC. Our theoretical assertions are verified by a series of simulations and real data applications. The numerical results demonstrate that FPC significantly reduces the computational burden and storage memory of the existing learning schemes such as support vector machines and boosting, without sacrificing their generalization abilities much.


Coupling Matrix Manifolds and Their Applications in Optimal Transport

arXiv.org Machine Learning

Optimal transport (OT) is a powerful tool for measuring the distance between two defined probability distributions. In this paper, we develop a new manifold named the coupling matrix manifold (CMM), where each point on CMM can be regarded as the transportation plan of the OT problem. We firstly explore the Riemannian geometry of CMM with the metric expressed by the Fisher information. These geometrical features of CMM have paved the way for developing numerical Riemannian optimization algorithms such as Riemannian gradient descent and Riemannian trust-region algorithms, forming a uniform optimization method for all types of OT problems. The proposed method is then applied to solve several OT problems studied by previous literature. The results of the numerical experiments illustrate that the optimization algorithms that are based on the method proposed in this paper are comparable to the classic ones, for example, the Sinkhorn algorithm, while outperforming other state-of-the-art algorithms without considering the geometry information, especially in the case of non-entropy optimal transport.