Optimization
Model compression as constrained optimization, with application to neural nets. Part I: general framework
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model compression as constrained optimization. This includes many types of compression: quantization, low-rank decomposition, pruning, lossless compression and others. Then, we give a general algorithm to optimize this nonconvex problem based on the augmented Lagrangian and alternating optimization. This results in a "learning-compression" algorithm, which alternates a learning step of the uncompressed model, independent of the compression type, with a compression step of the model parameters, independent of the learning task. This simple, efficient algorithm is guaranteed to find the best compressed model for the task in a local sense under standard assumptions. We present separately in several companion papers the development of this general framework into specific algorithms for model compression based on quantization, pruning and other variations, including experimental results on compressing neural nets and other models.
Graph Learning from Data under Structural and Laplacian Constraints
Egilmez, Hilmi E., Pavez, Eduardo, Ortega, Antonio
RAPHS are generic mathematical structures consisting of sets of vertices and edges, which are used for modeling pairwise relations (edges) between a number of objects (vertices). In practice, this representation is often extended to weighted graphs, for which a set of scalar values (weights) are assigned to edges and potentially to vertices. Thus, weighted graphs offer general and flexible representations for modeling affinity relations between the objects of interest. Many practical problems can be represented using weighted graphs. For example, a broad class of combinatorial problems such as weighted matching, shortest-path and network-flow [2] are defined using weighted graphs. In signal/data-oriented problems, weighted graphs provide concise (sparse) representations for robust modeling of signals/data [3]. Such graphbased models are also useful for analyzing and visualizing the relations between their samples/features. Moreover, weighted graphs naturally emerge in networked data applications, such as learning, signal processing and analysis on computer, social, sensor, energy, transportation and biological networks [4], where the signals/data are inherently related to a graph associated with the underlying network.
Robust Optimization for Non-Convex Objectives
Chen, Robert, Lucier, Brendan, Singer, Yaron, Syrgkanis, Vasilis
We consider robust optimization problems, where the goal is to optimize in the worst case over a class of objective functions. We develop a reduction from robust improper optimization to Bayesian optimization: given an oracle that returns $\alpha$-approximate solutions for distributions over objectives, we compute a distribution over solutions that is $\alpha$-approximate in the worst case. We show that de-randomizing this solution is NP-hard in general, but can be done for a broad class of statistical learning tasks. We apply our results to robust neural network training and submodular optimization. We evaluate our approach experimentally on corrupted character classification, and robust influence maximization in networks.
Provable Inductive Robust PCA via Iterative Hard Thresholding
Niranjan, U. N., Rajkumar, Arun, Tulabandhula, Theja
The robust PCA problem, wherein, given an input data matrix that is the superposition of a low-rank matrix and a sparse matrix, we aim to separate out the low-rank and sparse components, is a well-studied problem in machine learning. One natural question that arises is that, as in the inductive setting, if features are provided as input as well, can we hope to do better? Answering this in the affirmative, the main goal of this paper is to study the robust PCA problem while incorporating feature information. In contrast to previous works in which recovery guarantees are based on the convex relaxation of the problem, we propose a simple iterative algorithm based on hard-thresholding of appropriate residuals. Under weaker assumptions than previous works, we prove the global convergence of our iterative procedure; moreover, it admits a much faster convergence rate and lesser computational complexity per iteration. In practice, through systematic synthetic and real data simulations, we confirm our theoretical findings regarding improvements obtained by using feature information.
Projection Free Rank-Drop Steps
The Frank-Wolfe (FW) algorithm has been widely used in solving nuclear norm constrained problems, since it does not require projections. However, FW often yields high rank intermediate iterates, which can be very expensive in time and space costs for large problems. To address this issue, we propose a rank-drop method for nuclear norm constrained problems. The goal is to generate descent steps that lead to rank decreases, maintaining low-rank solutions throughout the algorithm. Moreover, the optimization problems are constrained to ensure that the rank-drop step is also feasible and can be readily incorporated into a projection-free minimization method, e.g., Frank-Wolfe. We demonstrate that by incorporating rank-drop steps into the Frank-Wolfe algorithm, the rank of the solution is greatly reduced compared to the original Frank-Wolfe or its common variants.
A Spatio-Temporal Representation for the Orienteering Problem with Time-Varying Profits
Ma, Zhibei, Yin, Kai, Liu, Lantao, Sukhatme, Gaurav S.
We consider an orienteering problem (OP) where an agent needs to visit a series (possibly a subset) of depots, from which the maximal accumulated profits are desired within given limited time budget. Different from most existing works where the profits are assumed to be static, in this work we investigate a variant that has arbitrary time-dependent profits. Specifically, the profits to be collected change over time and they follow different (e.g., independent) time-varying functions. The problem is of inherent nonlinearity and difficult to solve by existing methods. To tackle the challenge, we present a simple and effective framework that incorporates time-variations into the fundamental planning process. Specifically, we propose a deterministic spatio-temporal representation where both spatial description and temporal logic are unified into one routing topology. By employing existing basic sorting and searching algorithms, the routing solutions can be computed in an extremely efficient way. The proposed method is easy to implement and extensive numerical results show that our approach is time efficient and generates near-optimal solutions.
Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)
Ahmed, Asrar, Varakantham, Pradeep, Lowalekar, Meghna, Adulyasak, Yossiri, Jaillet, Patrick
Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR) that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from literature. We observe that optimizing the myopic variants of regret, OSR and CEMR are better than directly optimizing the regret.
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
Curtis, Frank E., Scheinberg, Katya
The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a supervised learning problem and show how it leads to various optimization problems, depending on the context and underlying assumptions. We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing stochastic methods, and second-order methods. Finally, we discuss how these approaches can be employed to the training of deep neural networks, emphasizing the difficulties that arise from the complex, nonconvex structure of these models.
Max-Product Belief Propagation for Linear Programming: Applications to Combinatorial Optimization
The max-product {belief propagation} (BP) is a popular message-passing heuristic for approximating a maximum-a-posteriori (MAP) assignment in a joint distribution represented by a graphical model (GM). In the past years, it has been shown that BP can solve a few classes of linear programming (LP) formulations to combinatorial optimization problems including maximum weight matching, shortest path and network flow, i.e., BP can be used as a message-passing solver for certain combinatorial optimizations. However, those LPs and corresponding BP analysis are very sensitive to underlying problem setups, and it has been not clear what extent these results can be generalized to. In this paper, we obtain a generic criteria that BP converges to the optimal solution of given LP, and show that it is satisfied in LP formulations associated to many classical combinatorial optimization problems including maximum weight perfect matching, shortest path, traveling salesman, cycle packing, vertex/edge cover and network flow.
Recovery of Missing Samples Using Sparse Approximation via a Convex Similarity Measure
Javaheri, Amirhossein, Zayyani, Hadi, Marvasti, Farokh
In this paper, we study the missing sample recovery problem using methods based on sparse approximation. In this regard, we investigate the algorithms used for solving the inverse problem associated with the restoration of missed samples of image signal. This problem is also known as inpainting in the context of image processing and for this purpose, we suggest an iterative sparse recovery algorithm based on constrained $l_1$-norm minimization with a new fidelity metric. The proposed metric called Convex SIMilarity (CSIM) index, is a simplified version of the Structural SIMilarity (SSIM) index, which is convex and error-sensitive. The optimization problem incorporating this criterion, is then solved via Alternating Direction Method of Multipliers (ADMM). Simulation results show the efficiency of the proposed method for missing sample recovery of 1D patch vectors and inpainting of 2D image signals.