Optimization
Neuro-Optimization: Learning Objective Functions Using Neural Networks
Jeon, Younghan, Lee, Minsik, Choi, Jin Young
Mathematical optimization is widely used in various research fields. With a carefully designed objective function, mathematical optimization can be quite helpful in solving many problems. However, objective functions are usually hand-crafted, and designing a good one can be quite challenging. In this paper, we propose a novel framework to learn the objective function based on a neural network. The basic idea is to consider the neural network as an objective function, and the input as an optimization variable. To learn the objective function from the training data, two processes are conducted. In the inner process, the optimization variable (the input of the network) is optimized to minimize the objective function (the network output) while the network weights are held fixed. In the outer process, on the other hand, the weights are optimized based on how close the final solution of the inner process is to the desired solution. After learning the objective function, the solution for the test set is obtained in the same manner as in the inner process. The potential and applicability of our approach are demonstrated by experiments on toy examples and a computer vision task, optical flow.
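The inner and outer processes described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a one-parameter toy objective f(x; w) = (x - w)^2 stands in for the neural network, the inner process is plain gradient descent on x, and the outer process differentiates through the inner loop with central finite differences. The names inner_solve, outer_loss and outer_train are hypothetical.

```python
def objective_grad_x(x, w):
    # Gradient with respect to x of the toy objective f(x; w) = (x - w)**2.
    # (Assumption: this scalar function stands in for the network.)
    return 2.0 * (x - w)

def inner_solve(w, x0=0.0, lr=0.1, steps=50):
    # Inner process: optimize the input x while the weight w stays fixed.
    x = x0
    for _ in range(steps):
        x -= lr * objective_grad_x(x, w)
    return x

def outer_loss(w, target):
    # How far the inner solution lands from the desired solution.
    return (inner_solve(w) - target) ** 2

def outer_train(target, w0=0.0, lr=0.1, steps=200, eps=1e-4):
    # Outer process: adjust w so that the inner solution approaches the target.
    # The gradient through the inner loop is estimated by central differences.
    w = w0
    for _ in range(steps):
        grad = (outer_loss(w + eps, target) - outer_loss(w - eps, target)) / (2 * eps)
        w -= lr * grad
    return w

w_learned = outer_train(target=3.0)
# At test time the solution is obtained by running the inner process again.
x_test = inner_solve(w_learned)
```

With a real network, the finite-difference step would normally be replaced by automatic differentiation through the unrolled inner loop; the toy version only shows how the two loops nest.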
Privacy-Preserving Obfuscation of Critical Infrastructure Networks
Fioretto, Ferdinando, Mak, Terrence W. K., Van Hentenryck, Pascal
The paper studies how to release data about a critical infrastructure network (e.g., the power network or a transportation network) without disclosing sensitive information that can be exploited by malevolent agents, while preserving the realism of the network. It proposes a novel obfuscation mechanism that combines several privacy-preserving building blocks with a bi-level optimization model to significantly improve accuracy. The obfuscation is evaluated for both realism and privacy properties on real energy and transportation networks. Experimental results show the obfuscation mechanism substantially reduces the potential damage of an attack exploiting the released data to harm the real network.
Scale Invariant Power Iteration
Kim, Cheolmin, Kim, Youngseok, Klabjan, Diego
Power iteration has been generalized to solve many interesting problems in machine learning and statistics. Despite its striking success, theoretical understanding of when and how such an algorithm enjoys good convergence properties is limited. In this work, we introduce a new class of optimization problems called scale invariant problems and prove that they can be efficiently solved by scale invariant power iteration (SCI-PI) with a generalized convergence guarantee of power iteration. By deriving that a stationary point is an eigenvector of the Hessian evaluated at the point, we show that scale invariant problems indeed resemble the leading eigenvector problem near a local optimum. Also, based on a novel reformulation, we geometrically derive SCI-PI, which has a general form of power iteration. The convergence analysis shows that SCI-PI attains local linear convergence with a rate being proportional to the top two eigenvalues of the Hessian at the optimum. Moreover, we discuss some extended settings of scale invariant problems and provide similar convergence results for them. In numerical experiments, we introduce applications to independent component analysis, Gaussian mixtures, and non-negative matrix factorization. Experimental results demonstrate that SCI-PI is competitive with state-of-the-art benchmark algorithms and often yields better solutions.
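As a rough illustration of what "a general form of power iteration" can look like, the sketch below repeatedly normalizes the gradient of an objective on the unit sphere. It is an assumption that this captures the shape of the update, and it is not the authors' code. For the quadratic f(x) = (1/2) x^T A x the gradient is A x, so the loop reduces to classical power iteration for the leading eigenvector. The function names and the 2x2 matrix are hypothetical.

```python
import math

def normalize(v):
    # Scale v to unit Euclidean norm (assumes v is not the zero vector).
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def gradient_power_iteration(grad, x0, iters=200):
    # Repeatedly replace x by its normalized gradient.
    x = normalize(x0)
    for _ in range(iters):
        x = normalize(grad(x))
    return x

def quadratic_grad(A):
    # Gradient of f(x) = 0.5 * x^T A x for symmetric A, i.e. the product A x.
    def grad(x):
        return [sum(a * c for a, c in zip(row, x)) for row in A]
    return grad

# Symmetric example matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
x = gradient_power_iteration(quadratic_grad(A), [1.0, 0.0])
```

In this example the iterate approaches the direction (1, 1)/sqrt(2), the eigenvector for eigenvalue 3, and the error shrinks by roughly the eigenvalue ratio 1/3 per step.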
Distributional Policy Optimization: An Alternative Approach for Continuous Control
Tessler, Chen, Tennenholtz, Guy, Mannor, Shie
We identify a fundamental problem in policy gradient-based methods in continuous control. As policy gradient methods require the agent's underlying probability distribution, they limit policy representation to parametric distribution classes. We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions. We suggest a novel distributional framework, able to represent arbitrary distribution functions over the continuous action space. Using this framework, we construct a generative scheme, trained using an off-policy actor-critic paradigm, which we call the Generative Actor Critic (GAC). Compared to policy gradient methods, GAC does not require knowledge of the underlying probability distribution, thereby overcoming these limitations. Empirical evaluation shows that our approach is comparable to, and often surpasses, current state-of-the-art baselines in continuous domains.
Non-monotone DR-submodular Maximization: Approximation and Regret Guarantees
Dürr, Christoph, Thang, Nguyen Kim, Srivastav, Abhinav, Tible, Léo
Diminishing-returns (DR) submodular optimization is an important field with many real-world applications in machine learning, economics and communication systems. It captures a subclass of non-convex optimization that provides both practical and theoretical guarantees. In this paper, we study the fundamental problem of maximizing non-monotone DR-submodular functions over down-closed and general convex sets in both offline and online settings. First, we show that for offline maximization of non-monotone DR-submodular functions over a general convex set, the Frank-Wolfe algorithm achieves an approximation guarantee which depends on the convex set. Next, we show that the Stochastic Gradient Ascent algorithm achieves a 1/4-approximation ratio with the regret of $O(1/\sqrt{T})$ for the problem of maximizing non-monotone DR-submodular functions over down-closed convex sets. These are the first approximation guarantees in the corresponding settings. Finally, we benchmark these algorithms on problems arising in the machine learning domain using real-world datasets.
Learning Mahalanobis Metric Spaces via Geometric Approximation Algorithms
Ihara, Diego, Mohammadi, Neshat, Sidiropoulos, Anastasios
Learning Mahalanobis metric spaces is an important problem that has found numerous applications. Several algorithms have been designed for this problem, including Information Theoretic Metric Learning (ITML) by [Davis et al. 2007] and Large Margin Nearest Neighbor (LMNN) classification by [Weinberger and Saul 2009]. We consider a formulation of Mahalanobis metric learning as an optimization problem, where the objective is to minimize the number of violated similarity/dissimilarity constraints. We show that for any fixed ambient dimension, there exists a fully polynomial-time approximation scheme (FPTAS) with nearly-linear running time. This result is obtained using tools from the theory of linear programming in low dimensions. We also discuss improvements of the algorithm in practice, and present experimental results on synthetic and real-world data sets.
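The objective described above, the number of violated similarity/dissimilarity constraints, can be made concrete with a small sketch. It assumes a common way of stating such constraints, which the abstract does not spell out: a linear map G defines the distance ||G(p - q)||, similar pairs should be at distance at most u, and dissimilar pairs at distance at least l. The names G, u, l and both functions are hypothetical, and the code only evaluates a given G; it does not implement the approximation scheme.

```python
import math

def mahalanobis(G, p, q):
    # Distance ||G (p - q)||, i.e. the Mahalanobis distance for M = G^T G.
    diff = [a - b for a, b in zip(p, q)]
    mapped = [sum(g * d for g, d in zip(row, diff)) for row in G]
    return math.sqrt(sum(m * m for m in mapped))

def count_violations(G, similar, dissimilar, u, l):
    # Similar pairs violate when farther apart than u;
    # dissimilar pairs violate when closer together than l.
    bad = sum(1 for p, q in similar if mahalanobis(G, p, q) > u)
    bad += sum(1 for p, q in dissimilar if mahalanobis(G, p, q) < l)
    return bad

similar = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (3.0, 0.0))]
dissimilar = [((0.0, 0.0), (0.0, 5.0)), ((0.0, 0.0), (0.0, 1.0))]

identity = [[1.0, 0.0], [0.0, 1.0]]
rescaled = [[0.5, 0.0], [0.0, 4.0]]  # shrink the first axis, stretch the second

violations_identity = count_violations(identity, similar, dissimilar, u=2.0, l=4.0)
violations_rescaled = count_violations(rescaled, similar, dissimilar, u=2.0, l=4.0)
```

Under the plain Euclidean metric two of the four constraints are violated, while the rescaled map satisfies all of them, which is the kind of improvement a learned metric is meant to deliver.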
Bayesian Optimization over Sets
Kim, Jungtaek, McCourt, Michael, You, Tackgeun, Kim, Saehoon, Choi, Seungjin
We propose a Bayesian optimization method over sets, to minimize a black-box function that can take a set as a single input. Because set inputs are permutation-invariant and variable-length, traditional Gaussian process-based Bayesian optimization strategies which assume vector inputs can fall short. To address this, we develop a Bayesian optimization method with a set kernel that is used to build surrogate functions. This kernel accumulates similarity over set elements to enforce permutation-invariance and permit sets of variable size, but this comes at a greater computational cost. To reduce this burden, we propose a more efficient probabilistic approximation which we prove is still positive definite and is an unbiased estimator of the true set kernel. Finally, we present several numerical experiments which demonstrate that our method outperforms other methods in various applications.
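One way to picture a kernel that "accumulates similarity over set elements" is to average a base kernel over all cross pairs of the two sets. The sketch below assumes that form, with an RBF base kernel and normalization by the product of the set sizes. These choices are illustrative assumptions rather than the paper's exact definition, and the probabilistic approximation is not shown.

```python
import math

def rbf(x, y, length_scale=1.0):
    # Base kernel between two elements (an assumed choice).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def set_kernel(X, Y, base=rbf):
    # Average base-kernel similarity over every pair (x, y) with x in X, y in Y.
    # The double sum ignores element order and accepts sets of different sizes,
    # at a cost of len(X) * len(Y) base-kernel evaluations.
    total = sum(base(x, y) for x in X for y in Y)
    return total / (len(X) * len(Y))

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y = [(1.0, 1.0), (2.0, 2.0)]

k_xy = set_kernel(X, Y)
k_permuted = set_kernel(list(reversed(X)), Y)
```

The quadratic number of base-kernel evaluations per pair of sets is the computational cost that motivates a cheaper approximation.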
Learning Optimal Data Augmentation Policies via Bayesian Optimization for Image Classification Tasks
Zhang, Chunxu, Cui, Jiaxu, Yang, Bo
In recent years, deep learning has achieved remarkable results in many fields, including computer vision, natural language processing, speech recognition and others. Adequate training data is the key to ensuring the effectiveness of deep models. However, obtaining valid data requires a lot of time and labor. Data augmentation (DA) is an effective alternative approach, which can generate new labeled data from existing data using label-preserving transformations. Although DA is highly beneficial, designing appropriate DA policies requires considerable expertise and time, and evaluating candidate policies during the search is costly. So we raise a new question in this paper: how can automated data augmentation be achieved at as low a cost as possible? We propose a method named BO-Aug for automating the process by finding the optimal DA policies using the Bayesian optimization approach. Our method can find the optimal policies at a relatively low search cost, and the policies found on a specific dataset are transferable across different neural network architectures or even different datasets. We validate BO-Aug on three widely used image classification datasets: CIFAR-10, CIFAR-100 and SVHN. Experimental results show that the proposed method can achieve state-of-the-art or near state-of-the-art classification accuracy. Code to reproduce our experiments is available at https://github.com/zhangxiaozao/BO-Aug.
AI-CARGO: A Data-Driven Air-Cargo Revenue Management System
Rizzo, Stefano Giovanni, Lucas, Ji, Kaoudi, Zoi, Quiane-Ruiz, Jorge-Arnulfo, Chawla, Sanjay
We propose AI-CARGO, a revenue management system for air-cargo that combines machine learning prediction with decision-making using mathematical optimization methods. AI-CARGO addresses a problem that is unique to the air-cargo business, namely the wide discrepancy between the quantity (weight or volume) that a shipper will book and the actual amount received by the airline at departure time. The discrepancy results in sub-optimal and inefficient behavior by both the shipper and the airline, resulting in an overall loss of potential revenue for the airline. AI-CARGO also includes a data cleaning component to deal with the heterogeneous forms in which booking data is transmitted to the airline cargo system. AI-CARGO is deployed in the production environment of a large commercial airline company. We have validated the benefits of AI-CARGO using real and synthetic datasets. In particular, we have carried out simulations using dynamic programming techniques to elicit the impact of our proposed system on offloading costs and revenue generation. Our results suggest that combining prediction within a decision-making framework can dramatically reduce offloading costs and optimize revenue generation.
Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms
Mukkamala, Mahesh Chandra, Ochs, Peter
Matrix Factorization is a popular non-convex objective, for which alternating minimization schemes are mostly used. They usually suffer from the major drawback that the solution is biased towards one of the optimization variables. Non-alternating schemes are a remedy. However, due to a lack of Lipschitz continuity of the gradient in matrix factorization problems, convergence cannot be guaranteed. A recently developed remedy relies on the concept of Bregman distances, which generalizes the standard Euclidean distance. We exploit this theory by proposing a novel Bregman distance for matrix factorization problems, which, at the same time, allows for simple/closed form update steps. Therefore, for non-alternating schemes, such as the recently introduced Bregman Proximal Gradient (BPG) method and an inertial variant, Convex-Concave Inertial BPG (CoCaIn BPG), convergence of the whole sequence to a stationary point is proved for Matrix Factorization. In several experiments, we observe a superior performance of our non-alternating schemes in terms of speed and objective value at the limit point.