Optimization
Task-based End-to-end Model Learning in Stochastic Optimization
Donti, Priya L., Amos, Brandon, Kolter, J. Zico
With the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications.
Sparse Inverse Covariance Estimation for Chordal Structures
Fattahi, Salar, Zhang, Richard Y., Sojoudi, Somayeh
In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix, for sparse graphs under different conditions. We have also derived a closed-form solution that is optimal when the thresholded sample covariance matrix has an acyclic structure. As a major generalization of the previous result, in this paper we derive a closed-form solution for the GL for graphs with chordal structures. We show that the GL and thresholding equivalence conditions can be significantly simplified and are expected to hold for high-dimensional problems if the thresholded sample covariance matrix has a chordal structure. We then show that the GL and thresholding equivalence is enough to reduce the GL to a maximum determinant matrix completion problem and derive a recursive closed-form solution for the GL when the thresholded sample covariance matrix has a chordal structure. For large-scale problems with up to 450 million variables, the proposed method can solve the GL problem in less than 2 minutes, while the state-of-the-art methods converge in more than 2 hours.
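To make the thresholding step concrete, the following is a minimal sketch, not the authors' implementation. It assumes an entrywise soft-thresholding of the off-diagonal entries at a level `lam`; the abstract does not state the exact rule, and the function names and the toy matrix are hypothetical. It only recovers the sparsity pattern and does not compute the closed-form GL solution.

```python
def soft_threshold_offdiag(S, lam):
    """Soft-threshold the off-diagonal entries of a square matrix S at level lam.

    Assumption: entrywise soft-thresholding; the diagonal is left unchanged.
    """
    n = len(S)
    T = [row[:] for row in S]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = S[i][j]
                mag = max(abs(s) - lam, 0.0)
                T[i][j] = mag if s >= 0 else -mag
    return T


def support_edges(T):
    """Return the set of index pairs (i, j), i < j, with a nonzero entry."""
    n = len(T)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if T[i][j] != 0.0}


# Hypothetical 3x3 sample covariance matrix (illustrative values only).
S = [
    [1.0, 0.6, 0.05],
    [0.6, 1.0, -0.4],
    [0.05, -0.4, 1.0],
]
T = soft_threshold_offdiag(S, lam=0.1)
edges = support_edges(T)
print(edges)
```

In this toy case the weak entry between variables 0 and 2 is removed, and the remaining pattern is a path, which is acyclic and therefore also chordal.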
Automated Algorithm Selection on Continuous Black-Box Problems By Combining Exploratory Landscape Analysis and Machine Learning
Kerschke, Pascal, Trautmann, Heike
Although the Algorithm Selection Problem (ASP, [1]) was introduced more than four decades ago, only a few works (e.g., [2], [3]) perform algorithm selection in the field of continuous optimization. Independent of the underlying domain, the goal of the ASP can be described as follows: given a set of optimization algorithms $\mathcal{A}$, often called the algorithm portfolio, and a set of problem instances $\mathcal{I}$, one wants to find a model $m: \mathcal{I} \to \mathcal{A}$ that selects the best algorithm $A \in \mathcal{A}$ from the portfolio for an unseen problem instance $I \in \mathcal{I}$. Although there already exists a plethora of optimization algorithms - even when only considering single-objective, continuous optimization problems - none of them can be considered superior to all the others across all optimization problems. Hence, it is very desirable to find a sophisticated selection mechanism that automatically picks the portfolio's best solver for a given problem. Within other optimization domains, such as the well-known Travelling Salesperson Problem, feature-based algorithm selectors have already shown their capability of outperforming the respective state-of-the-art optimization algorithm(s) by combining machine learning techniques and problem-dependent features [4], [5].
Predicting shim gaps in aircraft assembly with machine learning and sparse sensing
Manohar, Krithika, Hogan, Thomas, Buttrick, Jim, Banerjee, Ashis G., Kutz, J. Nathan, Brunton, Steven L.
A modern aircraft may require on the order of thousands of custom shims to fill gaps between structural components in the airframe that arise due to manufacturing tolerances adding up across large structures. These shims are necessary to eliminate gaps, maintain structural performance, and minimize pull-down forces required to bring the aircraft into engineering nominal configuration for peak aerodynamic efficiency. Gap filling is a time-consuming process, involving either expensive by-hand inspection or computations on vast quantities of measurement data from increasingly sophisticated metrology equipment. Either case amounts to significant delays in production, with much of the time spent in the critical path of aircraft assembly. This work presents an alternative strategy for predictive shimming, based on machine learning and sparse sensing to first learn gap distributions from historical data, and then design optimized sparse sensing strategies to streamline data collection and processing. This new approach is based on the assumption that patterns exist in shim distributions across aircraft, which may be mined and used to reduce the burden of data collection and processing in future aircraft. Specifically, robust principal component analysis is used to extract low-dimensional patterns in the gap measurements while rejecting outliers. Next, optimized sparse sensors are obtained that are most informative about the dimensions of a new aircraft in these low-dimensional principal components. We demonstrate the success of the proposed approach, called PIXel Identification Despite Uncertainty in Sensor Technology (PIXI-DUST), on historical production data from 54 representative Boeing commercial aircraft. Our algorithm successfully predicts $99\%$ of shim gaps within the desired measurement tolerance using $3\%$ of the laser scan points typically required; all results are cross-validated.
Global optimization for low-dimensional switching linear regression and bounded-error estimation
The paper provides global optimization algorithms for two particularly difficult nonconvex problems raised by hybrid system identification: switching linear regression and bounded-error estimation. While most works focus on local optimization heuristics without global optimality guarantees or with guarantees valid only under restrictive conditions, the proposed approach always yields a solution with a certificate of global optimality. This approach relies on a branch-and-bound strategy for which we devise lower bounds that can be efficiently computed. In order to obtain scalable algorithms with respect to the number of data, we directly optimize the model parameters in a continuous optimization setting without involving integer variables. Numerical experiments show that the proposed algorithms offer a higher accuracy than convex relaxations with a reasonable computational burden for hybrid system identification. In addition, we discuss how bounded-error estimation is related to robust estimation in the presence of outliers and exact recovery under sparse noise, for which we also obtain promising numerical results.
Robustness of Maximum Correntropy Estimation Against Large Outliers
Chen, Badong, Xing, Lei, Zhao, Haiquan, Xu, Bin, Principe, Jose C.
The maximum correntropy criterion (MCC) has recently been successfully applied in robust regression, classification and adaptive filtering, where the correntropy is maximized instead of minimizing the well-known mean square error (MSE) to improve the robustness with respect to outliers (or impulsive noise). Considerable effort has been devoted to developing various robust adaptive algorithms under MCC, but so far little insight has been gained as to how the optimal solution will be affected by outliers. In this work, we study this problem in the context of parameter estimation for a simple linear errors-in-variables (EIV) model where all variables are scalar. Under certain conditions, we derive an upper bound on the absolute value of the estimation error and show that the optimal solution under MCC can be very close to the true value of the unknown parameter even with outliers (whose values can be arbitrarily large) in both input and output variables. Illustrative examples are presented to verify and clarify the theory.
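The contrast between MCC and MSE can be illustrated with a small sketch. It is a simplification, not the setting analyzed in the paper: it fits a scalar slope in y = w x with an outlier in the output only, uses a Gaussian kernel of width `sigma`, and solves the MCC stationarity condition by a fixed-point iteration started from the median of the ratios y/x. The function names, the data, the kernel width, and the initialization are all assumptions made for illustration.

```python
import math
from statistics import median


def ls_slope(xs, ys):
    """Least-squares (MSE) estimate of w in y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mcc_slope(xs, ys, sigma=1.0, iters=50):
    """Fixed-point iteration for the MCC estimate of w with a Gaussian kernel.

    Each step is a weighted least-squares update in which a sample's weight
    exp(-e^2 / (2 sigma^2)) shrinks towards zero as its residual e grows.
    """
    # Assumed robust starting point: median of the per-sample ratios y / x.
    w = median(y / x for x, y in zip(xs, ys) if x != 0)
    for _ in range(iters):
        k = [math.exp(-((y - w * x) ** 2) / (2 * sigma ** 2))
             for x, y in zip(xs, ys)]
        den = sum(ki * x * x for ki, x in zip(k, xs))
        if den == 0.0:  # all weights underflowed; keep the current estimate
            break
        w = sum(ki * x * y for ki, x, y in zip(k, xs, ys)) / den
    return w


# Hypothetical data: true slope near 2, with one large outlier at x = 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 1000.0, 8.2, 9.8]

w_ls = ls_slope(xs, ys)
w_mcc = mcc_slope(xs, ys)
print(w_ls, w_mcc)
```

On this toy data the least-squares slope is pulled far from 2 by the single outlier, while the MCC estimate stays close to 2 because the outlier's kernel weight is effectively zero.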
Adaptive Cardinality Estimation
Ivanov, Oleg, Bartunov, Sergey
In this paper we address the cardinality estimation problem, an important subproblem of query optimization. Query optimization is the part of every relational DBMS responsible for finding the best way to execute a given query. These ways are called plans. The execution times of different plans may differ by several orders of magnitude, so the query optimizer has a great influence on overall DBMS performance. We consider the cost-based query optimization approach, as it is the most popular one. It has been observed that the quality of cost-based optimization depends heavily on the quality of cardinality estimation. The cardinality of a plan node is the number of tuples it returns. In this paper we propose a novel cardinality estimation approach that uses machine learning methods. The main idea of the approach is to use the execution statistics of previously executed queries to improve cardinality estimates. We call this approach adaptive cardinality estimation to reflect this idea. The approach is general, flexible, and easy to implement. The experimental evaluation shows that this approach significantly increases the quality of cardinality estimation, and therefore increases DBMS performance for some queries by several times or even by several dozen times.
Decomposition Strategies for Constructive Preference Elicitation
Dragone, Paolo, Teso, Stefano, Kumar, Mohit, Passerini, Andrea
We tackle the problem of constructive preference elicitation, that is, the problem of learning user preferences over very large decision problems, involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user's cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.
On the ERM Principle with Networked Data
Wang, Yuanhong, Wang, Yuyi, Liu, Xingwu, Pu, Juhua
Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d. assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results for this problem only deal with the classical empirical risk minimization (ERM) principle that always weights every example equally, but this strategy leads to unsatisfactory bounds. We consider general weighted ERM and show new universal risk bounds for this problem. These new bounds naturally define an optimization problem which leads to appropriate weights for networked examples. Though this optimization problem is not convex in general, we devise a new fully polynomial-time approximation scheme (FPTAS) to solve it.
Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems
In this paper, we consider a class of possibly nonconvex, nonsmooth and non-Lipschitz optimization problems arising in many contemporary applications such as machine learning, variable selection and image processing. To solve this class of problems, we propose a proximal gradient method with extrapolation and line search (PGels). This method is developed based on a special potential function and successfully incorporates both extrapolation and non-monotone line search, which are two simple and efficient accelerating techniques for the proximal gradient method. Thanks to the line search, this method allows more flexibility in choosing the extrapolation parameters and updates them adaptively at each iteration if a certain line search criterion is not satisfied. Moreover, with proper choices of parameters, our PGels reduces to many existing algorithms. We also show that, under some mild conditions, our line search criterion is well defined and any cluster point of the sequence generated by PGels is a stationary point of our problem. In addition, by assuming the Kurdyka-Łojasiewicz exponent of the objective in our problem, we further analyze the local convergence rate of two special cases of PGels, including the widely used non-monotone proximal gradient method as one case. Finally, we conduct some numerical experiments for solving the $\ell_1$ regularized logistic regression problem and the $\ell_{1\text{-}2}$ regularized least squares problem. Our numerical results illustrate the efficiency of PGels and show the potential advantage of combining two accelerating techniques.
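As background for the extrapolation idea, the sketch below shows a plain proximal gradient iteration with extrapolation. It is not PGels: it has no line search and no adaptive parameter update, and it targets the convex $\ell_1$ regularized least squares problem $\min_x \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$ rather than the nonconvex problems studied in the paper. The FISTA-style extrapolation weights, the Frobenius-norm step size bound, and all names are assumptions chosen for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied entrywise."""
    out = []
    for x in v:
        mag = max(abs(x) - tau, 0.0)
        out.append(mag if x >= 0 else -mag)
    return out


def prox_grad_extrapolated(A, b, lam, iters=300):
    """Proximal gradient with extrapolation for 0.5*||Ax-b||^2 + lam*||x||_1."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so 1/L is a safe (if conservative) fixed step size.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x_prev = [0.0] * n
    x = [0.0] * n
    t = 1.0
    for _ in range(iters):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next  # extrapolation weight (FISTA-style)
        # Extrapolated point y = x + beta * (x - x_prev).
        y = [x[j] + beta * (x[j] - x_prev[j]) for j in range(n)]
        # Gradient of the smooth part at y: A^T (A y - b).
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x_prev = x
        x = soft_threshold([y[j] - step * g[j] for j in range(n)], step * lam)
        t = t_next
    return x


# Toy problem with A = identity, where the minimizer is soft_threshold(b, lam).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_hat = prox_grad_extrapolated(A, b, lam=1.0)
print(x_hat)
```

With the identity matrix the minimizer is known in closed form, which makes the toy run easy to check: the first coordinate shrinks from 3 to 2 and the second is set to zero.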