plane algorithm
MOSS: Multi-Objective Optimization for Stable Rule Sets
We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective optimization framework. Importantly, MOSS allows a practitioner to rapidly evaluate the trade-off between accuracy and stability in sparse rule sets in order to select an appropriate model. We develop a specialized cutting plane algorithm in our framework to rapidly compute the Pareto frontier between these two objectives, and our algorithm scales to problem instances beyond the capabilities of commercial optimization solvers. Our experiments show that MOSS outperforms state-of-the-art rule ensembles in terms of both predictive performance and stability.
A cutting plane algorithm for globally solving low dimensional k-means clustering problems
Ryner, Martin, Kronqvist, Jan, Karlsson, Johan
Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common such methods. There is a variety of approximate algorithms for the k-means problem, but computing the globally optimal solution is in general NP-hard. In this paper we consider the k-means problem for instances with low dimensional data and formulate it as a structured concave assignment problem. This allows us to exploit the low dimensional structure and solve the problem to global optimality within reasonable time for large data sets with several clusters. The method builds on iteratively solving a small concave problem and a large linear programming problem. This gives a sequence of feasible solutions along with bounds which we show converges to zero optimality gap. The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results on their performance.
A Probabilistic Forecast-Driven Strategy for a Risk-Aware Participation in the Capacity Firming Market
Dumas, Jonathan, Cointe, Colin, Wehenkel, Antoine, Sutera, Antonio, Fettweis, Xavier, Cornรฉlusse, Bertrand
This paper addresses the energy management of a grid-connected renewable generation plant coupled with a battery energy storage device in the capacity firming market, designed to promote renewable power generation facilities in small non-interconnected grids. A recently developed deep learning model known as normalizing flows is used to generate quantile forecasts of renewable generation. They provide a general mechanism for defining expressive probability distributions, only requiring the specification of a base distribution and a series of bijective transformations. Then, a probabilistic forecast-driven strategy is designed, modeled as a min-max-min robust optimization problem with recourse, and solved using a Benders decomposition. The convergence is improved by building an initial set of cuts derived from domain knowledge. Robust optimization models the generation randomness using an uncertainty set that includes the worst-case generation scenario and protects this scenario under the minimal increment of costs. This approach improves the results over a deterministic approach with nominal point forecasts by finding a trade-off between conservative and risk-seeking policies. Finally, a dynamic risk-averse parameters selection strategy based on the quantile forecasts distribution provides an additional gain. The case study uses the photovoltaic generation monitored on-site at the University of Li\`ege (ULi\`ege), Belgium.
Stochastic Cutting Planes for Data-Driven Optimization
Bertsimas, Dimitris, Li, Michael Lingzhi
We introduce a stochastic version of the cutting-plane method for a large class of data-driven Mixed-Integer Nonlinear Optimization (MINLO) problems. We show that under very weak assumptions the stochastic algorithm is able to converge to an $\epsilon$-optimal solution with high probability. Numerical experiments on several problems show that stochastic cutting planes is able to deliver a multiple order-of-magnitude speedup compared to the standard cutting-plane method. We further experimentally explore the lower limits of sampling for stochastic cutting planes and show that for many problems, a sampling size of $O(\sqrt[3]{n})$ appears to be sufficient for high quality solutions.
Learning Optimized Risk Scores
Risk scores are simple classification models that let users make quick risk predictions by adding and subtracting a few small numbers. These models are widely used in medicine and criminal justice, but are difficult to learn from data because they need to be calibrated, sparse, use small integer coefficients and obey application-specific constraints. In this paper, we present a new machine learning approach to learn risk scores. We formulate the risk score problem as a mixed integer nonlinear program, and present a new cutting plane algorithm for non-convex settings to efficiently recover its optimal solution. We improve our algorithm with specialized techniques to generate feasible solutions, narrow the optimality gap, and reduce data-related computation. Our approach can fit risk scores in a way that scales linearly in the number of samples, provides a certificate of optimality, and obeys real-world constraints without parameter tuning or post-processing. We illustrate the performance benefits of this approach through an extensive set of numerical experiments, where we compare risk scores built using our approach to those built using heuristic approaches. We also discuss the practical benefits of our approach through an application where we build a customized risk score for ICU seizure prediction in collaboration with the Massachusetts General Hospital.
Sparse High-Dimensional Regression: Exact Scalable Algorithms and Phase Transitions
Bertsimas, Dimitris, Van Parys, Bart
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes n and number of regressors p in the 100,000s, that is two orders of magnitude better than the current state of the art, in seconds. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory which suggests that the difficulty of a problem increases with problem size, the sparse regression problem has the property that as the number of samples $n$ increases the problem becomes easier in that the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso), while for small number of samples n, our approach takes a larger amount of time to solve the problem, but importantly the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative over heuristic methods available at present.
Bayesian Network Learning via Topological Order
Park, Young Woong, Klabjan, Diego
We propose a mixed integer programming (MIP) model and iterative algorithms based on topological orders to solve optimization problems with acyclic constraints on a directed graph. The proposed MIP model has a significantly lower number of constraints compared to popular MIP models based on cycle elimination constraints and triangular inequalities. The proposed iterative algorithms use gradient descent and iterative reordering approaches, respectively, for searching topological orders. A computational experiment is presented for the Gaussian Bayesian network learning problem, an optimization problem minimizing the sum of squared errors of regression models with L1 penalty over a feature network with application of gene network inference in bioinformatics.