Optimization
Supersparse Linear Integer Models for Interpretable Classification
Ustun, Berk, Tracà, Stefano, Rudin, Cynthia
Scoring systems are classification models that only require users to add, subtract and multiply a few meaningful numbers to make a prediction. These models are often used because they are practical and interpretable. In this paper, we introduce an off-the-shelf tool to create scoring systems that are both accurate and interpretable, known as a Supersparse Linear Integer Model (SLIM). SLIM is a discrete optimization problem that minimizes the 0-1 loss to encourage a high level of accuracy, regularizes the L0-norm to encourage a high level of sparsity, and constrains coefficients to a set of interpretable values. We illustrate the practical and interpretable nature of SLIM scoring systems through applications in medicine and criminology, and use numerical experiments to show that they are accurate and sparse in comparison to state-of-the-art classification models.
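To make the idea concrete, the following minimal sketch shows how a scoring system produces a prediction, and how an objective of the kind described above (0-1 loss plus an L0 penalty) could be evaluated for a fixed set of integer coefficients. The feature names, coefficients, toy data, and the trade-off constant `c0` are all hypothetical and are not taken from the paper. SLIM itself is solved as a discrete optimization problem, which this sketch does not attempt.

```python
# Hypothetical integer coefficients for a toy scoring system (not from the paper).
coefficients = {"age_over_60": 2, "prior_event": 3, "normal_test": -1}
intercept = -3


def score(features):
    """Add up the integer points for the features that are present (0/1 values)."""
    return intercept + sum(coefficients[name] * value for name, value in features.items())


def predict(features):
    """Predict +1 when the total score is positive, otherwise -1."""
    return 1 if score(features) > 0 else -1


def slim_style_objective(data, c0):
    """Average 0-1 loss plus c0 times the number of nonzero coefficients (L0 norm)."""
    mistakes = sum(1 for features, label in data if predict(features) != label)
    l0_norm = sum(1 for weight in coefficients.values() if weight != 0)
    return mistakes / len(data) + c0 * l0_norm


# Hypothetical labeled examples: (binary features, label in {+1, -1}).
toy_data = [
    ({"age_over_60": 1, "prior_event": 1, "normal_test": 0}, 1),
    ({"age_over_60": 0, "prior_event": 1, "normal_test": 1}, -1),
    ({"age_over_60": 1, "prior_event": 0, "normal_test": 1}, 1),
]
objective_value = slim_style_objective(toy_data, c0=0.01)
```

Here the third example is misclassified, so the objective is one third (the error rate) plus 0.03 (three nonzero coefficients times `c0`). A solver for the real problem would search over the allowed integer coefficient values to minimize this quantity.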
First Order Methods for Robust Non-negative Matrix Factorization for Large Scale Noisy Data
Liu, Jason Gejie, Aeron, Shuchin
Nonnegative matrix factorization (NMF) has been shown to be identifiable under the separability assumption, under which all the columns (or rows) of the input data matrix belong to the convex cone generated by only a few of these columns (or rows) [1]. In real applications, however, such a separability assumption is hard to satisfy. Following [4] and [5], in this paper we look at the Linear Programming (LP) based reformulation to locate the extreme rays of the convex cone, but in a noisy setting. Furthermore, in order to deal with large-scale data, we employ First-Order Methods (FOM) to mitigate the computational complexity of the LP, which primarily results from a large number of constraints. We show the performance of the algorithm on real and synthetic data sets.
CUR Algorithm with Incomplete Matrix Observation
CUR matrix decomposition is a randomized algorithm that can efficiently compute a low rank approximation for a given rectangular matrix. One limitation of the existing CUR algorithms is that they require access to the full matrix A for computing U. In this work, we aim to alleviate this limitation. In particular, we assume that besides having access to d randomly sampled rows and d randomly sampled columns from A, we only observe a subset of randomly sampled entries from A. Our goal is to develop a low rank approximation algorithm, similar to CUR, based on (i) randomly sampled rows and columns from A, and (ii) randomly sampled entries from A. The proposed algorithm is able to perfectly recover the target matrix A with only O(rn log n) observed entries. In addition, instead of having to solve an optimization problem involving trace norm regularization, the proposed algorithm only needs to solve a standard regression problem. Finally, unlike most matrix completion theories that hold only when the target matrix is of low rank, we show a strong guarantee for the proposed algorithm even when the target matrix is not low rank.
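For readers unfamiliar with the C, U, R structure, here is a minimal sketch of a CUR-style factorization on a tiny matrix. It is not the regression-based algorithm proposed in the abstract. It uses the simple intersection-based choice U = W^{-1}, where W is the block of A at the chosen rows and columns. That choice is exact only when A has rank exactly r and W is invertible (r = 2 here). The helper names and the example matrix are hypothetical.

```python
from fractions import Fraction


def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)] for row in left]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix; raises if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("intersection block is singular; pick other rows/columns")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def skeleton_cur(a, row_idx, col_idx):
    """Return C, U, R with U = W^{-1}, where W = A[row_idx, col_idx] (2x2 here)."""
    c = [[row[j] for j in col_idx] for row in a]            # selected columns of A
    r = [a[i] for i in row_idx]                             # selected rows of A
    w = [[a[i][j] for j in col_idx] for i in row_idx]       # their intersection
    return c, inverse_2x2(w), r


# A 4x4 matrix of rank exactly 2, built as a product of 4x2 and 2x4 factors.
left = [[Fraction(v) for v in row] for row in [[1, 0], [0, 1], [1, 1], [2, -1]]]
right = [[Fraction(v) for v in row] for row in [[1, 2, 0, 1], [0, 1, 1, 3]]]
a = matmul(left, right)

c, u, r = skeleton_cur(a, row_idx=[0, 1], col_idx=[0, 1])
reconstruction = matmul(matmul(c, u), r)
```

Exact rational arithmetic is used so that `reconstruction` can be compared with `a` entry by entry. The sketch only illustrates the shape of the factorization; the setting in the abstract additionally works from a subset of observed entries and covers matrices that are not low rank.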
Bayesian Optimization with Unknown Constraints
Gelbart, Michael A., Snoek, Jasper, Adams, Ryan P.
Bayesian optimization (Mockus et al., 1978) is a method for performing global optimization of unknown "black box" objectives that is particularly appropriate when objective function evaluations are expensive (in any sense, such as time or money). For example, consider a food company trying to design a low-calorie variant of a popular cookie. In this case, the design space is the space of possible recipes and might include several key parameters such as quantities of various ingredients and baking times. Each evaluation of a recipe entails computing (or perhaps actually measuring) the number of calories in the proposed cookie. Bayesian optimization can be used to propose new candidate recipes such that good results are found with few evaluations. Now suppose the company also wants to ensure the taste of the cookie is not compromised when calories are reduced. Therefore, for each proposed low-calorie recipe, they perform a taste test with sample customers. Because different people, or the same people at different times, have differing opinions about the taste of cookies, the company decides to require that at least 95% of test subjects must like the new cookie.
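As a rough illustration of how an unknown constraint can enter the choice of the next candidate, the sketch below weights an expected-improvement score for the objective by the probability that the constraint is satisfied. This is one common way to build a constrained acquisition function and is assumed here purely for illustration; the excerpt above does not specify it. The Gaussian predictions (means and standard deviations) are assumed to come from some surrogate model that is not shown, and all function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected improvement for minimization under a Gaussian prediction."""
    if std <= 0.0:
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    return (best_so_far - mean) * normal_cdf(z) + std * normal_pdf(z)


def probability_feasible(constraint_mean, constraint_std):
    """Probability that a Gaussian-distributed constraint value is >= 0."""
    if constraint_std <= 0.0:
        return 1.0 if constraint_mean >= 0.0 else 0.0
    return normal_cdf(constraint_mean / constraint_std)


def constrained_acquisition(mean, std, best_so_far, constraint_mean, constraint_std):
    """Weight the expected improvement by the probability of feasibility."""
    return expected_improvement(mean, std, best_so_far) * probability_feasible(
        constraint_mean, constraint_std
    )
```

In the cookie example, `mean` and `std` would describe the predicted calories of a candidate recipe, and the constraint value could be encoded (hypothetically) as the fraction of tasters who like the cookie minus 0.95, so that feasibility corresponds to a non-negative value.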
Mechanisms for Fair Allocation Problems: No-Punishment Payment Rules in Verifiable Settings
Mechanism design is considered in the context of fair allocations of indivisible goods with monetary compensation, by focusing on problems where agents' declarations on allocated goods can be verified before payments are performed. A setting is considered where verification might be subject to errors, so that payments have to be awarded under the presumption of innocence, as incorrect declared values do not necessarily mean manipulation attempts by the agents. Within this setting, a mechanism is designed that is shown to be truthful, efficient, and budget-balanced. Moreover, agents' utilities are fairly determined by the Shapley value of suitable coalitional games, and enjoy highly desirable properties such as equal treatment of equals, envy-freeness, and a stronger one called individual-optimality. In particular, the latter property guarantees that, for every agent, her/his utility is the maximum possible one over any alternative optimal allocation. The computational complexity of the proposed mechanism is also studied. It turns out that it is #P-complete so that, to deal with applications with many agents involved, two polynomial-time randomized variants are also proposed: one that is still truthful and efficient, and which is approximately budget-balanced with high probability, and another one that is truthful in expectation, while still budget-balanced and efficient.
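The Shapley value mentioned above averages each agent's marginal contribution over all orderings of the agents, which is why exact computation becomes expensive as the number of agents grows. The sketch below computes it exactly by enumeration and also estimates it by sampling random orderings. The sampling routine is a generic Monte Carlo estimator, not the randomized mechanisms proposed in the paper, and the "glove game" is a hypothetical toy coalitional game chosen only for illustration.

```python
import random
from fractions import Fraction
from itertools import permutations


def shapley_exact(players, value):
    """Average each player's marginal contribution over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for p in ordering:
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: total / len(orderings) for p, total in totals.items()}


def shapley_sampled(players, value, num_samples, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        ordering = list(players)
        rng.shuffle(ordering)
        coalition = frozenset()
        for p in ordering:
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: total / num_samples for p, total in totals.items()}


def glove_game(coalition):
    """Toy game: players 0 and 1 hold left gloves, player 2 a right glove."""
    left = len(coalition & {0, 1})
    right = len(coalition & {2})
    return min(left, right)  # number of complete pairs the coalition can form


exact = shapley_exact([0, 1, 2], glove_game)
estimate = shapley_sampled([0, 1, 2], glove_game, num_samples=2000)
```

In this toy game the exact values are 1/6 for each left-glove holder and 2/3 for the right-glove holder. The two symmetric players receive the same value (equal treatment of equals), and the values sum to the worth of the grand coalition.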
Proximal Newton-type methods for minimizing composite functions
Lee, Jason D., Sun, Yuekai, Saunders, Michael A.
We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods inherit the desirable convergence behavior of Newton-type methods for minimizing smooth functions, even when search directions are computed inexactly. Many popular methods tailored to problems arising in bioinformatics, signal processing, and statistical learning are special cases of proximal Newton-type methods, and our analysis yields new convergence results for some of these methods.
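A one-dimensional sketch may help fix ideas. Each iteration replaces the smooth part by its second-order model and minimizes that model plus the nonsmooth term. When the nonsmooth term is lam*|x|, the minimizer is a soft-threshold of the Newton point with threshold lam divided by the second derivative. The test function exp(x) - c*x, the unit step size, and the function names are assumptions made for illustration. Practical methods add a line search and handle inexact subproblem solutions, both of which are omitted here.

```python
import math


def soft_threshold(z, threshold):
    """Proximal mapping of threshold * |x| evaluated at z."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def proximal_newton_1d(c, lam, x0=0.0, iterations=30):
    """Minimize exp(x) - c*x + lam*|x| with unit-step proximal Newton iterations."""
    x = x0
    for _ in range(iterations):
        gradient = math.exp(x) - c   # derivative of the smooth part
        hessian = math.exp(x)        # second derivative, always positive
        # Minimize the quadratic model of the smooth part plus lam*|x|:
        # a soft-threshold of the Newton point with threshold lam / hessian.
        x = soft_threshold(x - gradient / hessian, lam / hessian)
    return x
```

For example, with c = 3 and lam = 1 the minimizer is log(2), since the optimality condition exp(x) - 3 + 1 = 0 holds for x > 0. With c = 1.5 and lam = 1 the penalty dominates and the minimizer is exactly 0.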
Test Set Selection using Active Information Acquisition for Predictive Models
Chaudhari, Sneha, Dayama, Pankaj, Pandit, Vinayaka, Bhattacharya, Indrajit
In this paper, we consider active information acquisition when the prediction model is meant to be applied to a targeted subset of the population. The goal is to label a pre-specified fraction of customers in the target or test set by iteratively querying for information from the non-target or training set. The number of queries is limited by an overall budget. Motivated by two rather disparate applications, banking and medical diagnosis, we pose the active information acquisition problem as a constrained optimization problem. We propose two greedy iterative algorithms for solving this problem. We conduct experiments with synthetic data and compare the results of our proposed algorithms with a few other baseline approaches. The experimental results show that our proposed approaches perform better than the baseline schemes.
On Combining Machine Learning with Decision Making
Tulabandhula, Theja, Rudin, Cynthia
Mach Learn manuscript No. (will be inserted by the editor)
Funding for Theja Tulabandhula was provided by a Fulbright Fellowship and Xerox Fellowship. Cynthia Rudin's work on this project was funded in part by Con Edison, by the MIT Energy Initiative Seed Fund, and NSF grant IIS-1053407.
Abstract: We present a new application and covering number bound for the framework of "Machine Learning with Operational Costs (MLOC)," which is an exploratory form of decision theory. The MLOC framework incorporates knowledge about how a predictive model will be used for a subsequent task, thus combining machine learning with the decision that is made afterwards. In this work, we use the MLOC framework to study a problem that has implications for power grid reliability and maintenance, called the Machine Learning and Traveling Repairman Problem (ML&TRP). The goal of the ML&TRP is to determine a route for a "repair crew," which repairs nodes on a graph. The repair crew aims to minimize the cost of failures at the nodes, but as in many real situations, the failure probabilities are not known and must be estimated. The MLOC framework allows us to understand how this uncertainty influences the repair route.
Keywords: decision theory · generalization bound · constrained linear function classes · covering numbers · traveling repairman · mixed-integer programming
1 Introduction: In many domains, it is essential to understand how uncertainty in predictions influences decision-making. The new framework of Machine Learning with Operational Costs (MLOC) (Tulabandhula and Rudin, 2013) provides a mechanism to do this, and is a type of exploratory decision theory. Where usual decision theories provide a single policy that minimizes expected costs, the MLOC framework is able to produce a range of reasonable policies that span the full set of reasonable costs. To do this, the operational cost becomes a regularization term within the machine learning model, and adjusting the regularization constant allows us to explore solutions for all reasonable costs. This gives decision makers a way to understand the uncertainty in their predictive model in terms of something they can grasp: uncertainty in the cost to solve the problem. The MLOC framework can also be used in another way, namely to incorporate prior knowledge about the cost to produce a better predictive model.
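To illustrate how estimated failure probabilities can drive a repair route, the sketch below evaluates a simple operational cost for a route and finds the cheapest route by brute force. The cost form (failure probability times the time at which the crew reaches the node, summed over nodes) is an assumption made for illustration and may differ from the costs used in the paper. The graph, the probabilities, and the function names are likewise hypothetical.

```python
from itertools import permutations


def route_cost(route, distance, failure_prob, start=0):
    """Sum over nodes of (failure probability) x (time at which the crew arrives)."""
    elapsed = 0.0
    cost = 0.0
    current = start
    for node in route:
        elapsed += distance[current][node]
        cost += failure_prob[node] * elapsed
        current = node
    return cost


def best_route(distance, failure_prob, start=0):
    """Brute-force search over all visiting orders; only viable for tiny graphs."""
    nodes = [n for n in range(len(distance)) if n != start]
    return min(permutations(nodes), key=lambda r: route_cost(r, distance, failure_prob, start))


# Hypothetical instance: nodes on a line at positions 0 (depot), 1, 2 and -1.
positions = [0, 1, 2, -1]
distance = [[abs(p - q) for q in positions] for p in positions]
estimated_prob = {1: 0.1, 2: 0.1, 3: 0.9}

route = best_route(distance, estimated_prob)
cost = route_cost(route, distance, estimated_prob)
```

With these estimates the crew visits the high-risk node 3 first, even though the other two nodes lie in the opposite direction. Re-running the search with different probability estimates shows how uncertainty in the predictive model translates into different routes and costs.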
The SAT-UNSAT transition in the adversarial SAT problem
Bardoscia, Marco, Nagaj, Daniel, Scardicchio, Antonello
The study of random ensembles of decision problems has grown into a fertile field of investigation where the methods of statistical physics have found applications to the theory (and practice) of hard combinatorial problems. This resulted in a wealth of intuition on the nature of the typical complexity of hard decision problems and in a new, efficient family of algorithms [1, 2]. One such problem is a random ensemble of satisfiability (SAT for short), where Boolean formulas are generated in a random way and tested for a solution. If the formula is restricted to be a conjunction of an arbitrary number of clauses, and each clause is the logical disjunction of K variables, the problem is denoted by K-SAT. The ensemble is determined once the number of clauses per variable is fixed. As this ratio is increased, the formulas go from being typically satisfiable to being typically unsatisfiable [2-4]. This is the SAT-UNSAT phase transition. Recent progress in the study of quantum decision problems [5] led to the definition of the quantum generalisation of K-SAT (we call it K-QSAT) [6].
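The classical random K-SAT ensemble described above can be explored on very small instances with a short script. The sketch below generates random formulas at a given clause-to-variable ratio and checks satisfiability by exhaustive search. It covers only classical K-SAT, not the adversarial or quantum variants, and the function names, default parameters, and the encoding of literals as signed integers are choices made here for illustration.

```python
import random
from itertools import product


def random_ksat(num_vars, num_clauses, k, rng):
    """Each clause picks k distinct variables and negates each with probability 1/2."""
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def is_satisfiable(formula, num_vars):
    """Exhaustive check over all 2**num_vars assignments (tiny instances only)."""
    for assignment in product([False, True], repeat=num_vars):
        # A literal +v is true when variable v is True; -v when it is False.
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in formula):
            return True
    return False


def satisfiable_fraction(num_vars, ratio, k=3, trials=20, seed=0):
    """Fraction of random formulas at a given clause-to-variable ratio that are SAT."""
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    hits = sum(
        is_satisfiable(random_ksat(num_vars, num_clauses, k, rng), num_vars)
        for _ in range(trials)
    )
    return hits / trials
```

Calling `satisfiable_fraction` for a range of ratios gives a crude picture of how the satisfiable fraction changes with the ratio. Because the search is exponential in the number of variables, this is only practical for a handful of variables, where the transition appears smeared out rather than sharp.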
Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation
Aravkin, Aleksandr Y., Kumar, Rajiv, Mansour, Hassan, Recht, Ben, Herrmann, Felix J.
Recent SVD-free matrix factorization formulations have enabled rank minimization for systems with millions of rows and columns, paving the way for matrix completion in extremely large-scale applications, such as seismic data interpolation. In this paper, we consider matrix completion formulations designed to hit a target data-fitting error level provided by the user, and propose an algorithm called LR-BPDN that is able to exploit factorized formulations to solve the corresponding optimization problem. Since practitioners typically have strong prior knowledge about the target error level, this innovation makes it easy to apply the algorithm in practice, leaving only the factor rank to be determined. Within the established framework, we propose two extensions that are highly relevant to solving practical challenges of data interpolation. First, we propose a weighted extension that allows known subspace information to improve the results of matrix completion formulations. We show how this weighting can be used in the context of frequency continuation, an essential aspect of seismic data interpolation. Second, we propose matrix completion formulations that are robust to large measurement errors in the available data. We illustrate the advantages of LR-BPDN on the collaborative filtering problem using the MovieLens 1M, 10M, and Netflix 100M datasets. Then, we use the new method, along with its robust and subspace re-weighted extensions, to obtain high-quality reconstructions for large-scale seismic interpolation problems with real data, even in the presence of data contamination.