Optimization
A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints
Lin, Qihang, Nadarajah, Selvaprabu, Soheili, Negar, Yang, Tianbao
Stochastic convex optimization problems with expectation constraints (SOECs) are encountered in statistics and machine learning, business, and engineering. In data-rich environments, the SOEC objective and constraints contain expectations defined with respect to large datasets. Therefore, efficient algorithms for solving such SOECs need to limit the fraction of data points that they use, which we refer to as algorithmic data complexity. Recent stochastic first order methods exhibit low data complexity when handling SOECs but guarantee near-feasibility and near-optimality only at convergence. These methods may thus return highly infeasible solutions when heuristically terminated, as is often the case, due to theoretical convergence criteria being highly conservative. This issue limits the use of first order methods in several applications where the SOEC constraints encode implementation requirements. We design a stochastic feasible level set method (SFLS) for SOECs that has low data complexity and emphasizes feasibility before convergence. Specifically, our level-set method solves a root-finding problem by calling a novel first order oracle that computes a stochastic upper bound on the level-set function by extending mirror descent and online validation techniques. We establish that SFLS maintains a high-probability feasible solution at each root-finding iteration and exhibits favorable iteration complexity compared to state-of-the-art deterministic feasible level set and stochastic subgradient methods. Numerical experiments on three diverse applications validate the low data complexity of SFLS relative to the former approach and highlight how SFLS finds feasible solutions with small optimality gaps significantly faster than the latter method.
How much data is sufficient to learn high-performing algorithms?
Balcan, Maria-Florina, DeBlasio, Dan, Dick, Travis, Kingsford, Carl, Sandholm, Tuomas, Vitercik, Ellen
Algorithms for scientific analysis typically have tunable parameters that significantly influence computational efficiency and solution quality. If a parameter setting leads to strong algorithmic performance on average over a set of typical problem instances, that parameter setting---ideally---will perform well in the future. However, if the set of typical problem instances is small, average performance will not generalize to future performance. This raises the question: how large should this set be? We answer this question for any algorithm satisfying an easy-to-describe, ubiquitous property: its performance is a piecewise-structured function of its parameters. We are the first to provide a unified sample complexity framework for algorithm parameter configuration; prior research followed case-by-case analyses. We present applications from diverse domains, including biology, political science, and economics.
On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond
The DANE algorithm is an approximate Newton method popularly used for communication-efficient distributed machine learning. Reasons for the interest in DANE include scalability and versatility. Convergence of DANE, however, can be tricky; its appealing convergence rate is only rigorous for quadratic objective, and for more general convex functions the known results are no stronger than those of the classic first-order methods. To remedy these drawbacks, we propose in this paper some new alternatives of DANE which are more suitable for analysis. We first introduce a simple variant of DANE equipped with backtracking line search, for which global asymptotic convergence and sharper local non-asymptotic convergence rate guarantees can be proved for both quadratic and non-quadratic strongly convex functions. Then we propose a heavy-ball method to accelerate the convergence of DANE, showing that nearly tight local rate of convergence can be established for strongly convex functions, and with proper modification of algorithm the same result applies globally to linear prediction models. Numerical evidence is provided to confirm the theoretical and practical advantages of our methods.
The Noise Collector for sparse recovery in high dimensions
Moscoso, Miguel, Novikov, Alexei, Papanicolaou, George, Tsogka, Chrysoula
The Noise Collector for sparse recovery in high dimensions Miguel Moscoso, Alexei Novikov โ , George Papanicolaou โก, Chrysoula Tsogka ยง August 14, 2019 Abstract The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system A ฯ b 0 can be found efficiently with an null 1-norm minimization approach if the data is noiseless. Detection of the signal's support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix C and solve an augmented system A ฯ C ฮท b 0 e, where e is the noise. We show that the l 1-norm minimal solution of the augmented system has zero false discovery rate for any level of noise and with probability that tends to one as the dimension of b 0 increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging. In the noiseless case, ฯ can be found exactly by solving the optimization problem [9] ฯ arg min ฯ null ฯnull null 1, subject to A ฯ b, (2) provided the measurement matrix A R N K satisfies additional conditions, e.g., decoherence or restricted isometry properties [11, 4], and the solution vector ฯ has a small number M of nonzero components or degrees of freedom.
Review of Algorithms for Compressive Sensing of Images
We provide a comprehensive review of classical algorithms for compressive sensing of images, focused on Total variation methods, with a view to application in LiDAR systems. Our primary focus is providing a full review for beginners in the field, as well as simulating the kind of noise found in real LiDAR systems. To this end, we provide an overview of the theoretical background, a brief discussion of various considerations that come in to play in compressive sensing, and a standardized comparison of off-the-shelf methods, intended as a quick-start guide to choosing algorithms for compressive sensing applications.
ChemBO: Bayesian Optimization of Small Organic Molecules with Synthesizable Recommendations
Korovina, Ksenia, Xu, Sailun, Kandasamy, Kirthevasan, Neiswanger, Willie, Poczos, Barnabas, Schneider, Jeff, Xing, Eric P.
We describe ChemBO, a Bayesian Optimization framework for generating and optimizing organic molecules for desired molecular properties. This framework is useful in applications such as drug discovery, where an algorithm recommends new candidate molecules; these molecules first need to be synthesized and then tested for drug-like properties. The algorithm uses the results of past tests to recommend new ones so as to find good molecules efficiently. Most existing data-driven methods for this problem do not account for sample efficiency and/or fail to enforce realistic constraints on synthesizability. In this work, we explore existing kernels for molecules in the literature as well as propose a novel kernel which views a molecule as a graph. In ChemBO, we implement these kernels in a Gaussian process model. Then we explore the chemical space by traversing possible paths of molecular synthesis. Consequently, our approach provides a proposal synthesis path every time it recommends a new molecule to test, a crucial advantage when compared to existing methods. In our experiments, we demonstrate the efficacy of the proposed approach on several molecular optimization problems.
Simultaneous Clustering and Optimization for Evolving Datasets
Zhao, Yawei, Zhu, En, Liu, Xinwang, Tang, Chang, Guo, Deke, Yin, Jianping
For any i such that 1 i 6, A i represents an instance of the dataset, X i represents the corresponding optimization variable, v i represents a vertex of graph G, and e ij represents the edge connecting v i and v j. heuristic rules used in traditional clustering methods. A formulation of convex clustering was proposed in [13] by relaxing the formulation of k-means clustering. Subsequently, [15] and [16] provided several sufficient conditions for recovering the clustering membership theoretically . Other studies, e.g., [8], [17], focus on improving the efficiency of convex clustering. Although those previous studies attained great improvement of convex clustering for static datasets, they are unsuitable for handling evolving datasets due to a high computational cost. The method proposed in the paper reduces such computational cost and makes a good tradeoff between efficiency and accuracy .
Multi-node environment strategy for Parallel Deterministic Multi-Objective Fractal Decomposition
This paper deals with these problems by using a new decomposition-based algorithm called: "Fractal geometric decomposition base algorithm" (FDA). It is a deterministic metaheuristic developed to solve large-scale continuous optimization problems [5]. It can be noticed, that we call large scale problems those having the dimension greater than 1000. In this research, we are interested in using FDA to deal with MOPs because in the literature decomposition based algorithms have been with more less success applied to solve these problems, their main problem is related to their complexity. In this work, the goal is to deal with this complexity problem by keeping the same level of efficiency. FDA is based on "divide-and-conquer" paradigm where the sub-regions are hyperspheres rather than hypercubes on classical approaches. In order to identify the Pareto optimal solutions, we propose to extend FDA using the scalarization approach. We called the proposed algorithm Mo-FDA.
Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer
Chou, Chun-An, Cao, Qingtao, Weng, Shao-Jen, Tsai, Che-Hung
After admission to emergency department (ED), patients with critical illnesses are transferred to intensive care unit (ICU) due to unexpected clinical deterioration occurrence. Identifying such unplanned ICU transfers is urgently needed for medical physicians to achieve two-fold goals: improving critical care quality and preventing mortality. A priority task is to understand the crucial rationale behind diagnosis results of individual patients during stay in ED, which helps prepare for an early transfer to ICU. Most existing prediction studies were based on univariate analysis or multiple logistic regression to provide one-size-fit-all results. However, patient condition varying from case to case may not be accurately examined by the only judgment. In this study, we present a new decision tool using a mathematical optimization approach aiming to automatically discover rules associating diagnostic features with high-risk outcome (i.e., unplanned transfers) in different deterioration scenarios. We consider four mutually exclusive patient subgroups based on the principal reasons of ED visits: infections, cardiovascular/respiratory diseases, gastrointestinal diseases, and neurological/other diseases at a suburban teaching hospital. The analysis results demonstrate significant rules associated with unplanned transfer outcome for each subgroups and also show comparable prediction accuracy, compared to state-of-the-art machine learning methods while providing easy-to-interpret symptom-outcome information.
Bayesian Persuasion with Sequential Games
Celli, Andrea, Coniglio, Stefano, Gatti, Nicola
We study an information-structure design problem (a.k.a. persuasion) with a single sender and multiple receivers with actions of a priori unknown types, independently drawn from action-specific marginal distributions. As in the standard Bayesian persuasion model, the sender has access to additional information regarding the action types, which she can exploit when committing to a (noisy) signaling scheme through which she sends a private signal to each receiver. The novelty of our model is in considering the case where the receivers interact in a sequential game with imperfect information, with utilities depending on the game outcome and the realized action types. After formalizing the notions of ex ante and ex interim persuasiveness (which differ in the time at which the receivers commit to following the sender's signaling scheme), we investigate the continuous optimization problem of computing a signaling scheme which maximizes the sender's expected revenue. We show that computing an optimal ex ante persuasive signaling scheme is NP-hard when there are three or more receivers. In contrast with previous hardness results for ex interim persuasion, we show that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time thanks to a novel algorithm based on the ellipsoid method which we propose.