Optimization
Short-and-Sparse Deconvolution -- A Geometric Approach
Lau, Yenson, Qu, Qing, Kuo, Han-Wen, Zhou, Pengcheng, Zhang, Yuqian, Wright, John
Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution problems involve smooth motifs (kernels) whose spectra decay rapidly, resulting in poor conditioning and numerical challenges. This paper is motivated by recent theoretical advances, which characterize the optimization landscape of a particular nonconvex formulation of SaSD. This is used to derive a $provable$ algorithm which exactly solves certain non-practical instances of the SaSD problem. We leverage the key ideas from this theory (sphere constraints, data-driven initialization) to develop a $practical$ algorithm, which performs well on data arising from a range of application areas. We highlight key additional challenges posed by the ill-conditioning of real SaSD problems, and suggest heuristics (acceleration, continuation, reweighting) to mitigate them. Experiments demonstrate both the performance and generality of the proposed method.
Multi-Objective Automatic Machine Learning with AutoxgboostMC
Pfisterer, Florian, Coors, Stefan, Thomas, Janek, Bischl, Bernd
AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimize a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or tasks that can be solved better by algorithms while leaving decisions that require human intelligence to the end-user. In recent years, the importance of other criteria, such as fairness and interpretability, and many others have become more and more apparent. Current AutoML frameworks either do not allow to optimize such secondary criteria or only do so by limiting the system's choice of models and preprocessing steps. We propose to optimize additional criteria defined by the user directly to guide the search towards an optimal machine learning pipeline. In order to demonstrate the need and usefulness of our approach, we provide a simple multi-criteria AutoML system and showcase an exemplary application.
Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes
Qi, Zhengling, Cui, Ying, Liu, Yufeng, Pang, Jong-Shi
Recent exploration of optimal individualized decision rules (IDRs) for patients in precision medicine has attracted a lot of attention due to the heterogeneous responses of patients to different treatments. In the existing literature of precision medicine, an optimal IDR is defined as a decision function mapping from the patients' covariate space into the treatment space that maximizes the expected outcome of each individual. Motivated by the concept of Optimized Certainty Equivalent (OCE) introduced originally in \cite{ben1986expected} that includes the popular conditional-value-of risk (CVaR) \cite{rockafellar2000optimization}, we propose a decision-rule based optimized covariates dependent equivalent (CDE) for individualized decision making problems. Our proposed IDR-CDE broadens the existing expected-mean outcome framework in precision medicine and enriches the previous concept of the OCE. Numerical experiments demonstrate that our overall approach outperforms existing methods in estimating optimal IDRs under heavy-tail distributions of the data.
A novel active learning-based Gaussian process metamodelling strategy for estimating the full probability distribution in forward UQ analysis
A novel active learning-based Gaussian process metamodelling strategy for estimating the full probability distribution in forward UQ analysis Ziqi Wang a, Marco Broccardo b a Earthquake Engineering Research and Test Center, Guangzhou University, China b Swiss Seismological Service, SED, ETH Z urich, SwitzerlandAbstract This paper proposes an active learning-based Gaussian process (AL-GP) metamodelling method to estimate the cumulative as well as complementary cumulative distribution function (CDF/CCDF) for forward uncertainty quantification (UQ) problems. Within the field of UQ, previous studies focused on developing AL-GP approaches for reliability (rare event probability) analysis of expensive black-box solvers. A naive iteration of these algorithms with respect to different CDF/CCDF threshold values would yield a discretized CDF/CCDF. However, this approach inevitably leads to a trade off between accuracy and computational efficiency since both depend (in opposite way) on the selected discretization. In this study, a specialized error measure and a learning function are developed such that the resulting AL-GP method is able to efficiently estimate the CDF/CCDF for a specified range of interest without an explicit dependency on discretization. Particularly, the proposed AL-GP method is able to simultaneously provide accurate CDF and CCDF estimation in their median-low probability regions. Three numerical examples are introduced to test and verify the proposed method. Introduction In a broad sense, uncertainty quantification (UQ) refers to the theory and practice to obtain quantitative understanding on the influences of uncertainties present within computational or real physical models. There are intrinsic connections between various branches of UQ, and attempts are made to develop unified UQ frameworks [4][5]. This study focuses on a central problem in forward UQ problems, the estimation of distribution function, i.e. cumulative and complementary cumulative distribution function (CDF/CCDF).
An Introduction to Advanced Machine Learning : Meta Learning Algorithms, Applications and Promises
Mohammadi, Farid Ghareh, Amini, M. Hadi, Arabnia, Hamid R.
In [1, 2], we have explored the theoretical aspects of feature extraction optimization processes for solving largescale problems and overcoming machine learning limitations. Majority of optimization algorithms that have been introduced in [1, 2] guarantee the optimal performance of supervised learning, given offline and discrete data, to deal with curse of dimensionality (CoD) problem. These algorithms, however, are not tailored for solving emerging learning problems. One of the important issues caused by online data is lack of sufficient samples per class. Further, traditional machine learning algorithms cannot achieve accurate training based on limited distributed data, as data has proliferated and dispersed significantly. Machine learning employs a strict model or embedded engine to train and predict which still fails to learn unseen classes and sufficiently use online data. In this chapter, we introduce these challenges elaborately. We further investigate Meta-Learning (MTL) algorithm, and their application and promises to solve the emerging problems by answering how autonomous agents can learn to learn?.
Stress-Plus-X (SPX) Graph Layout
Devkota, Sabin, Ahmed, Reyan, De Luca, Felice, Isaacs, Katherine E., Kobourov, Stephen
Stress, edge crossings, and crossing angles play an important role in the quality and readability of graph drawings. Most standard graph drawing algorithms optimize one of these criteria which may lead to layouts that are deficient in other criteria. We introduce an optimization framework, Stress-Plus-X (SPX), that simultaneously optimizes stress together with several other criteria: edge crossings, minimum crossing angle, and upwardness (for directed acyclic graphs). SPX achieves results that are close to the state-of-the-art algorithms that optimize these metrics individually. SPX is flexible and extensible and can optimize a subset or all of these criteria simultaneously. Our experimental analysis shows that our joint optimization approach is successful in drawing graphs with good performance across readability criteria.
KL property of exponent $1/2$ of $\ell_{2,0}$-norm and DC regularized factorizations for low-rank matrix recovery
Bi, Shujun, Tao, Ting, Pan, Shaohua
This paper is concerned with the factorization form of the rank regularized loss minimization problem. To cater for the scenario in which only a coarse estimation is available for the rank of the true matrix, an $\ell_{2,0}$-norm regularized term is added to the factored loss function to reduce the rank adaptively; and account for the ambiguities in the factorization, a balanced term is then introduced. For the least squares loss, under a restricted condition number assumption on the sampling operator, we establish the KL property of exponent $1/2$ of the nonsmooth factored composite function and its equivalent DC reformulations in the set of their global minimizers. We also confirm the theoretical findings by applying a proximal linearized alternating minimization method to the regularized factorizations.
Pareto-optimal data compression for binary classification tasks
The goal of lossy data compression is to reduce the storage cost of a data set $X$ while retaining as much information as possible about something ($Y$) that you care about. For example, what aspects of an image $X$ contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping $X\to Z\equiv f(X)$ that maximizes the mutual information $I(Z,Y)$ while the entropy $H(Z)$ is kept below some fixed threshold. We present a method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable $X$ (an image, say) drawn from a class $Y\in\{1,...,n\}$ can be distilled into a vector $W=f(X)\in \mathbb{R}^{n-1}$ losslessly, so that $I(W,Y)=I(X,Y)$; for example, for a binary classification task of cats and dogs, each image $X$ is mapped into a single real number $W$ retaining all information that helps distinguish cats from dogs. For the $n=2$ case of binary classification, we then show how $W$ can be further compressed into a discrete variable $Z=g_\beta(W)\in\{1,...,m_\beta\}$ by binning $W$ into $m_\beta$ bins, in such a way that varying the parameter $\beta$ sweeps out the full Pareto frontier, solving a generalization of the Discrete Information Bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing $I(Z,Y)$ for a fixed number of bins $m=2,3...$ which can be conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm.
Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning
Kuhn, Daniel, Esfahani, Peyman Mohajerin, Nguyen, Viet Anh, Shafieezadeh-Abadeh, Soroosh
Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution---especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
Quantum algorithms for Second-Order Cone Programming and Support Vector Machines
Kerenidis, Iordanis, Prakash, Anupam, Szilágyi, Dániel
Convex optimization is one of the central areas of study in computer science and mathematical optimization. The reason for the great importance of convex optimization is twofold. Firstly, starting with the seminal works of Khachiyan [25] and Karmarkar [18], efficient algorithms have been developed for a large family of convex optimization problems over the last few decades. Secondly, convex optimization has many real world applications and many optimization problems that arise in practice can be reduced to convex optimization [8]. There are three main classes of structured convex optimization problems: linear programs (LP), semidefinite programs (SDP), and second-order conic programs (SOCP). The fastest (classical) algorithms for these problems belong to the family of interior-point methods (IPM). Interior point methods are iterative algorithms where the main computation in each step is the solution of a system of linear equations whose size depends on the dimension of the optimization problem. The size of structured optimization problems that can be solved in practice is therefore limited by the efficiency of linear system solvers - on a single computer, most open-source and commercial solvers can handle dense problems with up to tens of thousands of constraints and variables, or sparse problems with the same number of nonzero entries [30, 31]. In recent years, there has been a tremendous interest in quantum linear algebra algorithms following the breakthrough algorithm of Harrow, Hassidim and Lloyd [16].