Optimization
A Latent Variational Framework for Stochastic Optimization
This paper provides a unifying theoretical framework for stochastic optimization algorithms by means of a latent stochastic variational problem. Using techniques from stochastic control, the solution to the variational problem is shown to be equivalent to that of a Forward Backward Stochastic Differential Equation (FBSDE). By solving these equations, we recover a variety of existing adaptive stochastic gradient descent methods. This framework establishes a direct connection between stochastic optimization algorithms and a secondary Bayesian inference problem on gradients, where a prior measure on noisy gradient observations determine the resulting algorithm.
Efficient Profile Maximum Likelihood for Universal Symmetric Property Estimation
Charikar, Moses, Shiragur, Kirankumar, Sidford, Aaron
Estimating symmetric properties of a distribution, e.g. support size, coverage, entropy, distance to uniformity, are among the most fundamental problems in algorithmic statistics. While each of these properties have been studied extensively and separate optimal estimators are known for each, in striking recent work, Acharya et al. 2016 showed that there is a single estimator that is competitive for all symmetric properties. This work proved that computing the distribution that approximately maximizes \emph{profile likelihood (PML)}, i.e. the probability of observed frequency of frequencies, and returning the value of the property on this distribution is sample competitive with respect to a broad class of estimators of symmetric properties. Further, they showed that even computing an approximation of the PML suffices to achieve such a universal plug-in estimator. Unfortunately, prior to this work there was no known polynomial time algorithm to compute an approximate PML and it was open to obtain a polynomial time universal plug-in estimator through the use of approximate PML. In this paper we provide a algorithm (in number of samples) that, given $n$ samples from a distribution, computes an approximate PML distribution up to a multiplicative error of $\exp(n^{2/3} \mathrm{poly} \log(n))$ in time nearly linear in $n$. Generalizing work of Acharya et al. 2016 on the utility of approximate PML we show that our algorithm provides a nearly linear time universal plug-in estimator for all symmetric functions up to accuracy $\epsilon = \Omega(n^{-0.166})$. Further, we show how to extend our work to provide efficient polynomial-time algorithms for computing a $d$-dimensional generalization of PML (for constant $d$) that allows for universal plug-in estimation of symmetric relationships between distributions.
Data-driven preference learning methods for value-driven multiple criteria sorting with interacting criteria
Liu, Jiapeng, Kadzinski, Milosz, Liao, Xiuwu, Mao, Xiaoxin
The learning of predictive models for data-driven decision support has been a prevalent topic in many fields. However, construction of models that would capture interactions among input variables is a challenging task. In this paper, we present a new preference learning approach for multiple criteria sorting with potentially interacting criteria. It employs an additive piecewise-linear value function as the basic preference model, which is augmented with components for handling the interactions. To construct such a model from a given set of assignment examples concerning reference alternatives, we develop a convex quadratic programming model. Since its complexity does not depend on the number of training samples, the proposed approach is capable for dealing with data-intensive tasks. To improve the generalization of the constructed model on new instances and to overcome the problem of over-fitting, we employ the regularization techniques. We also propose a few novel methods for classifying non-reference alternatives in order to enhance the applicability of our approach to different datasets. The practical usefulness of the proposed method is demonstrated on a problem of parametric evaluation of research units, whereas its predictive performance is studied on several monotone learning datasets. The experimental results indicate that our approach compares favourably with the classical UTADIS method and the Choquet integral-based sorting model.
Things You May Not Know About Adversarial Example: A Black-box Adversarial Image Attack
Duan, Yuchao, Zhao, Zhe, Bu, Lei, Song, Fu
Numerous methods for crafting adversarial examples were proposed recently with high success rate. Most existing works normalize images into a continuous, real vector, domain firstly, and then craft adversarial examples in this domain. However, "adversarial" examples may become benign after de-normalizing them back into the discrete integer domain, known as the discretization problem. The discretization problem was mentioned in some work, but was underestimated and has received relatively little attention. In this work, we conduct the first comprehensive study of the discretization problem. We theoretically analyze 34 representative methods and empirically study 20 representative open source tools for crafting adversarial images. Our study reveals that almost all existing works suffer from the discretization problem and it is far more serious than originally thought. For instance, most black-box methods downgrade to white-box ones and methods having higher success rates drop down to lower high success rates, e.g., from 100% to 10%. This suggests that the discretization problem should be taken into account when crafting adversarial examples. As a first step towards addressing this problem, we propose a black-box method which reduces the adversarial example searching problem to a derivative-free optimization problem. Our method is able to craft `real' adversarial images by derivative-free search on the discrete integer domain. Experimental results show that our method achieves significantly higher success rate in terms of adversarial examples in the discrete integer domain than most other methods, no matter white-box or black-box. Moreover, our method is able to handle models that is non-differentiable and we successfully break the winner of NIPS 17 competition on defense with 95% success rate.
Multi-view Locality Low-rank Embedding for Dimension Reduction
Feng, Lin, Meng, Xiangzhu, Wang, Huibing
During the last decades, we have witnessed a surge of interests of learning a low-dimensional space with discriminative information from one single view. Even though most of them can achieve satisfactory performance in some certain situations, they fail to fully consider the information from multiple views which are highly relevant but sometimes look different from each other. Besides, correlations between features from multiple views always vary greatly, which challenges multi-view subspace learning. Therefore, how to learn an appropriate subspace which can maintain valuable information from multi-view features is of vital importance but challenging. To tackle this problem, this paper proposes a novel multi-view dimension reduction method named Multi-view Locality Low-rank Embedding for Dimension Reduction (MvL2E). MvL2E makes full use of correlations between multi-view features by adopting low-rank representations. Meanwhile, it aims to maintain the correlations and construct a suitable manifold space to capture the low-dimensional embedding for multi-view features. A centroid based scheme is designed to force multiple views to learn from each other. And an iterative alternating strategy is developed to obtain the optimal solution of MvL2E. The proposed method is evaluated on 5 benchmark datasets. Comprehensive experiments show that our proposed MvL2E can achieve comparable performance with previous approaches proposed in recent literatures.
Gaussian Process Learning via Fisher Scoring of Vecchia's Approximation
The Gaussian process model is an indispensible tool for the analysis of spatial and spatial-temporal datasets and has become increasingly popular as a general-purpose model for functions. Because of its high computational burden, researchers have devoted substantial effort to developing numerical approximations for Gaussian process computations. Much of the work focuses on efficient approximation of the likelihood function. Fast likelihood evaluations are crucial for optimization procedures that require many evaluations of the likelihood, such as the default Nelder-Mead algorithm (Nelder and Mead, 1965) in the R optim function. The likelihood must be repeatedly evaluated in MCMC algorithms as well. Compared to the amount of literature on efficient likelihood approximations, there has been considerably less development of techniques for numerically maximizing the likelihood (see Geoga et al. (2018) for one recent example). This article aims to address the disparity by providing: 1. Formulas for evaluating the gradient and Fisher information for Vecchia's likelihood approximation in a single pass through the data, so that the Fisher scoring algorithm can be applied. Fisher scoring is a modification of the Newton-Raphson optimization method, replacing the Hessian matrix with the Fisher information matrix.
MaxEntropy Pursuit Variational Inference
Egorov, Evgenii, Neklydov, Kirill, Kostoev, Ruslan, Burnaev, Evgeny
One of the core problems in variational inference is a choice of approximate posterior distribution. It is crucial to trade-off between efficient inference with simple families as mean-field models and accuracy of inference. We propose a variant of a greedy approximation of the posterior distribution with tractable base learners. Using Max-Entropy approach, we obtain a well-defined optimization problem. We demonstrate the ability of the method to capture complex multimodal posterior via continual learning setting for neural networks.
Practical Bayesian Optimization with Threshold-Guided Marginal Likelihood Maximization
We propose a practical Bayesian optimization method, of which the surrogate function is Gaussian process regression with threshold-guided marginal likelihood maximization. Because Bayesian optimization consumes much time in finding optimal free parameters of Gaussian process regression, mitigating a time complexity of this step is critical to speed up Bayesian optimization. For this reason, we propose a simple, but straightforward Bayesian optimization method, assuming a reasonable condition, which is observed in many practical examples. Our experimental results confirm that our method is effective to reduce the execution time. All implementations are available in our repository.
Approximation of the objective insensitivity regions using Hierarchic Memetic Strategy coupled with Covariance Matrix Adaptation Evolutionary Strategy
Sawicki, Jakub, Smołka, Maciej, Łoś, Marcin, Schaefer, Robert
One of the most challenging types of ill-posedness in global optimization is the presence of insensitivity regions in design parameter space, so the identification of their shape will be crucial, if ill-posedness is irrecoverable. Such problems may be solved using global stochastic search followed by post-processing of a local sample and a local objective approximation. We propose a new approach of this type composed of Hierarchic Memetic Strategy (HMS) powered by the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) well-known as an effective, self-adaptable stochastic optimization algorithm and we leverage the distribution density knowledge it accumulates to better identify and separate insensitivity regions. The results of benchmarks prove that the improved HMS-CMA-ES strategy is effective in both the total computational cost and the accuracy of insensitivity region approximation. The reference data for the tests was obtained by means of a well-known effective strategy of multimodal stochastic optimization called the Niching Evolutionary Algorithm 2 (NEA2), that also uses CMA-ES as a component.
MOBA: A multi-objective bounded-abstention model for two-class cost-sensitive problems
Abstaining classifiers have been widely used in cost-sensitive applications to avoid ambiguous classification and reduce the cost of misclassification. Previous abstaining classification models rely on cost information, such as a cost matrix or cost ratio. However, it is difficult to obtain or estimate costs in practical applications. Furthermore, these abstention models are typically restricted to a single optimization metric, which may not be the expected indicator when evaluating classification performance. To overcome such problems, a multi-objective bounded-abstention (MOBA) model is proposed to optimize essential metrics. Specifically, the MOBA model minimizes the error rate of each class under class-dependent abstention constraints. The MOBA model is then solved using the non-dominated sorting genetic algorithm II, which is a popular evolutionary multi-objective optimization algorithm. A set of Pareto-optimal solutions will be generated and the best one can be selected according to provided conditions (whether costs are known) or performance demands (e.g., obtaining a high accuracy, F-measure, and etc). Hence, the MOBA model is robust towards variations in the conditions and requirements. Compared to state-of-the-art abstention models, MOBA achieves lower expected costs when cost information is considered, and better performance-abstention trade-offs when it is not.