Optimization
Entropy Search for Information-Efficient Global Optimization
Hennig, Philipp, Schuler, Christian J.
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
Multi-stage Convex Relaxation for Feature Selection
A number of recent works have studied the effectiveness of feature selection using Lasso. It is known that under the restricted isometry properties (RIP), Lasso does not generally lead to the exact recovery of the set of nonzero coefficients, due to the looseness of convex relaxation. This paper considers the feature selection property of nonconvex regularization, where the solution is given by a multi-stage convex relaxation scheme. Under appropriate conditions, we show that the local solution obtained by this procedure recovers the set of nonzero coefficients without suffering from the bias of Lasso relaxation, which complements parameter estimation results of this procedure.
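As a rough illustration of what a multi-stage convex relaxation can look like, the sketch below repeatedly solves a weighted Lasso and then drops the penalty on coefficients that are already large. The abstract does not name the nonconvex regularizer, so the capped-L1 style reweighting rule, the threshold `theta`, the proximal-gradient solver, and all function names here are assumptions chosen for illustration, not the paper's procedure.

```python
import numpy as np

def weighted_lasso(X, y, weights, n_iter=2000):
    """Proximal gradient for (0.5 / n) * ||y - X w||^2 + sum_j weights[j] * |w_j|."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # inverse Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y)) / n
        # Soft-thresholding with a separate threshold per coordinate.
        w = np.sign(z) * np.maximum(np.abs(z) - step * weights, 0.0)
    return w

def multi_stage_relaxation(X, y, lam, theta, n_stages=3):
    """Each stage is a convex weighted Lasso; weights are updated between stages."""
    weights = np.full(X.shape[1], lam)  # stage 1 is the plain Lasso
    for _ in range(n_stages):
        w = weighted_lasso(X, y, weights)
        # Assumed capped-L1 rule: stop penalizing coefficients larger than theta.
        weights = np.where(np.abs(w) > theta, 0.0, lam)
    return w
```

On an orthogonal toy design the first stage returns coefficients shrunk by `lam`, while later stages remove that shrinkage on the large coefficients, which is the bias the abstract refers to.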
Strong Equivalence of Qualitative Optimization Problems
Faber, Wolfgang, Truszczyński, Mirosław, Woltran, Stefan
We introduce the framework of qualitative optimization problems (or, simply, optimization problems) to represent preference theories. The formalism uses separate modules to describe the space of outcomes to be compared (the generator) and the preferences on outcomes (the selector). We consider two types of optimization problems. They differ in the way the generator, which we model by a propositional theory, is interpreted: by the standard propositional logic semantics, and by the equilibrium-model (answer-set) semantics. Under the latter interpretation of generators, optimization problems directly generalize answer-set optimization programs proposed previously. We study strong equivalence of optimization problems, which guarantees their interchangeability within any larger context. We characterize several versions of strong equivalence obtained by restricting the class of optimization problems that can be used as extensions and establish the complexity of associated reasoning tasks. Understanding strong equivalence is essential for modular representation of optimization problems and rewriting techniques to simplify them without changing their inherent properties.
Information-Maximization Clustering based on Squared-Loss Mutual Information
Sugiyama, Masashi, Yamada, Makoto, Kimura, Manabu, Hachiya, Hirotaka
Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good local optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.
Rank Minimization over Finite Fields: Fundamental Limits and Coding-Theoretic Interpretations
Tan, Vincent Y. F., Balzano, Laura, Draper, Stark C.
This paper establishes information-theoretic limits in estimating a finite field low-rank matrix given random linear measurements of it. These linear measurements are obtained by taking inner products of the low-rank matrix with random sensing matrices. Necessary and sufficient conditions on the number of measurements required are provided. It is shown that these conditions are sharp and the minimum-rank decoder is asymptotically optimal. The reliability function of this decoder is also derived by appealing to de Caen's lower bound on the probability of a union. The sufficient condition also holds when the sensing matrices are sparse - a scenario that may be amenable to efficient decoding. More precisely, it is shown that if the $n\times n$ sensing matrices contain, on average, $\Omega(n\log n)$ entries, the number of measurements required is the same as that when the sensing matrices are dense and contain entries drawn uniformly at random from the field. Analogies are drawn between the above results and rank-metric codes in the coding theory literature. In fact, we are also strongly motivated by understanding when minimum rank distance decoding of random rank-metric codes succeeds. To this end, we derive distance properties of equiprobable and sparse rank-metric codes. These distance properties provide a precise geometric interpretation of the fact that the sparse ensemble requires as few measurements as the dense one. Finally, we provide a non-exhaustive procedure to search for the unknown low-rank matrix.
Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
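The lazy-evaluation idea can be sketched on the simpler non-adaptive case of maximum coverage, where submodularity guarantees that a previously computed marginal gain is an upper bound on the current one. The example below is only that simplified setting: it does not model observations or policies, and the function name and data layout are hypothetical.

```python
import heapq

def lazy_greedy_cover(sets, k):
    """Pick up to k sets to maximize coverage, refreshing marginal gains lazily."""
    covered, chosen = set(), []
    # Max-heap via negated gains; stale entries are upper bounds by submodularity.
    heap = [(-len(s), i) for i, s in enumerate(sets)]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        _, i = heapq.heappop(heap)
        gain = len(sets[i] - covered)  # refresh only the top candidate
        if not heap or gain >= -heap[0][0]:
            if gain == 0:
                break  # nothing left to gain from any remaining set
            chosen.append(i)
            covered |= sets[i]
        else:
            heapq.heappush(heap, (-gain, i))  # reinsert with the updated gain
    return chosen, covered
```

A candidate is accepted as soon as its refreshed gain beats the stale bounds of all others, so most candidates are never re-evaluated in a given round.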
On l_1 Mean and Variance Filtering
Wahlberg, Bo, Rojas, Cristian R., Annergren, Mariette
This paper addresses the problem of segmenting a time-series with respect to changes in the mean value or in the variance. The first case is when the time data is modeled as a sequence of independent and normally distributed random variables with unknown, possibly changing, mean value but fixed variance. The main assumption is that the mean value is piecewise constant in time, and the task is to estimate the change times and the mean values within the segments. The second case is when the mean value is constant, but the variance can change. The assumption is that the variance is piecewise constant in time, and we want to estimate change times and the variance values within the segments. To find solutions to these problems, we will study an l_1 regularized maximum likelihood method, related to the fused lasso method and l_1 trend filtering, where the parameters to be estimated are free to vary at each sample. To penalize variations in the estimated parameters, the l_1-norm of the time difference of the parameters is used as a regularization term. This idea is closely related to total variation denoising. The main contribution is that a convex formulation of this variance estimation problem, where the parametrization is based on the inverse of the variance, can be formulated as a certain l_1 mean estimation problem. This implies that results and methods for mean estimation can be applied to the challenging problem of variance segmentation/estimation.
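One plausible way to write down the two problems described above is given here. The symbols $y_t$ (samples), $m_t$ (means), $\eta_t = 1/\sigma_t^2$ (inverse variances), $N$ (number of samples) and $\lambda$ (regularization weight) are notation assumed for this sketch, and the variance case is written for a known mean of zero, which is a simplification rather than something the abstract states.

```latex
% Mean filtering: Gaussian negative log-likelihood with fixed variance,
% plus an l_1 penalty on the time difference of the means.
\min_{m_1,\dots,m_N}\;
  \frac{1}{2}\sum_{t=1}^{N} (y_t - m_t)^2
  + \lambda \sum_{t=2}^{N} \lvert m_t - m_{t-1} \rvert

% Variance filtering (zero mean assumed), parametrized by the inverse
% variance \eta_t = 1/\sigma_t^2 so that the likelihood term is convex.
\min_{\eta_1,\dots,\eta_N > 0}\;
  \frac{1}{2}\sum_{t=1}^{N} \bigl( \eta_t\, y_t^2 - \log \eta_t \bigr)
  + \lambda \sum_{t=2}^{N} \lvert \eta_t - \eta_{t-1} \rvert
```

In both cases the $\ell_1$ penalty drives most consecutive differences to exactly zero, which is what yields piecewise constant estimates and hence the segmentation.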
Self-Avoiding Random Dynamics on Integer Complex Systems
Hamze, Firas, Wang, Ziyu, de Freitas, Nando
This paper introduces a new specialized algorithm for equilibrium Monte Carlo sampling of binary-valued systems, which allows for large moves in the state space. This is achieved by constructing self-avoiding walks (SAWs) in the state space. As a consequence, many bits are flipped in a single MCMC step. We name the algorithm SARDONICS, an acronym for Self-Avoiding Random Dynamics on Integer Complex Systems. The algorithm has several free parameters, but we show that Bayesian optimization can be used to automatically tune them. SARDONICS performs remarkably well in a broad range of sampling tasks: toroidal ferromagnetic and frustrated Ising models, 3D Ising models, restricted Boltzmann machines and chimera graphs arising in the design of quantum computers.
Optimization with Sparsity-Inducing Penalties
Bach, Francis, Jenatton, Rodolphe, Mairal, Julien, Obozinski, Guillaume
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
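Of the techniques listed, proximal methods are perhaps the easiest to illustrate. The sketch below applies plain proximal gradient descent (often called ISTA) to the $\ell_1$-penalized least-squares problem; the function names, the fixed step size, and the iteration count are choices made for this example and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # inverse Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)  # gradient of the smooth least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

Each iteration takes a gradient step on the smooth term and then applies the proximal operator of the non-smooth norm, which is what sets small coefficients exactly to zero.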
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization
Agarwal, Alekh, Bartlett, Peter L., Ravikumar, Pradeep, Wainwright, Martin J.
Relative to the large literature on upper bounds on the complexity of convex optimization, less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes.