Optimization
Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach
Duchi, John, Glynn, Peter, Namkoong, Hongseok
We study statistical inference and robust solution methods for stochastic optimization problems, focusing on giving calibrated and adaptive confidence intervals for optimal values and solutions for a range of stochastic problems. As part of this, we develop a generalized empirical likelihood framework---based on distributional uncertainty sets constructed from nonparametric $f$-divergence balls---for Hadamard differentiable functionals, and in particular, stochastic optimization problems. As consequences of this theory, we provide principled methods of choosing distributional uncertainty regions so as to provide calibrated one- and two-sided confidence intervals. We also give an asymptotic expansion for our distributionally robust formulation, showing how robustification regularizes problems by their variance. Finally, we show that optimizers of the distributionally robust formulations we study enjoy (essentially) the same consistency properties as those in classical sample average approximations.
Sparsity-driven weighted ensemble classifier
รzgรผr, Atilla, Erdem, Hamit, Nar, Fatih
In this letter, a novel weighted ensemble classifier is proposed that improves classification accuracy and minimizes the number of classifiers. Ensemble weight finding problem is modeled as a cost function with following terms: (a) a data fidelity term aiming to decrease misclassification rate, (b) a sparsity term aiming to decrease the number of classifiers, and (c) a non-negativity constraint on the weights of the classifiers. The proposed cost function is a non-convex and hard to solve; thus, convex relaxation techniques and novel approximations are employed to obtain a numerically efficient solution. The proposed method achieves better or similar performance compared to state-of-the art classifier ensemble methods, while using lower number of classifiers.
Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm
Huang, Kejun, Fu, Xiao, Sidiropoulos, Nicholas D.
In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words -- i.e., words that only appear (with positive probability) in one topic. Follow-up work has resorted to three or higher-order statistics of the data corpus to relax the anchor word assumption. Reliable estimates of higher-order statistics are hard to obtain, however, and the identification of topics under those models hinges on uncorrelatedness of the topics, which can be unrealistic. This paper revisits topic modeling based on second-order moments, and proposes an anchor-free topic mining framework. The proposed approach guarantees the identification of the topics under a much milder condition compared to the anchor-word assumption, thereby exhibiting much better robustness in practice. The associated algorithm only involves one eigen-decomposition and a few small linear programs. This makes it easy to implement and scale up to very large problem instances. Experiments using the TDT2 and Reuters-21578 corpus demonstrate that the proposed anchor-free approach exhibits very favorable performance (measured using coherence, similarity count, and clustering accuracy metrics) compared to the prior art.
Multilinear Low-Rank Tensors on Graphs & Applications
Shahid, Nauman, Grassi, Francesco, Vandergheynst, Pierre
W e propose a new framework for the analysis of low-rank tensors which lies at the intersection of spectral graph theory and signal processing. As a first step, we present a new graph based low-rank decomposition which approximates the classical low-rank SVD for matrices and multi-linear SVD for tensors. Then, building on this novel decomposition we construct a general class of convex optimization problems for approximately solving low-rank tensor inverse problems, such as tensor Robust PCA. The whole framework is named as "Multilinear Low-rank tensors on Graphs (MLRTG)". Our theoretical analysis shows: 1) MLRTG stands on the notion of approximate stationarity of multidimensional signals on graphs and 2) the approximation error depends on the eigen gaps of the graphs. W e demonstrate applications for a wide variety of 4 artificial and 12 real tensor datasets, such as EEG, FMRI, BCI, surveillance videos and hyperspectral images. Generalization of the tensor concepts to non-euclidean domain, orders of magnitude speedup, low-memory requirement and significantly enhanced performance at low SNR are the key aspects of our framework.
On numerical approximation schemes for expectation propagation
Several numerical approximation strategies for the expectation-propagation algorithm are studied in the context of large-scale learning: the Laplace method, a faster variant of it, Gaussian quadrature, and a deterministic version of variational sampling (i.e., combining quadrature with variational approximation). Experiments in training linear binary classifiers show that the expectation-propagation algorithm converges best using variational sampling, while it also converges well using Laplace-style methods with smooth factors but tends to be unstable with non-differentiable ones. Gaussian quadrature yields unstable behavior or convergence to a sub-optimal solution in most experiments.
Multi-Information Source Optimization
Poloczek, Matthias, Wang, Jialei, Frazier, Peter I.
We consider Bayesian optimization of an expensive-to-evaluate black-box objective function, where we also have access to cheaper approximations of the objective. In general, such approximations arise in applications such as reinforcement learning, engineering, and the natural sciences, and are subject to an inherent, unknown bias. This model discrepancy is caused by an inadequate internal model that deviates from reality and can vary over the domain, making the utilization of these approximations a non-trivial task. We present a novel algorithm that provides a rigorous mathematical treatment of the uncertainties arising from model discrepancies and noisy observations. Its optimization decisions rely on a value of information analysis that extends the Knowledge Gradient factor to the setting of multiple information sources that vary in cost: each sampling decision maximizes the predicted benefit per unit cost. We conduct an experimental evaluation that demonstrates that the method consistently outperforms other state-of-the-art techniques: it finds designs of considerably higher objective value and additionally inflicts less cost in the exploration process.
Thursday News: Data Science, Python, R, Watson, NLP, Elections
Here is our new selection of featured articles and resources. Starred articles have interesting charts. The picture below is from the last article. Topics cover a salary survey, several programming languages with comparisons, a mathematical optimization technique widely used in machine learning algorithms (this article has great animated gifs that illustrate the convergence of various methods) and a methodology to make correct election forecasts.
Learning an Astronomical Catalog of the Visible Universe through Scalable Bayesian Inference
Regier, Jeffrey, Pamnany, Kiran, Giordano, Ryan, Thomas, Rollin, Schlegel, David, McAuliffe, Jon, Prabhat, null
Celeste is a procedure for inferring astronomical catalogs that attains state-of-the-art scientific results. To date, Celeste has been scaled to at most hundreds of megabytes of astronomical images: Bayesian posterior inference is notoriously demanding computationally. In this paper, we report on a scalable, parallel version of Celeste, suitable for learning catalogs from modern large-scale astronomical datasets. Our algorithmic innovations include a fast numerical optimization routine for Bayesian posterior inference and a statistically efficient scheme for decomposing astronomical optimization problems into subproblems. Our scalable implementation is written entirely in Julia, a new high-level dynamic programming language designed for scientific and numerical computing. We use Julia's high-level constructs for shared and distributed memory parallelism, and demonstrate effective load balancing and efficient scaling on up to 8192 Xeon cores on the NERSC Cori supercomputer.
Recursive Decomposition for Nonconvex Optimization
Friesen, Abram L., Domingos, Pedro
Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent sub-functions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding.
EM Algorithm and Stochastic Control in Economics
Kou, Steven, Peng, Xianhua, Xu, Xingbo
Generalising the idea of the classical EM algorithm that is widely used for computing maximum likelihood estimates, we propose an EM-Control (EM-C) algorithm for solving multi-period finite time horizon stochastic control problems. The new algorithm sequentially updates the control policies in each time period using Monte Carlo simulation in a forward-backward manner; in other words, the algorithm goes forward in simulation and backward in optimization in each iteration. Similar to the EM algorithm, the EM-C algorithm has the monotonicity of performance improvement in each iteration, leading to good convergence properties. We demonstrate the effectiveness of the algorithm by solving stochastic control problems in the monopoly pricing of perishable assets and in the study of real business cycle.