Goto

Collaborating Authors

 Optimization


Distributed Estimation, Information Loss and Exponential Families

arXiv.org Machine Learning

Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE given the whole dataset. We study this framework's statistical properties, showing that the efficiency loss compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing connection to the theory of information loss by Fisher, Rao and Efron. We show that the "full-exponential-family-ness" represents the lower bound of the error rate of arbitrary combinations of local MLEs, and is achieved by a KL-divergence-based combination method but not by a more common linear combination method. We also study the empirical properties of both methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions.


Methods and Models for Interpretable Linear Classification

arXiv.org Machine Learning

We present an integer programming framework to build accurate and interpretable discrete linear classification models. Unlike existing approaches, our framework is designed to provide practitioners with the control and flexibility they need to tailor accurate and interpretable models for a domain of choice. To this end, our framework can produce models that are fully optimized for accuracy, by minimizing the 0--1 classification loss, and that address multiple aspects of interpretability, by incorporating a range of discrete constraints and penalty functions. We use our framework to produce models that are difficult to create with existing methods, such as scoring systems and M-of-N rule tables. In addition, we propose specially designed optimization methods to improve the scalability of our framework through decomposition and data reduction. We show that discrete linear classifiers can attain the training accuracy of any other linear classifier, and provide an Occam's Razor type argument as to why the use of small discrete coefficients can provide better generalization. We demonstrate the performance and flexibility of our framework through numerical experiments and a case study in which we construct a highly tailored clinical tool for sleep apnea diagnosis.


Energy and Uncertainty: Models and Algorithms for Complex Energy Systems

AI Magazine

The problem of controlling energy systems (generation, transmission, storage, investment) introduces a number of optimization problems which need to be solved in the presence of different types of uncertainty. We highlight several of these applications, using a simple energy storage problem as a case application. Using this setting, we describe a modeling framework based around five fundamental dimensions which is more natural than the standard canonical form widely used in the reinforcement learning community. The framework focuses on finding the best policy, where we identify four fundamental classes of policies consisting of policy function approximations (PFAs), cost function approximations (CFAs), policies based on value function approximations (VFAs), and lookahead policies.


Energy and Uncertainty: Models and Algorithms for Complex Energy Systems

AI Magazine

The problem of controlling energy systems (generation, transmission, storage, investment) introduces a number of optimization problems which need to be solved in the presence of different types of uncertainty. We highlight several of these applications, using a simple energy storage problem as a case application. Using this setting, we describe a modeling framework based around five fundamental dimensions which is more natural than the standard canonical form widely used in the reinforcement learning community. The framework focuses on finding the best policy, where we identify four fundamental classes of policies consisting of policy function approximations (PFAs), cost function approximations (CFAs), policies based on value function approximations (VFAs), and lookahead policies. This organization unifies a number of competing strategies under a common umbrella.


Communication-Efficient Distributed Dual Coordinate Ascent

arXiv.org Machine Learning

Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, CoCoA converges to the same .001-accurate solution quality on average 25x as quickly.


Semidefinite Programming Based Preconditioning for More Robust Near-Separable Nonnegative Matrix Factorization

arXiv.org Machine Learning

Nonnegative matrix factorization (NMF) under the separability assumption can provably be solved efficiently, even in the presence of noise, and has been shown to be a powerful technique in document classification and hyperspectral unmixing. This problem is referred to as near-separable NMF and requires that there exists a cone spanned by a small subset of the columns of the input nonnegative matrix approximately containing all columns. In this paper, we propose a preconditioning based on semidefinite programming making the input matrix well-conditioned. This in turn can improve significantly the performance of near-separable NMF algorithms which is illustrated on the popular successive projection algorithm (SPA). The new preconditioned SPA is provably more robust to noise, and outperforms SPA on several synthetic data sets. We also show how an active-set method allow us to apply the preconditioning on large-scale real-world hyperspectral images.


Parallel Distributed Block Coordinate Descent Methods based on Pairwise Comparison Oracle

arXiv.org Machine Learning

This paper provides a block coordinate descent algorithm to solve unconstrained optimization problems. In our algorithm, computation of function values or gradients is not required. Instead, pairwise comparison of function values is used. Our algorithm consists of two steps; one is the direction estimate step and the other is the search step. Both steps require only pairwise comparison of function values, which tells us only the order of function values over two points. In the direction estimate step, a Newton type search direction is estimated. A computation method like block coordinate descent methods is used with the pairwise comparison. In the search step, a numerical solution is updated along the estimated direction. The computation in the direction estimate step can be easily parallelized, and thus, the algorithm works efficiently to find the minimizer of the objective function. Also, we show an upper bound of the convergence rate. In numerical experiments, we show that our method efficiently finds the optimal solution compared to some existing methods based on the pairwise comparison.


Simulating Non Stationary Operators in Search Algorithms

arXiv.org Artificial Intelligence

In this paper, we propose a model for simulating search operators whose behaviour often changes continuously during the search. In these scenarios, the performance of the operators decreases when they are applied. This is motivated by the fact that operators for optimization problems are often roughly classified into exploitation operators and exploration operators. Our simulation model is used to compare the different performances of operator selection policies and clearly identify their ability to adapt to such specific operators behaviours. The experimental study provides interesting results on the respective behaviours of operator selection policies when faced to such non stationary search scenarios. Keywords: Island Models, Adaptive Operator Selection 1. Introduction Selecting the most suitable operators in a search algorithm when solving optimization problems is an active research area (Eiben et al., 2007; Lobo et al., 2007). Given an optimization problem, a search algorithm mainly consists in applying basic solving operators -- heuristics -- in order to explore and exploit the search space for retrieving solutions.


Marginal Structured SVM with Hidden Variables

arXiv.org Machine Learning

In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables. MSSVM properly accounts for the uncertainty of hidden variables, and can significantly outperform the previously proposed latent structured SVM (LSSVM; Yu & Joachims (2009)) and other state-of-art methods, especially when that uncertainty is large. Our method also results in a smoother objective function, making gradient-based optimization of MSSVMs converge significantly faster than for LSSVMs. We also show that our method consistently outperforms hidden conditional random fields (HCRFs; Quattoni et al. (2007)) on both simulated and real-world datasets. Furthermore, we propose a unified framework that includes both our and several other existing methods as special cases, and provides insights into the comparison of different models in practice.


Structured Low-Rank Matrix Factorization with Missing and Grossly Corrupted Observations

arXiv.org Machine Learning

Recovering low-rank and sparse matrices from incomplete or corrupted observations is an important problem in machine learning, statistics, bioinformatics, computer vision, as well as signal and image processing. In theory, this problem can be solved by the natural convex joint/mixed relaxations (i.e., l_{1}-norm and trace norm) under certain conditions. However, all current provable algorithms suffer from superlinear per-iteration cost, which severely limits their applicability to large-scale problems. In this paper, we propose a scalable, provable structured low-rank matrix factorization method to recover low-rank and sparse matrices from missing and grossly corrupted data, i.e., robust matrix completion (RMC) problems, or incomplete and grossly corrupted measurements, i.e., compressive principal component pursuit (CPCP) problems. Specifically, we first present two small-scale matrix trace norm regularized bilinear structured factorization models for RMC and CPCP problems, in which repetitively calculating SVD of a large-scale matrix is replaced by updating two much smaller factor matrices. Then, we apply the alternating direction method of multipliers (ADMM) to efficiently solve the RMC problems. Finally, we provide the convergence analysis of our algorithm, and extend it to address general CPCP problems. Experimental results verified both the efficiency and effectiveness of our method compared with the state-of-the-art methods.