Goto

Collaborating Authors

 Optimization


Large-Scale Optimization Algorithms for Sparse Conditional Gaussian Graphical Models

arXiv.org Machine Learning

This paper addresses the problem of scalable optimization for L1-regularized conditional Gaussian graphical models. Conditional Gaussian graphical models generalize the well-known Gaussian graphical models to conditional distributions to model the output network influenced by conditioning input variables. While highly scalable optimization methods exist for sparse Gaussian graphical model estimation, state-of-the-art methods for conditional Gaussian graphical models are not efficient enough and more importantly, fail due to memory constraints for very large problems. In this paper, we propose a new optimization procedure based on a Newton method that efficiently iterates over two sub-problems, leading to drastic improvement in computation time compared to the previous methods. We then extend our method to scale to large problems under memory constraints, using block coordinate descent to limit memory usage while achieving fast convergence. Using synthetic and genomic data, we show that our methods can solve one million dimensional problems to high accuracy in a little over a day on a single machine.


Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes

arXiv.org Machine Learning

Mean field variational Bayes (MFVB) is a popular posterior approximation method due to its fast runtime on large-scale data sets. However, it is well known that a major failing of MFVB is that it underestimates the uncertainty of model variables (sometimes severely) and provides no information about model variable covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for model variables---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB). When the MFVB posterior approximation is in the exponential family, LRVB has a simple, analytic form, even for non-conjugate models. Indeed, we make no assumptions about the form of the true posterior. We demonstrate the accuracy and scalability of our method on a range of models for both simulated and real data.


Diffusion Methods for Classification with Pairwise Relationships

arXiv.org Artificial Intelligence

We define two algorithms for propagating information in classification problems with pairwise relationships. The algorithms are based on contraction maps and are related to non-linear diffusion and random walks on graphs. The approach is also related to message passing algorithms, including belief propagation and mean field methods. The algorithms we describe are guaranteed to converge on graphs with arbitrary topology. Moreover they always converge to a unique fixed point, independent of initialization. We prove that the fixed points of the algorithms under consideration define lower-bounds on the energy function and the max-marginals of a Markov random field. The theoretical results also illustrate a relationship between message passing algorithms and value iteration for an infinite horizon Markov decision process. We illustrate the practical application of the algorithms under study with numerical experiments in image restoration, stereo depth estimation and binary classification on a grid.


Noncrossing Ordinal Classification

arXiv.org Machine Learning

Ordinal data are often seen in real applications. Regular multicategory classification methods are not designed for this data type and a more proper treatment is needed. We consider a framework of ordinal classification which pools the results from binary classifiers together. An inherent difficulty of this framework is that the class prediction can be ambiguous due to boundary crossing. To fix this issue, we propose a noncrossing ordinal classification method which materializes the framework by imposing noncrossing constraints. An asymptotic study of the proposed method is conducted. We show by simulated and data examples that the proposed method can improve the classification performance for ordinal data without the ambiguity caused by boundary crossings.


Facility Deployment Decisions through Warp Optimizaton of Regressed Gaussian Processes

arXiv.org Machine Learning

University of South Carolina, Department of Mechanical Engineering, Nuclear Engineering Program, Columbia, SC 29201 Send proofs to: Anthony M. Scopatz scopatz@cec.sc.edu 541 Main Street, Columbia, SC 29208 Number of Pages: 35 Number of Tables: 0 Number of Figures: 11 Keywords: nuclear fuel cycle, gaussian process, dynamic time warping Abstract A method for quickly determining deployment schedules that meet a given fuel cycle demand is presented here. This algorithm is fast enough to perform in situ within low-fidelity fuel cycle simulators. It uses Gaussian process regression models to predict the production curve as a function of time and the number of deployed facilities. Each of these predictions is measured against the demand curve using the dynamic time warping distance. The minimum distance deployment schedule is evaluated in a full fuel cycle simulation, whose generated production curve then informs the model on the next optimization iteration. The method converges within five to ten iterations to a distance that is less than one percent of the total deployable production. A representative once-through fuel cycle is used to demonstrate the methodology for reactor deployment. I INTRODUCTION With the recent advent of agent-based nuclear fuel cycle simulators, such as Cyclus [1, 2], there comes the possibility to make in situ, dynamic facility deployment decisions. This would more fully model real-world fuel cycles where institutions (such as utility companies) predict future demand and choose their future deployment schedules appropriately. However, one of the major challenges to making in situ deployment decisions is the speed at which "good enough" decisions can be made. This paper proposes three related deployment-specific optimization algorithms that can be used for any demand curve and facility type. The demands of a fuel cycle scenario can often be simply stated, e.g. Here, the dynamic time warping (DTW) [3] distance is minimized between the demand curve and the regression of a Gaussian Process model (GP) [4] of prior simulations. This minimization produces a guess for a deployment schedule which is subsequently tested using an actual simulator. This process is repeated until an optimal deployment schedule for the given demand is found. Importantly, by using the Gaussian process surrogates, the number of simulation realizations that must be executed as part of the optimization may be reduced to only a handful. Furthermore, it is at least two orders-of-magnitude faster to test the model than it is to run a single low-fidelity fuel cycle simulation. Because of the relative computational cheapness, it is suitable to be used inside of a fuel cycle simulation.


Relative Density and Exact Recovery in Heterogeneous Stochastic Block Models

arXiv.org Machine Learning

The Stochastic Block Model (SBM) is a widely used random graph model for networks with communities. Despite the recent burst of interest in recovering communities in the SBM from statistical and computational points of view, there are still gaps in understanding the fundamental information theoretic and computational limits of recovery. In this paper, we consider the SBM in its full generality, where there is no restriction on the number and sizes of communities or how they grow with the number of nodes, as well as on the connection probabilities inside or across communities. This generality allows us to move past the artifacts of homogenous SBM, and understand the right parameters (such as the relative densities of communities) that define the various recovery thresholds. We outline the implications of our generalizations via a set of illustrative examples. For instance, $\log n$ is considered to be the standard lower bound on the cluster size for exact recovery via convex methods, for homogenous SBM. We show that it is possible, in the right circumstances (when sizes are spread and the smaller the cluster, the denser), to recover very small clusters (up to $\sqrt{\log n}$ size), if there are just a few of them (at most polylogarithmic in $n$).


Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction

arXiv.org Machine Learning

Statistical image reconstruction (SIR) methods are studied extensively for X-ray computed tomography (CT) due to the potential of acquiring CT scans with reduced X-ray dose while maintaining image quality. However, the longer reconstruction time of SIR methods hinders their use in X-ray CT in practice. To accelerate statistical methods, many optimization techniques have been investigated. Over-relaxation is a common technique to speed up convergence of iterative algorithms. For instance, using a relaxation parameter that is close to two in alternating direction method of multipliers (ADMM) has been shown to speed up convergence significantly. This paper proposes a relaxed linearized augmented Lagrangian (AL) method that shows theoretical faster convergence rate with over-relaxation and applies the proposed relaxed linearized AL method to X-ray CT image reconstruction problems. Experimental results with both simulated and real CT scan data show that the proposed relaxed algorithm (with ordered-subsets [OS] acceleration) is about twice as fast as the existing unrelaxed fast algorithms, with negligible computation and memory overhead.


Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

arXiv.org Machine Learning

Recent years have witnessed amazing outcomes from "Big Models" trained by "Big Data". Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more "learning value" to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.6-2.2x.


A Unified Approach to Error Bounds for Structured Convex Optimization Problems

arXiv.org Machine Learning

Error bounds, which refer to inequalities that bound the distance of vectors in a test set to a given set by a residual function, have proven to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems. In this paper, we present a new framework for establishing error bounds for a class of structured convex optimization problems, in which the objective function is the sum of a smooth convex function and a general closed proper convex function. Such a class encapsulates not only fairly general constrained minimization problems but also various regularized loss minimization formulations in machine learning, signal processing, and statistics. Using our framework, we show that a number of existing error bound results can be recovered in a unified and transparent manner. To further demonstrate the power of our framework, we apply it to a class of nuclear-norm regularized loss minimization problems and establish a new error bound for this class under a strict complementarity-type regularity condition. We then complement this result by constructing an example to show that the said error bound could fail to hold without the regularity condition. Consequently, we obtain a rather complete answer to a question raised by Tseng. We believe that our approach will find further applications in the study of error bounds for structured convex optimization problems.


Boosted Sparse Non-linear Distance Metric Learning

arXiv.org Machine Learning

This paper proposes a boosting-based solution addressing metric learning problems for high-dimensional data. Distance measures have been used as natural measures of (dis)similarity and served as the foundation of various learning methods. The efficiency of distance-based learning methods heavily depends on the chosen distance metric. With increasing dimensionality and complexity of data, however, traditional metric learning methods suffer from poor scalability and the limitation due to linearity as the true signals are usually embedded within a low-dimensional nonlinear subspace. In this paper, we propose a nonlinear sparse metric learning algorithm via boosting. We restructure a global optimization problem into a forward stage-wise learning of weak learners based on a rank-one decomposition of the weight matrix in the Mahalanobis distance metric. A gradient boosting algorithm is devised to obtain a sparse rank-one update of the weight matrix at each step. Nonlinear features are learned by a hierarchical expansion of interactions incorporated within the boosting algorithm. Meanwhile, an early stopping rule is imposed to control the overall complexity of the learned metric. As a result, our approach guarantees three desirable properties of the final metric: positive semi-definiteness, low rank and element-wise sparsity. Numerical experiments show that our learning model compares favorably with the state-of-the-art methods in the current literature of metric learning.