Collaborating Authors

 Becker, Stephen


Modeling massive multivariate spatial data with the basis graphical lasso

arXiv.org Machine Learning

We propose a new modeling framework for highly multivariate spatial processes that synthesizes ideas from recent multiscale and spectral approaches with graphical models. The basis graphical lasso writes a univariate Gaussian process as a linear combination of basis functions weighted with entries of a Gaussian graphical vector whose graph is estimated from optimizing an $\ell_1$ penalized likelihood. This paper extends the setting to a multivariate Gaussian process where the basis functions are weighted with Gaussian graphical vectors. We motivate a model where the basis functions represent different levels of resolution and the graphical vectors for each level are assumed to be independent. Using an orthogonal basis yields computational complexity and memory usage that are linear in the number of spatial locations, the number of basis functions, and the number of realizations. An additional fusion penalty encourages a parsimonious conditional independence structure in the multilevel graphical model. We illustrate our method on a large climate ensemble from the National Center for Atmospheric Research's Community Atmosphere Model that involves 40 spatial processes.
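As a rough illustration of the core ingredient, here is a minimal sketch of the univariate case: expand a field in a fixed basis, recover the coefficients by least squares, and estimate their graph with an $\ell_1$-penalized likelihood. The cosine basis, the toy tridiagonal precision, and all problem sizes are illustrative assumptions rather than the paper's setup, and the graph step uses scikit-learn's GraphicalLasso rather than the authors' solver.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Toy setup: n locations, p basis functions, r ensemble realizations.
n, p, r = 200, 12, 50
locs = np.linspace(0.0, 1.0, n)
# A cosine basis stands in for the paper's multiresolution basis.
Phi = np.column_stack([np.cos(np.pi * k * locs) for k in range(p)])

# Simulate fields y = Phi @ c with coefficients drawn from a sparse
# (tridiagonal) precision, i.e. a chain graph on the basis weights.
prec = np.eye(p) + 0.4 * np.diag(np.ones(p - 1), 1) + 0.4 * np.diag(np.ones(p - 1), -1)
Cw = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=r)
Y = Cw @ Phi.T + 0.01 * rng.standard_normal((r, n))

# Recover coefficients by least squares, then estimate their graph with
# an l1-penalized Gaussian likelihood (graphical lasso).
C_hat = np.linalg.lstsq(Phi, Y.T, rcond=None)[0].T
model = GraphicalLasso(alpha=0.05).fit(C_hat)
print(np.round(model.precision_, 2))   # sparse precision encodes the graph
```

The penalty `alpha` controls the sparsity of the estimated precision; larger values yield a sparser graph on the basis weights.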


Spectral estimation from simulations via sketching

arXiv.org Machine Learning

Sketching is a stochastic dimension reduction method that preserves geometric structures of data and has applications in high-dimensional regression, low-rank approximation, and graph sparsification. In this work, we show that sketching can be used to compress simulation data and still accurately estimate time autocorrelation and power spectral density. For a given compression ratio, the accuracy is much higher than that of previously known methods. In addition to providing theoretical guarantees, we apply sketching to a molecular dynamics simulation of methanol and find that the estimate of spectral density is 90% accurate using only 10% of the data.
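To make the compression idea concrete, the toy sketch below compresses an ensemble of trajectories with a Gaussian sketching matrix and compares summed periodograms before and after: since the Fourier transform is linear and $\mathbb{E}[S^\top S] = I$, second-order quantities such as the power spectral density are preserved in expectation. The AR(1) data and the 10% compression ratio are illustrative assumptions, and this is not the authors' specific sketching construction.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(1)

# Toy "simulation data": N trajectories of length T with an AR(1) spectrum.
N, T, m = 500, 1024, 50                    # m/N = 10% compression
X = np.zeros((N, T))
for t in range(1, T):
    X[:, t] = 0.9 * X[:, t - 1] + rng.standard_normal(N)

# Gaussian sketch across the trajectory dimension: Y = S @ X with S m x N.
S = rng.standard_normal((m, N)) / np.sqrt(m)
Y = S @ X

# Summed periodograms of the m sketched rows estimate those of the N
# original rows, because E[S.T @ S] = I and sketching is linear.
f, P_full = periodogram(X, axis=-1)
f, P_sketch = periodogram(Y, axis=-1)
err = np.linalg.norm(P_sketch.sum(0) - P_full.sum(0)) / np.linalg.norm(P_full.sum(0))
print(f"relative PSD error at 10% compression: {err:.3f}")
```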


One-Pass Sparsified Gaussian Mixtures

arXiv.org Machine Learning

We present a one-pass sparsified Gaussian mixture model (SGMM). Given $P$-dimensional datapoints $X = \{\mathbf{x}_i\}_{i=1}^N$, the model fits $K$ Gaussian distributions to $X$ and (softly) classifies each $\mathbf{x}_i$ into these clusters. After paying an up-front cost of $\mathcal{O}(NP\log P)$ to precondition the data, we subsample $Q$ entries of each datapoint and discard the full $P$-dimensional data. SGMM operates in $\mathcal{O}(KNQ)$ time per iteration for diagonal or spherical covariances, independent of $P$, while estimating the model parameters $\theta$ in the full $P$-dimensional space, making it one-pass and hence suitable for streaming data. We derive the maximum likelihood estimators for $\theta$ in the sparsified regime, demonstrate clustering on synthetic and real data, and show that SGMM is faster than GMM while preserving accuracy.
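A minimal sketch of the sparsified E-step for diagonal covariances follows: responsibilities are computed from only the $Q$ sampled coordinates of each point, which is where the $\mathcal{O}(KNQ)$ per-iteration cost comes from. The function name, shapes, and random toy inputs are illustrative assumptions; the preconditioning step and the M-step are omitted.

```python
import numpy as np

def sparsified_responsibilities(idx, vals, means, variances, weights):
    """Soft cluster assignments using only Q sampled coordinates per point.

    idx:  (N, Q) sampled coordinate indices for each datapoint
    vals: (N, Q) datapoint values at those coordinates
    means, variances: (K, P) diagonal-covariance Gaussian parameters
    weights: (K,) mixture weights
    """
    N, K = idx.shape[0], means.shape[0]
    logp = np.empty((N, K))
    for k in range(K):
        mu, var = means[k][idx], variances[k][idx]   # gather per-point coords
        logp[:, k] = (np.log(weights[k])
                      - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                      - 0.5 * np.sum((vals - mu) ** 2 / var, axis=1))
    logp -= logp.max(axis=1, keepdims=True)          # stabilized softmax
    R = np.exp(logp)
    return R / R.sum(axis=1, keepdims=True)

# Toy usage with hypothetical shapes: N points in P dims, Q kept entries.
rng = np.random.default_rng(2)
N, P, Q, K = 6, 100, 10, 3
idx = rng.integers(0, P, size=(N, Q))
vals = rng.standard_normal((N, Q))
print(sparsified_responsibilities(idx, vals, rng.standard_normal((K, P)),
                                  np.ones((K, P)), np.full(K, 1.0 / K)))
```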


Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions

arXiv.org Machine Learning

We consider the problem of finding local minimizers in non-convex and non-smooth optimization. Under the assumption of strict saddle points, positive results have been derived for first-order methods. We present the first known results for the non-smooth case, which requires a different analysis and a different algorithm. This is the extended version of the paper and contains the proofs.
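As a hedged illustration of the general idea (not the paper's exact algorithm), here is proximal gradient descent on a smooth non-convex function plus an $\ell_1$ term, with a random perturbation injected whenever the iterate barely moves, in the spirit of perturbed first-order methods for escaping strict saddles. The toy objective, step size, and noise radius are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def perturbed_proximal_descent(grad_f, x0, lam, step, radius=0.05,
                               tol=1e-4, iters=2000, seed=0):
    """Proximal gradient with random perturbations at near-stationary points.

    When the proximal-gradient step barely moves the iterate (a candidate
    stationary point, possibly a saddle), inject uniform noise; the final
    iterate therefore hovers near a local minimizer up to the noise radius.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x_new = soft_threshold(x - step * grad_f(x), step * lam)
        if np.linalg.norm(x_new - x) < tol * step:
            x_new = x_new + radius * rng.uniform(-1, 1, size=x.shape)
        x = x_new
    return x

# Toy non-convex smooth part f(x) = 0.5*x0^2 - 0.5*x1^2 + 0.25*x1^4, whose
# origin is a strict saddle; plain proximal descent started at 0 stays at 0.
grad_f = lambda x: np.array([x[0], -x[1] + x[1] ** 3])
print(perturbed_proximal_descent(grad_f, np.zeros(2), lam=0.001, step=0.1))
```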


Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch

Neural Information Processing Systems

We propose two randomized algorithms for low-rank Tucker decomposition of tensors. The algorithms, which incorporate sketching, only require a single pass of the input tensor and can handle tensors whose elements are streamed in any order. To the best of our knowledge, ours are the only algorithms which can do this. We test our algorithms on sparse synthetic data and compare them to multiple other methods. We also apply one of our algorithms to a real dense 38 GB tensor representing a video and use the resulting decomposition to correctly classify frames containing disturbances.
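For intuition, here is a simplified randomized Tucker decomposition in numpy. It is not the paper's TensorSketch algorithm: it uses Gaussian sketches of the mode unfoldings and a second pass to form the core, whereas the paper's one-pass methods also sketch the core. All sizes and ranks are illustrative.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding of a tensor."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def sketched_tucker(X, ranks, oversample=5, seed=0):
    """Randomized Tucker via Gaussian sketches of each mode unfolding.

    A simplified two-pass stand-in: pass one sketches each unfolding to
    get the factors, pass two contracts the core. (The paper's one-pass
    algorithms avoid the second pass by sketching the core as well.)
    """
    rng = np.random.default_rng(seed)
    factors = []
    for mode, r in enumerate(ranks):
        Xn = unfold(X, mode)
        G = rng.standard_normal((Xn.shape[1], r + oversample))
        U, _, _ = np.linalg.svd(Xn @ G, full_matrices=False)
        factors.append(U[:, :r])                 # approximate mode-n range
    core = X
    for mode, U in enumerate(factors):           # core = X x_n U_n^T, all n
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core, factors

# Usage on a small synthetic low-Tucker-rank tensor.
rng = np.random.default_rng(3)
A, B, C = (rng.standard_normal((20, 3)) for _ in range(3))
X = np.einsum('ia,jb,kc,abc->ijk', A, B, C, rng.standard_normal((3, 3, 3)))
core, (U1, U2, U3) = sketched_tucker(X, (3, 3, 3))
X_hat = np.einsum('ia,jb,kc,abc->ijk', U1, U2, U3, core)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # near machine precision
```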


Randomized Clustered Nyström for Large-Scale Kernel Machines

AAAI Conferences

The Nyström method is a popular technique for generating low-rank approximations of kernel matrices that arise in many machine learning problems. The approximation quality of the Nyström method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we introduce a randomized algorithm for generating landmark points that is scalable to large high-dimensional data sets. The proposed method performs K-means clustering on low-dimensional random projections of a data set and thus leads to significant savings for high-dimensional data sets. Our theoretical results characterize the tradeoffs between accuracy and efficiency of the proposed method. Moreover, numerical experiments on classification and regression tasks demonstrate the superior performance and efficiency of our proposed method compared with existing approaches.
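A minimal sketch of the pipeline described here, assuming an RBF kernel and scikit-learn's KMeans: cluster a random projection of the data, map each cluster back to a landmark by averaging its points in the original space (one plausible reading of the procedure), and form the standard Nyström approximation. The sizes, projection dimension, and kernel bandwidth are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(4)

# Toy high-dimensional data; m landmarks, d projection dimensions.
n, p, m, d = 2000, 500, 50, 20
gamma = 1.0 / p
X = rng.standard_normal((n, p))

# K-means on a low-dimensional random projection of the data ...
G = rng.standard_normal((p, d)) / np.sqrt(d)
labels = KMeans(n_clusters=m, n_init=3, random_state=0).fit_predict(X @ G)

# ... then form landmarks in the original space by averaging each cluster.
Z = np.vstack([X[labels == j].mean(axis=0) for j in range(m)])

# Standard Nystrom approximation K ~= C W^+ C^T with these landmarks.
C = rbf_kernel(X, Z, gamma=gamma)
W = rbf_kernel(Z, Z, gamma=gamma)
K_approx = C @ np.linalg.pinv(W) @ C.T

K_exact = rbf_kernel(X, gamma=gamma)
print(np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```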


Improved Fixed-Rank Nyström Approximation via QR Decomposition: Practical and Theoretical Aspects

arXiv.org Machine Learning

The Nystr\"om method is a popular technique for computing fixed-rank approximations of large kernel matrices using a small number of landmark points. In practice, to ensure high quality approximations, the number of landmark points is chosen to be greater than the target rank. However, the standard Nystr\"om method uses a sub-optimal procedure for rank reduction mainly due to its simplicity. In this paper, we highlight the drawbacks of standard Nystr\"om in terms of poor performance and lack of theoretical guarantees. To address these issues, we present an efficient method for generating improved fixed-rank Nystr\"om approximations. Theoretical analysis and numerical experiments are provided to demonstrate the advantages of the modified method over the standard Nystr\"om method. Overall, the aim of this paper is to convince researchers to use the modified method, as it has nearly identical computational complexity, is easy to code, and has greatly improved accuracy in many cases.


Robust Partially-Compressed Least-Squares

AAAI Conferences

Randomized matrix compression techniques, such as the Johnson-Lindenstrauss transform, have emerged as an effective and practical way to solve large-scale problems efficiently. This focus on computational efficiency, however, comes at the cost of solution quality and accuracy. In this paper, we investigate compressed least-squares problems and propose new models and algorithms that address the issue of error and noise introduced by compression. While maintaining computational efficiency, our models provide robust solutions that are more accurate than those of classical compressed variants. We introduce tools from robust optimization together with a form of partial compression to improve the error-time trade-offs of compressed least-squares solvers. We develop an efficient solution algorithm for our Robust Partially-Compressed (RPC) model based on a reduction to a one-dimensional search.
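To illustrate partial compression (without the robust model or its one-dimensional search), the toy sketch below compares a fully compressed least-squares solve with a partially compressed one that sketches only the quadratic term and keeps the uncompressed right-hand side. All problem sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, m = 5000, 50, 400                 # rows, unknowns, sketch size
A = rng.standard_normal((n, p))
x_true = rng.standard_normal(p)
b = A @ x_true + 0.1 * rng.standard_normal(n)

S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch
SA = S @ A

# Fully compressed least squares: min ||S A x - S b||^2.
x_full = np.linalg.lstsq(SA, S @ b, rcond=None)[0]

# Partially compressed: sketch only the quadratic (Gram) term while the
# linear term keeps the uncompressed data, i.e.
#   min 0.5 ||S A x||^2 - (A^T b)^T x.
x_part = np.linalg.solve(SA.T @ SA, A.T @ b)

for name, x in [("fully compressed", x_full), ("partially compressed", x_part)]:
    print(name, np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```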


Randomized Clustered Nyström for Large-Scale Kernel Machines

arXiv.org Machine Learning

The Nyström method is a popular technique for generating low-rank approximations of kernel matrices that arise in many machine learning problems. The approximation quality of the Nyström method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we present a novel algorithm to compute the optimal Nyström low-rank approximation when the number of landmark points exceeds the target rank. Moreover, we introduce a randomized algorithm for generating landmark points that is scalable to large-scale data sets. The proposed method performs K-means clustering on low-dimensional random projections of a data set and, thus, leads to significant savings for high-dimensional data sets. Our theoretical results characterize the tradeoffs between the accuracy and efficiency of our proposed method. Extensive experiments demonstrate the competitive performance as well as the efficiency of our proposed method.