Sun, Will Wei, Lu, Junwei, Liu, Han, Cheng, Guang

We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components. The sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixture and mixtures of sparse regressions. A thorough theoretical investigation is further conducted. In particular, we show that the final decomposition estimator is guaranteed to achieve a local statistical rate, and further strengthen it to the global statistical rate by introducing a proper initialization procedure. In high dimensional regimes, the obtained statistical rate significantly improves those shown in the existing non-sparse decomposition methods. The empirical advantages of TTP are confirmed in extensive simulated results and two real applications of click-through rate prediction and high-dimensional gene clustering.

Calderhead, Ben, Sustik, Mátyás A.

One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of l1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust student-t error model, for which the expected Fisher Information is analytically intractable.

Xu, Zenglin (University of Electronic Science and Technology of China) | Jin, Rong (Michigan State University) | Shen, Bin (Purdue University) | Zhu, Shenghuo (Alibaba Group)

Nystrom approximation is an effective approach to accelerate the computation of kernel matrices in many kernel methods. In this paper, we consider the Nystrom approximation for sparse kernel methods. Instead of relying on the low-rank assumption of the original kernels, which sometimes does not hold in some applications, we take advantage of the restricted eigenvalue condition, which has been proved to be robust for sparse kernel methods. Based on the restricted eigenvalue condition, we have provided not only the approximation bound for the original kernel matrix but also the recovery bound for the sparse solutions of sparse kernel regression. In addition to the theoretical analysis, we also demonstrate the good performance of the Nystrom approximation for sparse kernel regression on real world data sets.

Bien, Jacob, Xu, Ya, Mahoney, Michael W.

The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. In particular, we show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method.

Anandkumar, Animashree, Jain, Prateek, Shi, Yang, Niranjan, U. N.

Robust tensor CP decomposition involves decomposing a tensor into low rank and sparse components. We propose a novel non-convex iterative algorithm with guaranteed recovery. It alternates between low-rank CP decomposition through gradient ascent (a variant of the tensor power method), and hard thresholding of the residual. We prove convergence to the globally optimal solution under natural incoherence conditions on the low rank component, and bounded level of sparse perturbations. We compare our method with natural baselines which apply robust matrix PCA either to the {\em flattened} tensor, or to the matrix slices of the tensor. Our method can provably handle a far greater level of perturbation when the sparse tensor is block-structured. This naturally occurs in many applications such as the activity detection task in videos. Our experiments validate these findings. Thus, we establish that tensor methods can tolerate a higher level of gross corruptions compared to matrix methods.