Goto

Collaborating Authors

 Statistical Learning



Matrix Completion has No Spurious Local Minimum

Neural Information Processing Systems

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima - all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve positive semidefinite matrix completion with arbitrary initialization in polynomial time. The result can be generalized to the setting when the observed entries contain noise. We believe that our main proof strategy can be useful for understanding geometric properties of other statistical problems involving partial or noisy observations.


Online and Differentially-Private Tensor Decomposition

Neural Information Processing Systems

Tensor decomposition is an important tool for big data analysis. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We present the first guarantees for online tensor power method which has a linear memory requirement. Moreover, we present a noise calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper which improves up on the existing results significantly.


Dual Decomposed Learning with Factorwise Oracle for Structural SVM of Large Output Domain

Neural Information Processing Systems

Many applications of machine learning involve structured outputs with large domains, where learning of a structured predictor is prohibitive due to repetitive calls to an expensive inference oracle. In this work, we show that by decomposing training of a Structural Support Vector Machine (SVM) into a series of multiclass SVM problems connected through messages, one can replace an expensive structured oracle with Factorwise Maximization Oracles (FMOs) that allow efficient implementation of complexity sublinear to the factor domain. AGreedy Direction Method of Multiplier (GDMM) algorithm is then proposed to exploit the sparsity of messages while guarantees convergence tosub-optimality after O(log(1/)) passes of FMOs over every factor. We conduct experiments on chain-structured and fully-connected problems of large output domains, where the proposed approach is orders-of-magnitude faster than current state-of-the-art algorithms for training Structural SVMs.


Automated scalable segmentation of neurons from multispectral images

Neural Information Processing Systems

Reconstruction of neuroanatomy is a fundamental problem in neuroscience. Stochastic expression of colors in individual cells is a promising tool, although its use in the nervous system has been limited due to various sources of variability in expression. Moreover, the intermingled anatomy of neuronal trees is challenging for existing segmentation algorithms. Here, we propose a method to automate the segmentation of neurons in such (potentially pseudo-colored) images. The method uses spatio-color relations between the voxels, generates supervoxels to reduce the problem size by four orders of magnitude before the final segmentation, and is parallelizable over the supervoxels. To quantify performance and gain insight, we generate simulated images, where the noise level and characteristics, the density of expression, and the number of fluorophore types are variable. We also present segmentations of real Brainbow images of the mouse hippocampus, which reveal many of the dendritic segments.




Improved Dropout for Shallow and Deep Learning

Neural Information Processing Systems

Dropout has been witnessed with great success in training deep neural networks by independently zeroing out the outputs of neurons at random. It has also received a surge of interest for shallow learning, e.g., logistic regression. However, the independent sampling for dropout could be suboptimal for the sake of convergence. In this paper, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To exhibit the optimal dropout probabilities, we analyze the shallow learning with multinomial dropout and establish the risk bound for stochastic optimization. By minimizing a sampling dependent factor in the risk bound, we obtain a distribution-dependent dropout with sampling probabilities dependent on the second order statistics of the data distribution. To tackle the issue of evolving distribution of neurons in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on-the-fly from a mini-batch of examples. Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence and but also a smaller testing error than the standard dropout. For example, on the CIFAR-100 data, the evolutional dropout achieves relative improvements over 10% on the prediction performance and over 50% on the convergence speed compared to the standard dropout.