Shen, Bin
Nystrom Approximation for Sparse Kernel Methods: Theoretical Analysis and Empirical Evaluation
Xu, Zenglin (University of Electronic Science and Technology of China) | Jin, Rong (Michigan State University) | Shen, Bin (Purdue University) | Zhu, Shenghuo (Alibaba Group)
Kernel methods (Schölkopf and Smola 2002; Xu et al. 2009) have received a lot of attention in recent studies of machine learning. These methods project data into high-dimensional or even infinite-dimensional spaces via kernel mapping functions. Despite the strong generalization ability induced by kernel methods, they usually suffer from the high computation complexity of calculating the kernel matrix (also called the Gram matrix). Although low-rank decomposition techniques (e.g., Cholesky Decomposition (Fine and Scheinberg 2002; Bach and Jordan 2005)) and truncating methods (e.g., Kernel Tapering (Shen, Xu, and Allebach 2014; Furrer, Genton, and Nychka 2006)) can accelerate the calculation of the kernel matrix, they still need to compute the kernel matrix. [...] However, if kernels are not low rank, Nyström approximations can usually lead to suboptimal performance. To alleviate the strong assumption used in deriving the approximation bounds, we adopt a more general assumption: that the design matrix K satisfies the restricted isometry property (Koltchinskii 2011). In particular, the new assumption obeys the restricted eigenvalue condition (Koltchinskii 2011; Bickel, Ritov, and Tsybakov 2009), which has been shown to be more general than several other similar assumptions used in the sparsity literature (Candes and Tao 2007; Donoho, Elad, and Temlyakov 2006; Zhang and Huang 2008). Based on the restricted eigenvalue condition, we provide error bounds for kernel approximation and recovery rates in sparse kernel regression.
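As a rough illustration of the Nyström technique the abstract refers to (not the specific estimator or bounds analyzed in the paper), the following NumPy sketch approximates an RBF kernel matrix from a random subset of landmark columns; the landmark count m and bandwidth gamma are illustrative assumptions.

# Minimal sketch of the standard Nystrom kernel approximation
# (illustrative only; not the specific estimator or bounds from the paper).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def nystrom_approx(X, m=50, gamma=1.0, seed=0):
    # Approximate the full n x n kernel matrix K by C W^+ C^T, where C holds
    # m randomly sampled landmark columns and W is the m x m kernel among
    # the landmarks.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)          # n x m
    W = C[idx]                                # m x m block of landmarks
    return C @ np.linalg.pinv(W) @ C.T        # rank-<=m approximation of K

X = np.random.randn(500, 10)
K_approx = nystrom_approx(X, m=50, gamma=0.1)
K_exact = rbf_kernel(X, X, gamma=0.1)
print("relative error:", np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))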
Learning to Hash on Structured Data
Wang, Qifan (Purdue University) | Si, Luo (Purdue University) | Shen, Bin (Purdue University)
Hashing techniques have been widely applied to large-scale similarity search problems due to their computational and memory efficiency. However, most existing hashing methods assume data examples are independently and identically distributed. But in many real-world applications there often exists additional dependency/structure information between data examples. Ignoring this structure information may limit the performance of existing hashing algorithms. This paper explores the research problem of learning to Hash on Structured Data (HSD) and formulates a novel framework that considers additional structure information. In particular, the hashing function is learned in a unified learning framework by simultaneously ensuring structural consistency and preserving the similarities between data examples. An iterative gradient descent algorithm is designed as the optimization procedure. Furthermore, we improve the effectiveness of the hashing function through an orthogonal transformation that minimizes the quantization error. Experimental results on two datasets clearly demonstrate the advantages of the proposed method over several state-of-the-art hashing methods.
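To make two of the ingredients mentioned above concrete, here is a generic sketch assuming a PCA-style similarity-preserving projection and an ITQ-style orthogonal rotation for the quantization-error step; the structure-consistency term that is specific to HSD is omitted, and all parameter values are illustrative.

# Generic learning-to-hash sketch: (1) a linear projection that preserves
# pairwise similarity, and (2) an orthogonal rotation that reduces
# quantization error (ITQ-style). NOT the exact HSD objective.
import numpy as np

def learn_projection(X, n_bits):
    # Similarity-preserving projection via PCA (a common relaxation).
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_bits].T                      # d x n_bits

def itq_rotation(V, n_iter=50, seed=0):
    # Alternate between binarizing and solving an orthogonal Procrustes
    # problem to minimize || B - V R ||_F (the quantization error).
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))  # random orthogonal start
    for _ in range(n_iter):
        B = np.sign(V @ R)
        U, _, Wt = np.linalg.svd(B.T @ V)
        R = (U @ Wt).T
    return R

X = np.random.randn(1000, 64)
P = learn_projection(X, n_bits=16)
V = (X - X.mean(0)) @ P
R = itq_rotation(V)
codes = ((np.sign(V @ R) + 1) // 2).astype(int)   # binary hash codes in {0,1}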
SP-SVM: Large Margin Classifier for Data on Multiple Manifolds
Shen, Bin (Purdue University) | Liu, Bao-Di (China University of Petroleum, Qingdao) | Wang, Qifan (Purdue University) | Fang, Yi (Santa Clara University) | Allebach, Jan P. (Purdue University)
As one of the most important state-of-the-art classification techniques, the Support Vector Machine (SVM) has been widely adopted in many real-world applications, such as object detection, face recognition, text categorization, etc., due to its competitive practical performance and elegant theoretical interpretation. However, it treats all samples independently and ignores the fact that, in many real situations, especially when data are in a high dimensional space, samples typically lie on low dimensional manifolds of the feature space, and thus a sample can be related to its neighbors by being represented as a linear combination of other samples on the same manifold. This linear representation, which is usually sparse, reflects the structure of the underlying manifolds. It has been extensively explored in the recent literature and proven to be critical for the performance of classification. To benefit from both the underlying low dimensional manifold structure and the large margin classifier, this paper proposes a novel method called Sparsity Preserving Support Vector Machine (SP-SVM), which explicitly considers the sparse representation of samples while maximizing the margin between different classes. Consequently, SP-SVM inherits both the discriminative power of the support vector machine and the merits of sparsity. A set of experiments on real-world benchmark data sets shows that SP-SVM achieves significantly higher precision on recognition tasks than various competitive baselines, including the traditional SVM, the sparse representation based method, and the classical nearest neighbor classifier.
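The following is a loose sketch of how the two components might be combined: sparse reconstruction weights obtained with Lasso, plus a linear max-margin objective augmented by a sparsity-preserving penalty, solved here by plain subgradient descent. The actual SP-SVM formulation and solver in the paper may differ; the hyperparameters C, beta, and alpha below are illustrative assumptions.

# Sketch: sparse reconstruction of each sample from the others (Lasso),
# then a linear max-margin classifier whose objective also encourages the
# decision function to preserve those sparse reconstruction relations.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_affinity(X, alpha=0.05):
    # S[i, j]: coefficient of sample j in the sparse reconstruction of sample i.
    n = len(X)
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        lasso = Lasso(alpha=alpha, max_iter=5000).fit(X[others].T, X[i])
        S[i, others] = lasso.coef_
    return S

def sp_svm(X, y, S, C=1.0, beta=0.1, lr=1e-3, epochs=500):
    # Subgradient descent on:
    #   0.5 ||w||^2 + C * hinge(y, Xw + b) + beta * ||(I - S) X w||^2
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    M = (np.eye(n) - S).T @ (np.eye(n) - S)   # sparsity-preserving regularizer
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                     # samples violating the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(0) + 2 * beta * X.T @ M @ X @ w
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.random.randn(200, 20)
y = np.sign(X[:, 0] + 0.1 * np.random.randn(200))
w, b = sp_svm(X, y, sparse_affinity(X))
print("train accuracy:", np.mean(np.sign(X @ w + b) == y))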
Robust Nonnegative Matrix Factorization via $L_1$ Norm Regularization
Shen, Bin | Si, Luo | Ji, Rongrong | Liu, Baodi
Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as face recognition, motion segmentation, etc. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications data are often partially corrupted with large additive noise. When the positions of noise are known, some existing variants of NMF can be applied by treating these corrupted entries as missing values. However, the positions are often unknown in many real world applications, which prevents the usage of traditional NMF or other existing variants of NMF. This paper proposes a Robust Nonnegative Matrix Factorization (RobustNMF) algorithm that explicitly models the partial corruption as large additive noise without requiring the information of positions of noise. In practice, large additive noise can be used to model outliers. In particular, the proposed method jointly approximates the clean data matrix with the product of two nonnegative matrices and estimates the positions and values of outliers/noise. An efficient iterative optimization algorithm with a solid theoretical justification has been proposed to learn the desired matrix factorization. Experimental results demonstrate the advantages of the proposed algorithm.
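A compact sketch of the joint estimation described above, assuming an L1-penalized outlier matrix E updated by soft-thresholding and standard multiplicative updates for the nonnegative factors; the exact update rules and regularization used in the paper may differ, and lam is an illustrative regularization weight.

# Sketch: jointly fit X ~ U V + E, where U, V are nonnegative factors and E
# captures sparse large-magnitude noise/outliers via an L1 penalty.
import numpy as np

def robust_nmf(X, rank=10, lam=0.5, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((rank, m))
    E = np.zeros((n, m))
    for _ in range(n_iter):
        R = np.maximum(X - E, 0)                # "clean" part to factorize, kept nonnegative
        U *= (R @ V.T) / (U @ V @ V.T + eps)    # multiplicative updates
        V *= (U.T @ R) / (U.T @ U @ V + eps)
        D = X - U @ V
        E = np.sign(D) * np.maximum(np.abs(D) - lam / 2, 0)  # soft-threshold the residual
    return U, V, E

X = np.abs(np.random.randn(100, 80))
X[np.random.rand(100, 80) < 0.05] += 5.0        # inject sparse outliers
U, V, E = robust_nmf(X, rank=8)
print("residual norm:", np.linalg.norm(X - U @ V - E))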
Non-Negative Matrix Factorization Clustering on Multiple Manifolds
Shen, Bin (Purdue University) | Si, Luo (Purdue University)
Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as clustering. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications with data such as human faces or digits, data often reside on multiple manifolds, which may overlap or intersect, but the traditional NMF method and other existing variants of NMF do not consider this. This paper proposes a novel clustering algorithm that explicitly models the intrinsic geometrical structure of the data on multiple manifolds with NMF. The idea of the proposed algorithm is that a data point generated by several neighboring points on a specific manifold in the original space should be constructed in a similar way in the low dimensional subspace. Experimental results on two real-world datasets demonstrate the advantage of the proposed algorithm.
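A rough sketch of the stated idea, assuming LLE-style local reconstruction weights computed in the original space and a graph-regularized multiplicative NMF update that encourages the low-dimensional codes to respect the same weights; this is a generic construction, not necessarily the exact algorithm of the paper, and k, gamma are illustrative parameters.

# Sketch: local linear reconstruction weights W in the input space, then
# NMF (X ~ U V) with a penalty gamma * || V - V W^T ||_F^2 so each code is
# reconstructed by its neighbors with the same weights.
import numpy as np

def lle_weights(X, k=5, reg=1e-3):
    # W[i, j]: weight of neighbor j in reconstructing sample i (rows of X).
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]
        Z = X[nbrs] - X[i]
        G = Z @ Z.T + reg * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    return W

def mm_nmf(X, rank=10, gamma=1.0, n_iter=300, eps=1e-9, seed=0):
    # X: d x n nonnegative data; returns basis U (d x rank) and codes V (rank x n).
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, rank))
    V = rng.random((rank, n))
    W = lle_weights(X.T)                        # weights among the n samples
    M = (np.eye(n) - W).T @ (np.eye(n) - W)     # from || V - V W^T ||_F^2
    Mp, Mn = np.maximum(M, 0), np.maximum(-M, 0)
    for _ in range(n_iter):
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X + gamma * V @ Mn) / (U.T @ U @ V + gamma * V @ Mp + eps)
    return U, V

X = np.abs(np.random.randn(50, 300))
U, V = mm_nmf(X, rank=6)
labels = V.argmax(axis=0)                       # cluster by dominant code
print("cluster sizes:", np.bincount(labels))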