Collaborating Authors

Regularized Co-Clustering with Dual Supervision

Neural Information Processing Systems

By attempting to simultaneously partition both the rows (examples) and columns (features) of a data matrix, co-clustering algorithms often demonstrate surprisingly impressive performance improvements over traditional one-sided (row) clustering techniques. A good clustering of features may be seen as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that may lead to a better clustering of examples (and vice versa). In many applications, partial supervision in the form of a few row labels as well as column labels may be available to potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms motivated respectively by spectral bipartite graph partitioning and matrix approximation (e.g., non-negative matrix factorization) formulations for co-clustering. These algorithms (i) support dual supervision in the form of labels for examples and/or features, (ii) provide principled predictive capability on out-of-sample test data, and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces.

Overview of Matrix Factorisation Techniques using Python


Low-rank approximations of data matrices have become an important tool in machine learning, with applications in bioinformatics, computer vision, text processing, recommender systems, and other fields. They embed high-dimensional data in lower-dimensional spaces, which mitigates the effects of noise, uncovers latent relations, or facilitates further processing. In general, matrix factorization (MF) is a process that finds two factor matrices, $P \in \mathbb{R}^{k \times m}$ and $Q \in \mathbb{R}^{k \times n}$, to describe a given m-by-n training matrix R in which some entries may be missing. MF can be found in many applications, but we use only collaborative filtering in recommender systems as an example. It is based on the assumption that the entries of R are users' historical preferences for merchandise, and the task at hand is to predict unobserved user behavior (i.e., missing entries in R) in order to make suitable recommendations.
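The factorization described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method of the article: it assumes the usual reading R ≈ PᵀQ (P is k-by-m, Q is k-by-n), fits only the observed entries by stochastic gradient descent with a small L2 penalty, and uses a made-up 4-by-4 rating matrix. The function names `factorize`, `predict` and `rmse`, and all hyperparameters, are hypothetical.

```python
import random

def predict(P, Q, u, i):
    """Predicted entry R[u][i] as the inner product of column u of P and column i of Q."""
    return sum(P[f][u] * Q[f][i] for f in range(len(P)))

def factorize(observed, m, n, k=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Fit R ~= P^T Q on the observed (row, column, value) triples only.

    Missing entries of R are simply absent from `observed`.
    lr, reg and epochs are illustrative choices, not tuned values.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(m)] for _ in range(k)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n)] for _ in range(k)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                p_fu, q_fi = P[f][u], Q[f][i]
                # Gradient step on the squared error plus an L2 penalty.
                P[f][u] += lr * (err * q_fi - reg * p_fu)
                Q[f][i] += lr * (err * p_fu - reg * q_fi)
    return P, Q

def rmse(P, Q, observed):
    """Root-mean-square error over the observed entries."""
    sq = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in observed]
    return (sum(sq) / len(sq)) ** 0.5

# Toy data (hypothetical): users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
# Entries such as (1, 2) and (0, 3) are deliberately left unobserved.
observed = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 5), (1, 1, 4), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 5), (3, 3, 4),
]
P, Q = factorize(observed, m=4, n=4)
missing_guess = predict(P, Q, 1, 2)  # estimate for an unobserved entry
```

A recommender would rank the unobserved items of each user by `predict` and suggest the highest-scoring ones; in practice the rank k and the penalty would be chosen by validation on held-out entries.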

Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation

Neural Information Processing Systems

Bayesian Regularization and Nonnegative Deconvolution (BRAND) is proposed for estimating time delays of acoustic signals in reverberant environments. Sparsity of the nonnegative filter coefficients is enforced using an L1-norm regularization.
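To make the idea of a sparse, nonnegative filter concrete, here is a minimal sketch of nonnegative deconvolution with an L1 penalty, solved by coordinate descent. It is a simplification and not the BRAND algorithm itself: the penalty weight `lam` is fixed by hand here, whereas the abstract describes a Bayesian treatment of the regularization. The function name, the synthetic signal and the tap count are all assumptions made for illustration.

```python
import random

def nonneg_l1_deconvolve(x, y, n_taps, lam=0.1, sweeps=50):
    """Minimize 0.5 * ||y - sum_k h[k] * shift(x, k)||^2 + lam * sum_k h[k], with h >= 0.

    Coordinate descent: each tap has a closed-form nonnegative update.
    """
    T = len(y)
    # cols[k][t] = x[t - k], i.e. x delayed by k samples (zero-padded).
    cols = [[x[t - k] if 0 <= t - k < len(x) else 0.0 for t in range(T)]
            for k in range(n_taps)]
    norms = [sum(v * v for v in c) for c in cols]
    h = [0.0] * n_taps
    resid = list(y)  # residual y - X h, with h = 0 initially
    for _ in range(sweeps):
        for k in range(n_taps):
            if norms[k] == 0.0:
                continue
            # Correlation of column k with the residual that excludes tap k.
            rho = sum(c * r for c, r in zip(cols[k], resid)) + norms[k] * h[k]
            new_hk = max(0.0, (rho - lam) / norms[k])
            delta = new_hk - h[k]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, cols[k])]
                h[k] = new_hk
    return h

# Synthetic example (hypothetical): a direct path at delay 3 and a weaker echo at delay 7.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [(x[t - 3] if t >= 3 else 0.0) + 0.5 * (x[t - 7] if t >= 7 else 0.0)
     for t in range(len(x))]
h = nonneg_l1_deconvolve(x, y, n_taps=12)
estimated_delay = max(range(len(h)), key=lambda k: h[k])  # index of the largest tap
```

The time delay is read off as the index of the dominant coefficient; the L1 term pushes the remaining taps toward exactly zero, which is what keeps spurious peaks from competing with the true delay.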

Rectified Factor Networks

Neural Information Processing Systems

We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means. We prove convergence and correctness of the RFN learning algorithm. On benchmarks, RFNs are compared to other unsupervised methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to previous sparse coding methods, RFNs yield sparser codes, capture the data's covariance structure more precisely, and have a significantly smaller reconstruction error.

Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Neural Information Processing Systems

Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least-squares error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate more distant relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering.
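The smoothing step can be illustrated with a small sketch. The abstract does not give the formula, so the form used here is an assumption: the similarity matrix S is symmetrically normalized to Q = D^(-1/2) S D^(-1/2), and the smoothed matrix is the truncated series I + αQ + α²Q² + ..., which approximates (I − αQ)^(-1) for 0 < α < 1. The function names, α and the number of terms are hypothetical choices, and the NMF step itself is not shown.

```python
def matmul(A, B):
    """Plain-Python product of two square matrices given as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def random_walk_smooth(S, alpha=0.8, steps=50):
    """Return sum_{j=0..steps} alpha^j Q^j with Q = D^(-1/2) S D^(-1/2).

    S must be symmetric and nonnegative. Multi-step walks give a nonzero
    score to pairs of samples that are connected only through intermediaries.
    """
    n = len(S)
    deg = [sum(row) for row in S]
    Q = [[S[i][j] / (deg[i] * deg[j]) ** 0.5 if S[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = identity                        # current term alpha^j Q^j
    total = [row[:] for row in identity]   # running sum of the series
    for _ in range(steps):
        term = [[alpha * v for v in row] for row in matmul(term, Q)]
        total = [[t + v for t, v in zip(trow, vrow)]
                 for trow, vrow in zip(total, term)]
    return total

# Chain of four samples: 0 - 1 - 2 - 3. Only neighbours are directly similar.
S = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
A = random_walk_smooth(S)
```

In the chain example, samples 0 and 2 have zero direct similarity in `S` but a positive entry in `A`, which is the sense in which a smoothed matrix lets a factorization see relationships beyond immediate neighbours.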