Non-negative matrix factorization (NMF) is a problem with many applications, ranging from facial recognition to document clustering. However, due to the variety of algorithms that solve NMF, the randomness involved in these algorithms, and the somewhat subjective nature of the problem, there is no clear "correct answer" to any particular NMF problem, and as a result, it can be hard to test new algorithms. This paper suggests some test cases for NMF algorithms derived from matrices with enumerable exact non-negative factorizations and perturbations of these matrices. Three algorithms using widely divergent approaches to NMF all give similar solutions over these test cases, suggesting that they could be used to test implementations of these existing NMF algorithms as well as, potentially, new NMF algorithms. This paper also describes how the proposed test cases could be used in practice.
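To make the idea of a test case with a known exact factorization concrete, the following is a minimal sketch, not the paper's actual test suite. It plants a rank-3 non-negative factorization, adds a small non-negative perturbation, and runs a plain multiplicative-update solver on both matrices. The matrix sizes, the noise level, and the names `multiplicative_update_nmf` and `relative_error` are assumptions chosen for illustration.

```python
import numpy as np

def multiplicative_update_nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Plain Lee-Seung multiplicative updates for min ||V - W H||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # eps keeps the denominators strictly positive.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(V, W, H):
    """Frobenius reconstruction error relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Hypothetical test case: a matrix with a planted exact rank-3 factorization.
rng = np.random.default_rng(42)
W_true = rng.random((20, 3))
H_true = rng.random((3, 15))
V_exact = W_true @ H_true

# Perturbed variant: small non-negative noise added to the exact matrix.
V_noisy = V_exact + 0.01 * rng.random(V_exact.shape)

for name, V in [("exact", V_exact), ("perturbed", V_noisy)]:
    W, H = multiplicative_update_nmf(V, rank=3)
    print(name, relative_error(V, W, H))
```

A new implementation could be run on the same two matrices and its reconstruction error compared against this baseline. How close the errors should be is a judgment call that this sketch does not settle.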
Identifying recurring patterns in high-dimensional time series data is an important problem in many scientific domains. A popular model to achieve this is convolutive nonnegative matrix factorization (CNMF), which extends classic nonnegative matrix factorization (NMF) to extract short-lived temporal motifs from a long time series. Prior work has typically fit this model by multiplicative parameter updates, an approach widely considered to be suboptimal for NMF, especially in large-scale data applications. Here, we describe how to extend two popular and computationally scalable NMF algorithms, Hierarchical Alternating Least Squares (HALS) and Alternating Nonnegative Least Squares (ANLS), to the CNMF model. Both methods demonstrate performance advantages over multiplicative updates on large-scale synthetic and real-world data.
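The CNMF model itself can be illustrated with a short sketch of its reconstruction step. This assumes the common formulation in which the data are approximated by a sum over time lags of a lag-specific factor times a time-shifted activation matrix. The array layout (`W` of shape lags x features x components, `H` of shape components x time) and the function names are illustrative assumptions, and no fitting algorithm (multiplicative updates, HALS, or ANLS) is shown.

```python
import numpy as np

def shift_right(H, lag):
    """Shift the columns of H to the right by `lag` time steps, zero-filling on the left."""
    if lag == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, lag:] = H[:, :-lag]
    return out

def cnmf_reconstruct(W, H):
    """Reconstruct X_hat = sum over lags l of W[l] @ shift_right(H, l)."""
    n_lags = W.shape[0]
    return sum(W[l] @ shift_right(H, l) for l in range(n_lags))

# One motif spanning two time steps: feature 0 fires first, then feature 1.
W = np.zeros((2, 2, 1))
W[0, 0, 0] = 1.0   # lag 0: feature 0
W[1, 1, 0] = 2.0   # lag 1: feature 1
H = np.array([[0.0, 1.0, 0.0, 0.0]])  # the motif occurs once, at time step 1

X_hat = cnmf_reconstruct(W, H)
print(X_hat)
```

With a single lag, the reconstruction reduces to the ordinary NMF product `W[0] @ H`, which is the sense in which CNMF extends classic NMF.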
A robust algorithm for non-negative matrix factorization (NMF) is presented in this paper with the purpose of dealing with large-scale data, where the separability assumption is satisfied. In particular, we modify a previously proposed Linear Programming (LP) algorithm by introducing a reduced set of constraints for exact NMF. In contrast to previous approaches, the proposed algorithm does not require knowledge of the factorization rank (the number of extreme rays or topics). Furthermore, motivated by a similar problem arising in the context of metabolic network analysis, we consider an entirely different regime where the number of extreme rays or topics can be much larger than the dimension of the data vectors. The performance of the algorithm on different synthetic data sets is reported.
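The separability assumption can be illustrated with a small synthetic example. The sketch below does not implement the LP algorithm described above. It builds a separable matrix, in which every column of the factor `W` appears as a column of the data, and recovers those columns with the successive projection algorithm (SPA), a simpler greedy method swapped in for illustration. Unlike the proposed algorithm, SPA needs the rank as input and assumes the rank does not exceed the data dimension. All sizes and names are assumptions.

```python
import numpy as np

def successive_projection(X, rank):
    """Greedy SPA: pick the column with the largest residual norm, project it out, repeat."""
    R = X.astype(float).copy()
    anchors = []
    for _ in range(rank):
        j = int(np.argmax(np.sum(R * R, axis=0)))
        anchors.append(j)
        # Assumes the selected residual column is nonzero (true for this example).
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)   # remove the component along the chosen column
    return anchors

# Separable data: 3 "pure" columns (the columns of W) plus 7 convex mixtures of them.
rng = np.random.default_rng(0)
W = rng.random((10, 3))
mix = rng.dirichlet(np.ones(3), size=7).T   # shape (3, 7); each column sums to 1
X = np.hstack([W, W @ mix])

# Shuffle the columns so the pure columns are not simply the first three.
perm = rng.permutation(X.shape[1])
X = X[:, perm]
true_anchor_columns = sorted(np.flatnonzero(perm < 3).tolist())

found = sorted(successive_projection(X, rank=3))
print(found, true_anchor_columns)
```

In the regime the abstract targets, where the number of extreme rays exceeds the dimension of the data vectors, this greedy projection scheme no longer applies, which is one motivation for an LP-based formulation.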
Symmetric nonnegative matrix factorization (NMF), a special but important class of general NMF, has been shown to be useful for data analysis and in particular for various clustering tasks. Unfortunately, designing fast algorithms for symmetric NMF is not as easy as for its nonsymmetric counterpart, which admits a splitting property that allows efficient alternating-type algorithms. To overcome this issue, we transform the symmetric NMF problem into a nonsymmetric one, so that ideas from state-of-the-art algorithms for nonsymmetric NMF can be adopted to design fast algorithms for symmetric NMF. We rigorously establish that solving the nonsymmetric reformulation returns a solution of symmetric NMF, and then apply fast alternating-based algorithms to the reformulated problem. Furthermore, we show that these fast algorithms admit strong convergence guarantees, in the sense that the generated sequence converges at least at a sublinear rate, and converges globally to a critical point of the symmetric NMF problem. We conduct experiments on both synthetic data and image clustering to support our results.
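One way such a nonsymmetric reformulation could look is sketched below. The abstract does not state the exact form, so the objective used here, 0.5 * ||X - U V^T||_F^2 + 0.5 * lam * ||U - V||_F^2 with U, V >= 0, is an assumption, as are the penalty weight `lam` and every function name. The two factors are updated alternately with HALS-style column updates, each of which is the exact minimizer over one column with the rest held fixed.

```python
import numpy as np

def objective(X, U, V, lam):
    """Assumed penalized objective: 0.5*||X - U V^T||_F^2 + 0.5*lam*||U - V||_F^2."""
    fit = 0.5 * np.linalg.norm(X - U @ V.T) ** 2
    coupling = 0.5 * lam * np.linalg.norm(U - V) ** 2
    return fit + coupling

def _hals_sweep(X, A, B, lam):
    """Update A in place, column by column, for min 0.5*||X - A B^T||^2 + 0.5*lam*||A - B||^2."""
    XB = X @ B
    BtB = B.T @ B
    for k in range(A.shape[1]):
        # Residual-times-b_k term, computed without forming the residual explicitly.
        numer = XB[:, k] - A @ BtB[:, k] + A[:, k] * BtB[k, k] + lam * B[:, k]
        A[:, k] = np.maximum(numer / (BtB[k, k] + lam), 0.0)

def symmetric_nmf(X, rank, lam=1.0, n_iter=200, seed=0):
    """Alternate HALS sweeps over U and V on the assumed nonsymmetric reformulation."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], rank))
    V = U.copy()
    for _ in range(n_iter):
        _hals_sweep(X, U, V, lam)     # update U with V fixed
        _hals_sweep(X.T, V, U, lam)   # update V with U fixed
    return U, V

# Synthetic symmetric test matrix with a planted factor.
rng = np.random.default_rng(1)
H_true = rng.random((12, 3))
X = H_true @ H_true.T

U, V = symmetric_nmf(X, rank=3)
print(objective(X, U, V, 1.0), np.linalg.norm(U - V))
```

The coupling term is what pulls the two factors toward each other. Whether they coincide at convergence depends on the choice of `lam`, which this sketch does not analyze.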
Symmetric Nonnegative Matrix Factorization (SNMF) models arise naturally as simple reformulations of many standard clustering algorithms, including the popular spectral clustering method. Recent work has demonstrated that an elementary instance of SNMF provides superior clustering quality compared to many classic clustering algorithms on a variety of synthetic and real-world data sets. In this work, we present novel reformulations of this instance of SNMF based on the notion of variable splitting and produce two fast and effective algorithms for its optimization using i) the provably convergent Accelerated Proximal Gradient (APG) procedure and ii) a heuristic version of the Alternating Direction Method of Multipliers (ADMM) framework. Our two algorithms present an interesting tradeoff between computational speed and mathematical convergence guarantees: while the former method is provably convergent, it is considerably slower than the latter approach, for which we also provide a significant but less stringent mathematical proof regarding its convergence. Through extensive experiments we show not only that the efficacy of these approaches is equal to that of the state-of-the-art SNMF algorithm, but also that the latter of our algorithms is extremely fast, being one to two orders of magnitude faster in terms of total computation time than the state-of-the-art approach and outperforming even spectral clustering in terms of computation time on large data sets.
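The variable-splitting idea can be written out generically as follows. This is a sketch of the standard construction, not necessarily the exact formulation used in the work. The symbols $A$ (the symmetric data matrix), $W$ and $H$ (the split copies of the factor), $\Lambda$ (the multiplier), and $\rho$ (the penalty parameter) are all introduced here for illustration.

```latex
% Original symmetric problem: a single factor H appears twice.
\min_{H \ge 0} \; \tfrac{1}{2}\, \lVert A - H H^{\top} \rVert_F^2

% Variable splitting: introduce a copy W and tie it to H with a constraint.
\min_{W,\; H \ge 0} \; \tfrac{1}{2}\, \lVert A - W H^{\top} \rVert_F^2
\quad \text{subject to} \quad W = H

% Augmented Lagrangian of the kind minimized alternately in ADMM-type schemes.
\mathcal{L}_{\rho}(W, H, \Lambda)
  = \tfrac{1}{2}\, \lVert A - W H^{\top} \rVert_F^2
  + \langle \Lambda,\; W - H \rangle
  + \tfrac{\rho}{2}\, \lVert W - H \rVert_F^2
```

After splitting, the objective is quadratic in each of $W$ and $H$ separately rather than quartic in $H$, which is what makes alternating updates cheap.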