Matrix Factorization
Sparse Network Inference under Imperfect Detection and its Application to Ecological Networks
Zhang, Aoran, Wei, Tianyao, Guerrero, Maria J., Uribe, César A.
Abstract--Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, a setting widespread in ecological research. Such networks are often sparse, and their detection is inherently imperfect. Existing models focus mainly on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often left uncontrolled and scale unbalanced, leading to oversparse or poorly scaled estimates that degrade structural recovery. We impose nonconvex ℓ1/2 regularization on the latent similarity and connectivity structures to promote sparse within-group similarity and cross-group connectivity with better relative scale. To solve the resulting problem, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization, and we establish asymptotic feasibility and KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.

Index Terms--augmented Lagrangian, nonconvex nonsmooth optimization, nonnegative matrix factorization, link prediction, ecological network inference, structured sparse recovery

I. INTRODUCTION

This setting is inherent in sensing and monitoring applications [3], [4], where observations, such as counts, are obtained via an imperfect sampling process. In this paper, we are interested in ecological interaction networks describing how species associate with locations and how environments shape biodiversity patterns [5], [6].
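The nonconvex ℓ1/2 penalty in the abstract above admits a cheap scalar proximal step, which is what makes ADMM-style splitting tractable. A minimal numerical sketch (a closed-form "half-thresholding" operator exists in the literature; a dense grid search is used here to keep the illustration self-evidently correct, and the function name prox_l_half is ours, not the paper's):

```python
import numpy as np

def prox_l_half(y, lam):
    """Numerically evaluate the prox of lam*|x|^(1/2) at scalar y.

    Solves min_x 0.5*(x - y)**2 + lam*sqrt(|x|) by dense grid search.
    (A closed-form half-thresholding operator exists; the grid keeps
    this sketch short and easy to verify.)
    """
    xs = np.linspace(-abs(y) - 1.0, abs(y) + 1.0, 200001)
    obj = 0.5 * (xs - y) ** 2 + lam * np.sqrt(np.abs(xs))
    return float(xs[np.argmin(obj)])
```

Unlike soft thresholding, which shifts every input by the same amount, the ℓ1/2 prox zeroes small inputs outright while only mildly shrinking large ones, the behavior that favors sparse yet well-scaled factor estimates.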
Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data
Seraghiti, Giovanni, Dubrulle, Kévin, Vandaele, Arnaud, Gillis, Nicolas
Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, solutions that are too sparse may degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our proposed model (wL1-NMF) and algorithm (sCD).
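The weighted-median step behind sCD can be sketched on the simplest case. For a single coordinate of $H$, the L1 subproblem min_{h>=0} sum_i |x_i - w_i h| (with w_i > 0) rewrites as sum_i w_i |x_i/w_i - h|, whose minimizer is a weighted median of the ratios. A hedged sketch of this plain special case (function names are ours; the full wL1-NMF update additionally carries the zero-entry penalties from the abstract):

```python
import numpy as np

def weighted_median(values, weights):
    # Smallest value whose cumulative weight reaches half the total weight.
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    csum = np.cumsum(w)
    return v[np.searchsorted(csum, 0.5 * csum[-1])]

def l1_cd_step(x, w):
    # One coordinate update for min_{h>=0} sum_i |x_i - w_i * h|, w_i > 0:
    # rewrite as sum_i w_i * |x_i/w_i - h|, take the weighted median of the
    # ratios, then project onto the nonnegative orthant.
    h = weighted_median(x / w, w)
    return max(float(h), 0.0)
```

Because each update only touches the nonzero entries of the corresponding row or column, iterating such steps over sparse data gives the complexity scaling with the number of nonzeros claimed in the abstract.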
Scalable Robust Matrix Factorization with Nonconvex Loss
Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.
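The MM construction in this abstract can be seen on a toy scalar problem. Assuming a Cauchy loss rho(r) = log(1 + r^2) as a representative nonconvex loss (the paper's actual surrogate and dual APG solver are more elaborate; all names below are ours), concavity of log(1+u) in u = r^2 yields a quadratic tangent majorizer, and each MM step becomes a reweighted least-squares update:

```python
import numpy as np

def cauchy(r):
    # Nonconvex robust loss: rho(r) = log(1 + r^2).
    return np.log1p(r ** 2)

def surrogate(r, r0):
    # Quadratic majorizer of cauchy() at r0: log(1+u) is concave in u,
    # so its tangent at u0 = r0^2 upper-bounds it, with equality at r = r0.
    return np.log1p(r0 ** 2) + (r ** 2 - r0 ** 2) / (1.0 + r0 ** 2)

def mm_step(x0, b):
    # One MM step for min_x sum_i cauchy(x - b_i): minimizing the summed
    # surrogate is weighted least squares, i.e. a reweighted mean of b.
    w = 1.0 / (1.0 + (x0 - b) ** 2)
    return float(np.sum(w * b) / np.sum(w))
```

Large residuals receive weight roughly 1/r^2, so outliers are progressively ignored, and the MM sandwich loss(x1) <= surrogate(x1; x0) <= surrogate(x0; x0) = loss(x0) guarantees monotone descent.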