Sparse GCA and Thresholded Gradient Descent
Gao, Sheng, Ma, Zongming
With the advent of big data acquisition technology, it has become increasingly important to integrate information across multiple datasets collected on a common set of subjects. Canonical correlation analysis (CCA), first proposed by Hotelling [20], is a widely used statistical tool for integrating information from two datasets: it seeks linear combinations of variables within each dataset such that the correlation between the two combinations is maximized. However, recent advances in fields such as multi-omics and multimodal brain imaging present new challenges, since scientists can now often collect more than two datasets on the same set of subjects. To tackle these challenges, we turn to a useful generalization of CCA called generalized correlation analysis (GCA) [23], which aims to explore linear relationships across multiple data sources. Kettenring [23] proposed five techniques for generalized correlation analysis of multiple datasets, each corresponding to the maximization of a different objective function of covariances and correlations, subject to certain normalization constraints.
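To make the two-dataset case concrete, here is a minimal sketch of classical CCA computed via the singular value decomposition of the whitened cross-covariance matrix. The function name, the small ridge term `reg`, and the restriction to the leading canonical pair are our own illustrative choices, not part of the paper.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """First canonical pair: weights (a, b) maximizing corr(X a, Y b).

    X: (n, p) and Y: (n, q) arrays whose rows are the common subjects.
    A small ridge term keeps the covariance matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    # Whiten via Cholesky factors: K = Lx^{-1} Sxy Ly^{-T}.
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, np.linalg.solve(Ly, Sxy.T).T)
    U, s, Vt = np.linalg.svd(K)
    a = np.linalg.solve(Lx.T, U[:, 0])  # map back from whitened coordinates
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]  # s[0] is the leading canonical correlation
```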
Community detection in sparse latent space models
Gao, Fengnan, Ma, Zongming, Yuan, Hongsong
We show that a simple community detection algorithm originating from the stochastic blockmodel literature achieves consistency, and even optimality, for a broad and flexible class of sparse latent space models. The class includes latent eigenmodels (arXiv:0711.1146). The algorithm is based on spectral clustering followed by local refinement via normalized edge counting.
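As a rough illustration, here is a minimal sketch of the two steps, assuming a symmetric adjacency matrix and a known number of communities `k`; normalizing edge counts by community size is one plausible reading of "normalized edge counting", and the exact statistic in the paper may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def detect_communities(A, k, n_sweeps=10):
    """Spectral clustering on adjacency matrix A, then local refinement."""
    n = A.shape[0]
    # Step 1: cluster the rows of the k leading eigenvectors (by magnitude).
    vals, vecs = np.linalg.eigh(A)
    lead = np.argsort(np.abs(vals))[-k:]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, lead])
    # Step 2: refinement sweeps -- move each node to the community with the
    # largest edge count normalized by community size.
    for _ in range(n_sweeps):
        sizes = np.bincount(labels, minlength=k)
        for i in range(n):
            counts = np.bincount(labels[A[i] > 0], minlength=k)
            labels[i] = np.argmax(counts / np.maximum(sizes, 1))
    return labels
```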
Efficient random graph matching via degree profiles
Ding, Jian, Ma, Zongming, Wu, Yihong, Xu, Jiaming
Random graph matching refers to recovering the underlying vertex correspondence between two random graphs with correlated edges; a prominent example is when the two random graphs are Erdős-Rényi graphs $G(n,\frac{d}{n})$. This can be viewed as an average-case and noisy version of the graph isomorphism problem. Under this model, the maximum likelihood estimator is equivalent to solving the intractable quadratic assignment problem. This work develops an $\tilde{O}(n d^2+n^2)$-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree satisfies $d = \Omega(\log^2 n)$ and the two graphs differ in at most a $\delta = O(\log^{-2} n)$ fraction of edges. For dense graphs and sparse graphs, this can be improved to $\delta = O(\log^{-2/3} n)$ and $\delta = O(\log^{-2} d)$ respectively, both in polynomial time. The methodology is based on appropriately chosen distance statistics of the degree profiles (empirical distributions of the degrees of neighbors). Before this work, the best known results were $\delta = O(1)$ and $n^{o(1)} \leq d \leq n^{c}$ for some constant $c$ with an $n^{O(\log n)}$-time algorithm (Barak et al., 2018), and $\delta = \tilde{O}((d/n)^4)$ and $d = \tilde{\Omega}(n^{4/5})$ with a polynomial-time algorithm (Dai et al., 2018).
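The following sketch conveys the degree-profile idea: summarize each vertex by the empirical distribution of its neighbors' degrees, then match vertices across the two graphs by minimizing a distance between the summaries. The quantile-based L1 distance and the assignment step below are illustrative simplifications, not the paper's exact distance statistics or matching procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_degree_profiles(A, B, n_quantiles=20):
    """Match vertices of two graphs (equal-size adjacency matrices A, B)."""
    n = A.shape[0]
    qs = np.linspace(0, 1, n_quantiles)

    def profiles(M):
        deg = M.sum(axis=1)
        P = np.zeros((n, n_quantiles))
        for i in range(n):
            nbrs = np.flatnonzero(M[i])
            if nbrs.size:
                # Quantile summary of the neighbors' degrees: the degree profile.
                P[i] = np.quantile(deg[nbrs], qs)
        return P

    PA, PB = profiles(A), profiles(B)
    cost = np.abs(PA[:, None, :] - PB[None, :, :]).sum(axis=2)  # L1 distances
    rows, cols = linear_sum_assignment(cost)  # min-cost perfect matching
    return cols  # cols[i] is the vertex of B matched to vertex i of A
```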
Minimax Rates in Network Analysis: Graphon Estimation, Community Detection and Hypothesis Testing
Gao, Chao, Ma, Zongming
This paper surveys some recent developments in fundamental limits and optimal algorithms for network analysis. We focus on minimax optimal rates in three fundamental problems of network analysis: graphon estimation, community detection, and hypothesis testing. For each problem, we review state-of-the-art results in the literature followed by general principles behind the optimal procedures that lead to minimax estimation and testing. This allows us to connect problems in network analysis to other statistical inference problems from a general perspective.
Exploration of Large Networks with Covariates via Fast and Universal Latent Space Model Fitting
Ma, Zhuang, Ma, Zongming
Latent space models are effective tools for statistical modeling and exploration of network data. They capture real-world network characteristics such as degree heterogeneity, transitivity, and homophily. Due to their close connection to generalized linear models, it is also natural to incorporate covariate information into them. The current paper presents two universal fitting algorithms for networks with edge covariates: one based on nuclear norm penalization and the other based on projected gradient descent. Both algorithms are motivated by maximizing the likelihood of a special class of inner-product models, yet they work simultaneously for a wide range of latent space models, such as distance models, that allow latent vectors to affect edge formation in flexible ways. These fitting methods, especially the one based on projected gradient descent, are fast and scalable to large networks. We obtain their rates of convergence for both inner-product models and beyond. The effectiveness of the modeling approach and fitting algorithms is demonstrated on five real-world network datasets for different statistical tasks, including community detection with and without edge covariates, and network-assisted learning.
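As a sketch of the projected gradient descent fitting, consider the inner-product model without covariates, where $\mathrm{logit}\, P(A_{ij}=1) = \alpha_i + \alpha_j + z_i^\top z_j$. The step size, initialization, and centering projection below are simplified illustrative choices rather than the paper's exact algorithm, which also accommodates edge covariates.

```python
import numpy as np

def fit_inner_product_model(A, d, steps=200, lr=0.01, seed=0):
    """Projected gradient ascent on the logistic likelihood of the
    inner-product latent space model (degree parameters alpha, latents Z)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Z = 0.1 * rng.standard_normal((n, d))
    alpha = np.zeros(n)
    for _ in range(steps):
        Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T
        P = 1.0 / (1.0 + np.exp(-Theta))  # model edge probabilities
        G = A - P                          # d(log-lik)/d(Theta), entrywise
        np.fill_diagonal(G, 0.0)           # no self-loops
        alpha += lr * G.sum(axis=1)
        Z += lr * 2.0 * (G @ Z)            # Theta depends on Z through Z Z'
        Z -= Z.mean(axis=0)                # projection: keep latents centered
    return alpha, Z
```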
Community Detection in Degree-Corrected Block Models
Gao, Chao, Ma, Zongming, Zhang, Anderson Y., Zhou, Harrison H.
Community detection is a central problem in network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which often helps reveal interesting structure. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes, and average within- and between-community connectivities in an intuitive and interpretable way. In addition, we propose a polynomial-time algorithm to adaptively perform consistent and even asymptotically optimal community detection in DCBMs.
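For reference, the misclassification proportion loss standard in this literature compares an estimated assignment $\hat{\sigma}$ with the truth $\sigma$ up to a permutation of the $k$ community labels:
\[
\ell(\hat{\sigma}, \sigma) = \min_{\pi \in S_k} \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{\sigma}(i) \neq \pi(\sigma(i))\},
\]
where $S_k$ denotes the set of permutations of the $k$ labels.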
Optimal Estimation and Completion of Matrices with Biclustering Structures
Gao, Chao, Lu, Yu, Ma, Zongming, Zhou, Harrison H.
Biclustering structures in data matrices, in which one seeks to cluster cases and variables simultaneously, were first formalized in a seminal paper by John Hartigan (1972). Such structures are also prevalent in block modeling of networks. In this paper, we develop a unified theory for the estimation and completion of matrices with biclustering structures, where the data are a partially observed, noise-contaminated matrix with a certain biclustering structure. In particular, we show that a constrained least squares estimator achieves minimax rate-optimal performance in several of the most important scenarios. To this end, we derive unified high-probability upper bounds for all sub-Gaussian data and also provide matching minimax lower bounds in both the Gaussian and binary cases. Due to the close connection of graphons to stochastic block models, an immediate consequence of our general results is a minimax rate-optimal estimator for sparse graphons.
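To illustrate the structure being exploited: for fixed row and column cluster assignments, the least squares fit over matrices that are constant on biclusters is simply the mean of the observed entries within each block. The sketch below implements only this averaging-and-completion step; the full constrained least squares estimator also optimizes over the assignments themselves.

```python
import numpy as np

def block_mean_complete(Y, obs, row_labels, col_labels):
    """Complete a partially observed matrix under a fixed biclustering.

    Y: (n, m) data matrix; obs: boolean mask of observed entries;
    row_labels / col_labels: cluster assignments of rows and columns.
    """
    k, l = row_labels.max() + 1, col_labels.max() + 1
    est = np.zeros((k, l))
    for a in range(k):
        for b in range(l):
            block = obs & (row_labels[:, None] == a) & (col_labels[None, :] == b)
            if block.any():
                est[a, b] = Y[block].mean()  # least squares fit on this block
    return est[row_labels][:, col_labels]    # completed matrix, constant on biclusters
```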
Kernel Additive Principal Components
Tan, Xin Lu, Buja, Andreas, Ma, Zongming
Additive principal components (APCs for short) are a nonlinear generalization of linear principal components. We focus on smallest APCs to describe additive nonlinear constraints that are approximately satisfied by the data. Thus APCs fit data with implicit equations that treat the variables symmetrically, as opposed to regression analyses, which fit data with explicit equations that treat the data asymmetrically by singling out a response variable. We propose a regularized data-analytic procedure for APC estimation using kernel methods. In contrast to existing approaches to APCs that are based on regularization through subspace restriction, kernel methods achieve regularization through shrinkage and therefore grant distinctive flexibility in APC estimation by allowing the use of infinite-dimensional function spaces in the search for APC transformations while retaining computational feasibility. To connect population APCs and kernelized finite-sample APCs, we study kernelized population APCs and their associated eigenproblems, which eventually lead to the establishment of consistency of the estimated APCs. Lastly, we discuss an iterative algorithm for computing kernelized finite-sample APCs.
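In population terms, a smallest APC can be written as the following optimization problem (our notation, following the broader APC literature):
\[
\min_{\phi_1, \ldots, \phi_p} \mathrm{Var}\Big( \sum_{j=1}^{p} \phi_j(X_j) \Big)
\quad \text{subject to} \quad \sum_{j=1}^{p} \mathrm{Var}\big( \phi_j(X_j) \big) = 1,
\]
so a small optimal value indicates an approximate additive implicit equation $\sum_j \phi_j(X_j) \approx \text{const}$ satisfied by the data.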
Achieving Optimal Misclassification Proportion in Stochastic Block Model
Gao, Chao, Ma, Zongming, Zhang, Anderson Y., Zhou, Harrison H.
Community detection is a fundamental statistical problem in network data analysis. Many algorithms have been proposed to tackle this problem. Most of them are not guaranteed to achieve the statistical optimality of the problem, while procedures that achieve information-theoretic limits for general parameter spaces are not computationally tractable. In this paper, we present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for the stochastic block model under weak regularity conditions. Our two-stage procedure consists of a generic refinement step that can take any of a wide range of weakly consistent community detection procedures as its initializer and outputs a community assignment achieving the optimal misclassification proportion with high probability. The practical effectiveness of the new algorithm is demonstrated by competitive numerical results.
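To give a flavor of what a refinement sweep can look like, here is a sketch for a homogeneous stochastic block model with within- and between-community connection probabilities `p_hat > q_hat` plugged in from an initial stage. The paper's refinement step is more careful (for instance, about how the initializer and plug-in estimates are constructed), so this is a simplified variant rather than the proposed algorithm.

```python
import numpy as np

def refine_once(A, labels, k, p_hat, q_hat):
    """One refinement sweep: reassign each node to the community that
    maximizes its local SBM log-likelihood given the other labels."""
    a = np.log(p_hat / q_hat)              # gain per edge into a community
    b = np.log((1 - p_hat) / (1 - q_hat))  # loss per non-edge into it
    sizes = np.bincount(labels, minlength=k)
    new_labels = labels.copy()
    for i in range(A.shape[0]):
        d = np.bincount(labels[A[i] > 0], minlength=k)  # edges from i to each community
        new_labels[i] = np.argmax(a * d + b * (sizes - d))
    return new_labels
```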
Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow
Cai, T. Tony, Li, Xiaodong, Ma, Zongming
This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x)^2 + \epsilon_j$, $j = 1, \ldots, m$, with independent sub-exponential noise $\epsilon_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator that achieves the optimal rates. Inspired by the Wirtinger Flow algorithm [12] for noiseless, non-sparse phase retrieval, a novel thresholded gradient descent algorithm is developed and shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$.
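Here is a simplified variant of thresholded gradient descent for this problem: gradient steps on the empirical quartic loss $\frac{1}{m}\sum_j ((a_j' x)^2 - y_j)^2$, each followed by hard thresholding to the `tau` largest coordinates. The paper's algorithm differs in important details (notably the initialization and the form and level of the thresholding), so treat this as an illustrative sketch.

```python
import numpy as np

def thresholded_gd(a, y, x0, tau, steps=100, lr=1e-3):
    """Gradient descent with hard thresholding for noisy sparse phase retrieval.

    a: (m, p) measurement vectors; y: (m,) noisy squared measurements;
    x0: initial iterate; tau: number of coordinates kept after each step.
    """
    x = x0.copy()
    m = y.shape[0]
    for _ in range(steps):
        z = a @ x
        r = z ** 2 - y                             # residuals of squared measurements
        x = x - lr * (4.0 / m) * (a.T @ (r * z))   # gradient of the quartic loss
        # Hard threshold: zero out all but the tau largest entries in magnitude.
        if tau < x.size:
            small = np.argsort(np.abs(x))[:-tau]
            x[small] = 0.0
    return x
```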