tensor block model
Multiway clustering via tensor block models
We consider the problem of identifying multiway block structure from a large noisy tensor. Such problems arise frequently in applications such as genomics, recommendation systems, topic modeling, and sensor network localization. We propose a tensor block model, develop a unified least-squares estimation procedure, and obtain theoretical accuracy guarantees for multiway clustering. The statistical convergence of the estimator is established, and we show that the associated clustering procedure achieves partition consistency. A sparse regularization is further developed for identifying important blocks with elevated means. The proposal handles a broad range of data types, including binary, continuous, and hybrid observations. Through simulations and applications to two real datasets, we demonstrate that our approach outperforms previous methods.
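To make the block structure concrete, here is a minimal sketch of an order-3 tensor block model and the least-squares estimate of the block means when the cluster labels are held fixed. The function names (`block_mean_tensor`, `estimate_core`) and the toy sizes are illustrative assumptions, not taken from the paper; the full procedure would also have to search over the labels, which this sketch does not attempt.

```python
import numpy as np

def block_mean_tensor(core, labels):
    """Expand a core tensor of block means into the full signal tensor.

    Entry (i, j, k) of the result equals core[z1[i], z2[j], z3[k]].
    """
    z1, z2, z3 = labels
    return core[np.ix_(z1, z2, z3)]

def estimate_core(Y, labels, sizes):
    """Least-squares block means for FIXED labels: the average of Y in each block."""
    z1, z2, z3 = labels
    r1, r2, r3 = sizes
    # One-hot membership matrices, one per mode (hypothetical helper layout).
    M1, M2, M3 = np.eye(r1)[z1], np.eye(r2)[z2], np.eye(r3)[z3]
    sums = np.einsum('ijk,ia,jb,kc->abc', Y, M1, M2, M3)
    counts = np.einsum('a,b,c->abc', M1.sum(0), M2.sum(0), M3.sum(0))
    return sums / np.maximum(counts, 1)  # guard against empty blocks

# Toy data: a 2 x 3 x 2 core observed with additive Gaussian noise.
rng = np.random.default_rng(0)
core = rng.normal(size=(2, 3, 2))
labels = (rng.integers(0, 2, 30), rng.integers(0, 3, 40), rng.integers(0, 2, 20))
Y = block_mean_tensor(core, labels) + 0.1 * rng.normal(size=(30, 40, 20))
core_hat = estimate_core(Y, labels, (2, 3, 2))
```

With the labels known, minimizing the squared error decouples across blocks, which is why the estimate reduces to per-block averages.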
Heteroskedastic Tensor Clustering
Tensor clustering, which seeks to extract underlying cluster structures from noisy tensor observations, has gained increasing attention. One extensively studied model for tensor clustering is the tensor block model, which postulates the existence of clustering structures along each mode and has found broad applications in areas like multi-tissue gene expression analysis and multilayer network analysis. However, currently available computationally feasible methods for tensor clustering either are limited to handling i.i.d. sub-Gaussian noise or suffer from suboptimal statistical performance, which restrains their utility in applications that have to deal with heteroskedastic data and/or low signal-to-noise-ratio (SNR). To overcome these challenges, we propose a two-stage method, named $\mathsf{High\text{-}order~HeteroClustering}$ ($\mathsf{HHC}$), which starts by performing tensor subspace estimation via a novel spectral algorithm called $\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$, followed by approximate $k$-means to obtain cluster nodes. Encouragingly, our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit (ignoring logarithmic factors); here, the SNR refers to the ratio of the pairwise disparity between nodes to the noise level, and the computational limit indicates the lowest SNR that enables exact clustering with polynomial runtime. Comprehensive simulation and real-data experiments suggest that our algorithm outperforms existing algorithms across various settings, delivering more reliable clustering performance.
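The general shape of a two-stage "subspace estimation, then k-means" pipeline can be sketched as follows. This is not the HHC algorithm: the spectral step here is a plain SVD of the mode unfolding, swapped in for Thresholded Deflated-HeteroPCA, and the k-means step is a basic Lloyd iteration with farthest-point initialization. All function names are hypothetical.

```python
import numpy as np

def unfold(Y, mode):
    """Matricize a tensor so that the rows index the chosen mode."""
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def kmeans(X, k, n_iter=50):
    """Basic Lloyd k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def spectral_cluster_mode(Y, mode, k):
    """Stage 1: top-k left singular subspace of the unfolding. Stage 2: k-means."""
    U, s, _ = np.linalg.svd(unfold(Y, mode), full_matrices=False)
    embedding = U[:, :k] * s[:k]  # low-dimensional row embedding
    return kmeans(embedding, k)

# Toy example with homoskedastic noise; heteroskedastic noise is exactly
# the setting where a plain SVD may degrade and a HeteroPCA-type step is motivated.
z = np.repeat(np.arange(3), 4)
core = np.arange(27, dtype=float).reshape(3, 3, 3)
rng = np.random.default_rng(0)
Y = core[np.ix_(z, z, z)] + 0.5 * rng.normal(size=(12, 12, 12))
labels_mode0 = spectral_cluster_mode(Y, mode=0, k=3)
```

The recovered labels are only defined up to a relabeling of the clusters, so comparisons with a ground truth should be made on the induced partition rather than on the raw label values.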
Multiway Spherical Clustering via Degree-Corrected Tensor Block Models
We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such data problems arise commonly in applications such as recommendation systems, neuroimaging, community detection, and hypergraph partitions in social networks. The allowance of degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degree-corrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signal-to-noise regimes corresponding to different statistical-computational behaviors. In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomial-time algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through two data applications, one on human brain connectome project data and another on a Peru Legislation network dataset.
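Why angles rather than distances matter under degree heterogeneity can be seen in a toy example. The sketch below assumes, purely for illustration, that each row is a cluster centroid scaled by a positive node-specific degree `theta`; this multiplicative form and the function names are assumptions, not details given in the abstract, and the code is not the paper's algorithm.

```python
import numpy as np

def normalize_rows(X, eps=1e-12):
    """Project each row onto the unit sphere, removing its scale."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def cosine_assign(X, centroids):
    """Assign each row to the centroid with the largest cosine similarity."""
    sims = normalize_rows(X) @ normalize_rows(centroids).T
    return sims.argmax(axis=1)

def euclidean_assign(X, centroids):
    """Assign each row to the nearest centroid in Euclidean distance."""
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

centroids = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
z = np.array([0, 1, 2, 2, 0, 1])
theta = np.array([1.0, 3.0, 0.1, 2.0, 0.5, 0.2])  # hypothetical degrees
X = theta[:, None] * centroids[z]

angle_labels = cosine_assign(X, centroids)
distance_labels = euclidean_assign(X, centroids)
```

In this noiseless example the angle-based rule is unaffected by `theta`, whereas the distance-based rule can misplace low-degree rows, such as the third row here, whose small scale pulls it closer to a different centroid.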
Smooth tensor estimation with unknown permutations
Higher-order tensor datasets are increasingly common in modern data science applications, for instance, recommendation systems (Baltrunas et al., 2011; Bi et al., 2018), social networks (Bickel and Chen, 2009), genomics (Hore et al., 2016), and neuroimaging (Zhou et al., 2013). Tensors provide an effective representation of data structure that classical vector- and matrix-based methods fail to capture. One example is a music recommendation system (Baltrunas et al., 2011) that records ratings of songs from users in various contexts. This three-way tensor of user × song × context allows us to investigate interactions of users and songs in a context-specific manner. Another example is a network dataset that records the connections among a set of nodes. Pairwise interactions are often insufficient to capture the complex relationships, whereas multi-way interactions improve the understanding of networks in molecular systems (Young et al., 2018) and social networks (Han et al., 2020). In both examples, higher-order tensors represent multi-way interactions in an efficient way. The tensor estimation problem cannot be solved without imposing structure. An appropriate reordering of tensor entries often provides an effective representation of the hidden salient structure.
Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit
Han, Rungang, Luo, Yuetian, Wang, Miaoyan, Zhang, Anru R.
High-order clustering aims to identify heterogeneous substructure in multiway datasets that arise commonly in neuroimaging, genomics, and social network studies. The non-convex and discontinuous nature of the problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods, the \emph{high-order Lloyd algorithm} (HLloyd) and \emph{high-order spectral clustering} (HSC), for high-order clustering in the tensor block model. The convergence of the proposed procedure is established, and we show that our method achieves exact clustering under reasonable assumptions. We also give a complete characterization of the statistical-computational trade-off in high-order clustering based on three different signal-to-noise ratio regimes. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.
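The abstract does not spell out the HLloyd iteration, so the following is only a guess at the general shape of a Lloyd-style refinement for one mode of an order-3 tensor: average the data over the current blocks of the other two modes, then reassign each index to the nearest centroid. The function `refine_mode0` and its arguments are hypothetical, and the paper's actual update may differ.

```python
import numpy as np

def refine_mode0(Y, z1, z2, z3, sizes):
    """One Lloyd-style label update for mode 0, holding z2 and z3 fixed."""
    r1, r2, r3 = sizes
    M2, M3 = np.eye(r2)[z2], np.eye(r3)[z3]  # one-hot memberships
    counts = np.maximum(np.outer(M2.sum(0), M3.sum(0)), 1)
    # Average each mode-0 slice over the (mode-1, mode-2) blocks.
    feats = np.einsum('ijk,jb,kc->ibc', Y, M2, M3) / counts
    feats = feats.reshape(Y.shape[0], -1)
    # Centroids from the current mode-0 labels; empty clusters are never chosen.
    cents = np.full((r1, feats.shape[1]), np.inf)
    for a in range(r1):
        if np.any(z1 == a):
            cents[a] = feats[z1 == a].mean(axis=0)
    dist = ((feats[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

# Toy check: start from labels with one mistake and apply a single update.
z = np.repeat(np.arange(3), 4)
core = np.arange(27, dtype=float).reshape(3, 3, 3)
rng = np.random.default_rng(0)
Y = core[np.ix_(z, z, z)] + 0.5 * rng.normal(size=(12, 12, 12))
z_init = z.copy()
z_init[0] = 1  # deliberately mislabel the first index
z_refined = refine_mode0(Y, z_init, z, z, (3, 3, 3))
```

In practice such an update would be cycled over all modes and started from a reasonable initialization, for example a spectral one; averaging over blocks first reduces the noise in each feature before the nearest-centroid step.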