AITopics

2502.05728

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots (0.53)

arXiv.org Machine LearningNov-5-2024

Community detection with the Bethe-Hessian

Stephan, Ludovic, Zhu, Yizhe

The Bethe-Hessian matrix, introduced by Saade, Krzakala, and Zdeborov\'a (2014), is a Hermitian matrix designed for applying spectral clustering algorithms to sparse networks. Rather than employing a non-symmetric and high-dimensional non-backtracking operator, a spectral method based on the Bethe-Hessian matrix is conjectured to also reach the Kesten-Stigum detection threshold in the sparse stochastic block model (SBM). We provide the first rigorous analysis of the Bethe-Hessian spectral method in the SBM under both the bounded expected degree and the growing degree regimes. Specifically, we demonstrate that: (i) When the expected degree $d\geq 2$, the number of negative outliers of the Bethe-Hessian matrix can consistently estimate the number of blocks above the Kesten-Stigum threshold, thus confirming a conjecture from Saade, Krzakala, and Zdeborov\'a (2014) for $d\geq 2$. (ii) For sufficiently large $d$, its eigenvectors can be used to achieve weak recovery. (iii) As $d\to\infty$, we establish the concentration of the locations of its negative outlier eigenvalues, and weak consistency can be achieved via a spectral method based on the Bethe-Hessian matrix.

artificial intelligence, eigenvalue, machine learning, (16 more...)

2411.02835

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

arXiv.org Artificial IntelligenceFeb-12-2024

Online Differentially Private Synthetic Data Generation

He, Yiyun, Vershynin, Roman, Zhu, Yizhe

We present a polynomial-time algorithm for online differentially private synthetic data generation. For a data stream within the hypercube $[0,1]^d$ and an infinite time horizon, we develop an online algorithm that generates a differentially private synthetic dataset at each time $t$. This algorithm achieves a near-optimal accuracy bound of $O(t^{-1/d}\log(t))$ for $d\geq 2$ and $O(t^{-1}\log^{4.5}(t))$ for $d=1$ in the 1-Wasserstein distance. This result generalizes the previous work on the continual release model for counting queries to include Lipschitz queries. Compared to the offline case, where the entire dataset is available at once, our approach requires only an extra polylog factor in the accuracy bound.

algorithm, artificial intelligence, machine learning, (18 more...)

2402.08012

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

arXiv.org Artificial IntelligenceAug-13-2023

Overparameterized random feature regression with nearly orthogonal data

Wang, Zhichao, Zhu, Yizhe

We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behaviors of the RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree only depends on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of the RFRR and the KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.

artificial intelligence, generalization error, machine learning, (16 more...)

2211.06077

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceJun-9-2023

Spectral gap-based deterministic tensor completion

Harris, Kameron Decker, López, Oscar, Read, Angus, Zhu, Yizhe

Tensor completion is a core machine learning algorithm used in recommender systems and other domains with missing data. While the matrix case is well-understood, theoretical results for tensor problems are limited, particularly when the sampling patterns are deterministic. Here we bound the generalization error of the solutions of two tensor completion methods, Poisson loss and atomic norm minimization, providing tighter bounds in terms of the target tensor rank. If the ground-truth tensor is order $t$ with CP-rank $r$, the dependence on $r$ is improved from $r^{2(t-1)(t^2-t-1)}$ in arXiv:1910.10692 to $r^{2(t-1)(3t-5)}$. The error in our bounds is deterministically controlled by the spectral gap of the sampling sparsity pattern. We also prove several new properties for the atomic tensor norm, reducing the rank dependence from $r^{3t-3}$ in arXiv:1711.04965 to $r^{3t-5}$ under random sampling schemes. A limitation is that atomic norm minimization, while theoretically interesting, leads to inefficient algorithms. However, numerical experiments illustrate the dependence of the reconstruction error on the spectral gap for the practical max-quasinorm, ridge penalty, and Poisson loss minimization algorithms. This view through the spectral gap is a promising window for further study of tensor algorithms.

artificial intelligence, machine learning, tensor, (19 more...)

2306.06262

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial IntelligenceMay-25-2023

Differentially private low-dimensional representation of high-dimensional data

He, Yiyun, Strohmer, Thomas, Vershynin, Roman, Zhu, Yizhe

Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm to generate low-dimensional synthetic data efficiently from a high-dimensional dataset with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Different from the standard perturbation analysis using the Davis-Kahan theorem, our analysis of private PCA works without assuming the spectral gap for the sample covariance matrix.

artificial intelligence, data mining, machine learning, (16 more...)

2305.17148

Country: North America > United States (0.68)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceMar-23-2023

Shifted Diffusion for Text-to-image Generation

Zhou, Yufan, Liu, Bingchen, Zhu, Yizhe, Yang, Xiao, Chen, Changyou, Xu, Jinhui

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better in generating image embedding from the text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments are conducted and evaluated in terms of both quantitative measures and human evaluation, indicating a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only part or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images being captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.

artificial intelligence, diffusion model, machine learning, (16 more...)

2211.15388

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

arXiv.org Machine LearningNov-29-2022

Sparse random hypergraphs: Non-backtracking spectra and community detection

Stephan, Ludovic, Zhu, Yizhe

The stochastic block model (SBM), first introduced in [56], is a generative model for random graphs with a community structure. It serves as a useful benchmark for clustering algorithms on graph data. When the random graph generated by an SBM is sparse with bounded expected degrees, a phase transition has been observed around the so-called Kesten-Stigum threshold: in particular, above this threshold, a wealth of algorithms are known to achieve partial reconstruction [73, 6, 30, 57, 39]. Most relevant to this line of work are spectral algorithms that use the eigenvectors of a matrix associated with the graph G to perform the reconstruction. In the sparse case, examples include the self-avoiding [69], non-backtracking [38, 62, 23], graph powering [3] or distance [78] matrices. We refer interested readers to the survey [1] for more references, including a more in-depth discussion of the Kesten-Stigum threshold. As a generalization of graphs, hypergraphs are well-studied objects in combinatorics and theoretical computer science.

artificial intelligence, hypergraph, machine learning, (17 more...)

2203.07346

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

arXiv.org Machine LearningDec-22-2021

Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model

Dumitriu, Ioana, Wang, Haixiao, Zhu, Yizhe

We consider the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), a general model of random networks with community structure and higher-order interactions. When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition with at least a $\gamma$ fraction of the vertices classified correctly, where $\gamma\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, which improves the previous results in Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs. Our spectral algorithm consists of three major steps: (1) Hyperedge selection: select hyperedges of certain sizes to provide the maximal signal-to-noise ratio for the induced sub-hypergraph; (2) Spectral partition: construct a regularized adjacency matrix and obtain an approximate partition based on singular vectors; (3) Correction and merging: incorporate the hyperedge information from adjacency tensors to upgrade the error rate guarantee. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs, which can be of independent interest.

artificial intelligence, machine learning, probability, (17 more...)

2112.11671

Country: North America > United States > California > San Diego County (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.88)

arXiv.org Machine LearningSep-20-2021

Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks

Wang, Zhichao, Zhu, Yizhe

Such limiting law is first proved for general centered sample covariance matrices with correlation and then specified for our neural network model. We also prove non-asymptotic concentrations of empirical CK and NTK around their limiting kernel in the spectral norm, and lower bounds on their smallest eigenvalues. As an application, we verify the random feature regression achieves the same asymptotic performance as its limiting kernel regression in ultra-width limit. The limiting training and test errors for random feature regression are calculated by corresponding kernel regression. We also provide a nonlinear Hanson-Wright inequality suitable for neural networks with random weights and Lipschitz activation functions.

artificial intelligence, matrix, neural network, (15 more...)

2109.09304

Country:

North America > United States > California > San Diego County (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)