Random geometric graphs

Graph Attention Network for Node Regression on Random Geometric Graphs with Erdős--Rényi contamination

Laha, Somak, Liu, Suqi, Austern, Morgane

arXiv.org Machine Learning

Graph attention networks (GATs) are widely used and often appear robust to noise in node covariates and edges, yet rigorous statistical guarantees demonstrating a provable advantage of GATs over non-attention graph neural networks~(GNNs) are scarce. We partially address this gap for node regression with graph-based errors-in-variables models under simultaneous covariate and edge corruption: responses are generated from latent node-level covariates, but only noise-perturbed versions of the latent covariates are observed; and the sample graph is a random geometric graph created from the node covariates but contaminated by independent Erdős--Rényi edges. We propose and analyze a carefully designed, task-specific GAT that constructs denoised proxy features for regression. We prove that regressing the response variables on the proxies achieves lower error asymptotically in (a) estimating the regression coefficient compared to the ordinary least squares (OLS) estimator on the noisy node covariates, and (b) predicting the response for an unlabelled node compared to a vanilla graph convolutional network~(GCN) -- under mild growth conditions. Our analysis leverages high-dimensional geometric tail bounds and concentration for neighbourhood counts and sample covariances. We verify our theoretical findings through experiments on synthetically generated data. We also perform experiments on real-world graphs and demonstrate the effectiveness of the attention mechanism in several node regression tasks.
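The sampling model described here is easy to simulate. Below is a minimal sketch of the data-generating process: latent covariates, a random geometric graph contaminated by independent Erdős--Rényi edges, and noisy observed covariates. All parameter values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
r, q, sigma = 0.2, 0.01, 0.1   # radius, ER contamination rate, covariate noise

X = rng.uniform(0, 1, size=(n, d))                  # latent node covariates
dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
A_geo = (dist < r).astype(int)                      # random geometric graph
np.fill_diagonal(A_geo, 0)

E = np.triu(rng.random((n, n)) < q, 1)              # independent ER edges
A = np.maximum(A_geo, (E | E.T).astype(int))        # contaminated sample graph

Z = X + sigma * rng.standard_normal((n, d))         # observed noisy covariates
beta = np.array([1.0, -2.0])
y = X @ beta + 0.05 * rng.standard_normal(n)        # responses from latent X
```

The regression task then compares OLS on the noisy covariates Z against estimators that exploit the contaminated graph A to build denoised proxy features.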


Latent distance estimation for random geometric graphs

Neural Information Processing Systems

Random geometric graphs are a popular choice for a latent points generative model for networks. Their definition is based on a sample of $n$ points $X_1,X_2,\cdots,X_n$ on the Euclidean sphere~$\mathbb{S}^{d-1}$ which represents the latent positions of nodes of the network. The connection probabilities between the nodes are determined by an unknown function (referred to as the ``link'' function) evaluated at the distance between the latent points. We introduce a spectral estimator of the pairwise distance between latent points and we prove that its rate of convergence is the same as the nonparametric estimation of a function on $\mathbb{S}^{d-1}$, up to a logarithmic factor. In addition, we provide an efficient spectral algorithm to compute this estimator without any knowledge on the nonparametric link function. As a byproduct, our method can also consistently estimate the dimension $d$ of the latent space.
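A toy version of the spectral idea can be sketched with a known low-rank link function (the paper's estimator needs no such knowledge; this only makes the eigenspace-projection step concrete, and the link P = (1 + ⟨x_i, x_j⟩)/2 is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 3

# latent positions on the sphere S^{d-1}
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# an illustrative link function: P_ij = (1 + <X_i, X_j>) / 2
G = X @ X.T
P = (1 + G) / 2
U = np.triu(rng.random((n, n)) < P, 1)
A = (U | U.T).astype(float)

# spectral step: project A onto its top (d + 1)-dimensional eigenspace
vals, vecs = np.linalg.eigh(A)
top = np.argsort(vals)[-(d + 1):]
P_hat = vecs[:, top] @ np.diag(vals[top]) @ vecs[:, top].T

# invert the link to recover Gram entries, then pairwise distances
G_hat = np.clip(2 * P_hat - 1, -1, 1)
D_hat = np.sqrt(np.maximum(2 - 2 * G_hat, 0))
D_true = np.sqrt(np.maximum(2 - 2 * np.clip(G, -1, 1), 0))
```

The low-rank projection suppresses the Bernoulli sampling noise, so the recovered distances D_hat track the true latent distances.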


Manifold Percolation: from generative model to Reinforce learning

Tong, Rui

arXiv.org Machine Learning

Generative modeling is typically framed as learning mapping rules, but from an observer's perspective without access to these rules, the task becomes disentangling the geometric support from the probability distribution. We propose that continuum percolation is uniquely suited to this support analysis, as the sampling process effectively projects high-dimensional density estimation onto a geometric counting problem on the support. In this work, we establish a rigorous correspondence between the topological phase transitions of random geometric graphs and the underlying data manifold in high-dimensional space. By analyzing the relationship between our proposed Percolation Shift metric and FID, we show that this metric captures structural pathologies, such as implicit mode collapse, where standard statistical metrics fail. Finally, we translate this topological phenomenon into a differentiable loss function that guides training. Experimental results confirm that this approach not only prevents manifold shrinkage but also fosters a form of synergistic improvement, where topological stability becomes a prerequisite for sustained high fidelity in both static generation and sequential decision making.
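The percolation phase transition this abstract builds on is easy to observe numerically. A sketch on uniform points in the unit square, with radii chosen for illustration on either side of the threshold:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(2)
n = 500
X = rng.uniform(0, 1, size=(n, 2))              # points in the unit square
dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

def giant_fraction(r):
    """Fraction of points in the largest cluster of the RGG at radius r."""
    A = csr_matrix((dist < r) & (dist > 0))
    _, labels = connected_components(A, directed=False)
    return np.bincount(labels).max() / n

# subcritical radius: clusters stay tiny; supercritical: a giant component emerges
lo, hi = giant_fraction(0.02), giant_fraction(0.2)
```

Sweeping the radius and tracking the giant-component fraction traces out the topological phase transition that a metric like the proposed Percolation Shift would monitor.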


Discrete scalar curvature as a weighted sum of Ollivier-Ricci curvatures

Hickok, Abigail, Blumberg, Andrew J.

arXiv.org Machine Learning

We study the relationship between discrete analogues of Ricci and scalar curvature that are defined for point clouds and graphs. In the discrete setting, Ricci curvature is replaced by Ollivier-Ricci curvature. Scalar curvature can be computed as the trace of Ricci curvature for a Riemannian manifold; this motivates a new definition of a scalar version of Ollivier-Ricci curvature. We show that our definition converges to scalar curvature for nearest neighbor graphs obtained by sampling from a manifold. We also prove some new results about the convergence of Ollivier-Ricci curvature to Ricci curvature.
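For intuition, the Ollivier-Ricci curvature of an edge can be computed directly from its definition, κ(x, y) = 1 − W1(μ_x, μ_y)/d(x, y), with μ_v here taken uniform on the neighbours of v (one common convention, not necessarily the paper's), and W1 solved as a small transport LP:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path

def ollivier_ricci(A, x, y):
    """kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y),
    where mu_v is uniform on the neighbours of v."""
    n = A.shape[0]
    D = shortest_path(A, unweighted=True)      # graph distances
    mu = A[x] / A[x].sum()
    nu = A[y] / A[y].sum()
    # Wasserstein-1 as a transport LP: min <T, D> s.t. rows(T) = mu, cols(T) = nu
    c = D.flatten()
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1         # mass leaving node i
        A_eq[n + i, i::n] = 1                  # mass arriving at node i
    b_eq = np.concatenate([mu, nu])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return 1 - res.fun / D[x, y]

# triangle: every edge is positively curved (kappa = 1/2)
A_tri = np.ones((3, 3)) - np.eye(3)
kappa = ollivier_ricci(A_tri, 0, 1)
```

A scalar version in the spirit of the paper would then average these edge curvatures around each vertex with suitable weights.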




Identifying critical residues of a protein using meaningfully-thresholded Random Geometric Graphs

Zhang, Chuqiao, Dantu, Sarath Chandra, Mitra, Debarghya, Chakrabarty, Dalia

arXiv.org Machine Learning

Identification of critical residues of a protein is actively pursued, since such residues are essential for protein function. We present three ways of recognising critical residues of an example protein, the evolution of which is tracked via molecular dynamical simulations. Our methods are based on learning a Random Geometric Graph (RGG) variable, where the state variable of each of 156 residues is attached to a node of this graph, with the RGG learnt using the matrix of correlations between state variables of each residue-pair. Given the categorical nature of the state variable, correlation between a residue pair is computed using Cramér's V. We advance an organic thresholding to learn an RGG, and compare results against extant thresholding techniques, when parametrising criticality as the nodal degree in the learnt RGG. Secondly, we develop a criticality measure by ranking the computed differences between the posterior probability of the full graph variable defined on all 156 residues, and that of the graph with all but one residue omitted. A third parametrisation of criticality informs on the dynamical variation of nodal degrees as the protein evolves during the simulation. Finally, we compare results obtained with the three distinct criticality parameters against experimentally-ascertained critical residues.
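The first parametrisation (threshold the association matrix, read off nodal degrees) can be sketched as follows, with plain correlation standing in for Cramér's V, a fixed threshold standing in for the paper's learnt one, and synthetic trajectories standing in for simulation output:

```python
import numpy as np

rng = np.random.default_rng(3)
n_res, n_frames = 20, 500   # illustrative sizes (the paper uses 156 residues)

# hypothetical categorical state trajectories, one row per residue
states = rng.integers(0, 3, size=(n_res, n_frames)).astype(float)
states[1] = states[0]       # make residues 0 and 1 perfectly coupled

C = np.abs(np.corrcoef(states))   # plain correlation, standing in for Cramer's V
tau = 0.3                          # fixed threshold (the paper learns one)
A = (C > tau).astype(int)
np.fill_diagonal(A, 0)

degree = A.sum(axis=0)                  # nodal degree as a criticality proxy
critical = np.argsort(degree)[::-1]     # residues ranked by degree
```

Residues whose state variables co-vary strongly with many others acquire high degree in the thresholded RGG and are flagged as critical.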


Graph Max Shift: A Hill-Climbing Method for Graph Clustering

Arias-Castro, Ery, Coda, Elizabeth, Qiao, Wanli

arXiv.org Machine Learning

A hill-climbing algorithm is typically understood as an algorithm that makes 'local' moves. In a sense, this class of procedures is the discrete analog of the class of gradient-based and higher-order methods in continuous optimization. Such algorithms have been proposed in the context of graph partitioning, sometimes as a refinement step, where the objective function is typically a notion of cut and local moves often take the form of swapping vertices in order to improve the value of the objective function. More specifically, consider an undirected graph consisting of n nodes, which we take to be [n] := {1, ..., n} without loss of generality, and adjacency matrix A = (a_{ij}).
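The flavour of such a procedure can be sketched as degree hill-climbing (an illustrative variant, not the paper's exact algorithm): each node repeatedly shifts to its highest-degree neighbour, and the basins of attraction of the resulting local maxima define the clusters.

```python
import numpy as np

def graph_max_shift(A, max_iter=50):
    """Degree hill-climbing: every node repeatedly shifts to the adjacent
    node with the largest degree (staying put on ties); nodes that settle
    on the same local maximum share a cluster."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    pos = np.arange(n)
    for _ in range(max_iter):
        new_pos = pos.copy()
        for i in range(n):
            # candidates: current position first (so ties mean "stay"), then its neighbours
            cand = np.concatenate(([pos[i]], np.flatnonzero(A[pos[i]])))
            new_pos[i] = int(cand[np.argmax(deg[cand])])
        if (new_pos == pos).all():
            break
        pos = new_pos
    _, labels = np.unique(pos, return_inverse=True)
    return labels
```

Because each move strictly increases the degree at the current position, the walk terminates at a local degree maximum, playing the role the density mode plays in continuous mean-shift clustering.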

