
Collaborating Authors: Huleihel, Wasim


Detecting Arbitrary Planted Subgraphs in Random Graphs

arXiv.org Artificial Intelligence

The problems of detecting and recovering planted structures/subgraphs in Erd\H{o}s-R\'{e}nyi random graphs have received significant attention over the past three decades, leading to many exciting results and mathematical techniques. However, prior work has largely focused on specific ad hoc planted structures and inferential settings, while a general theory has remained elusive. In this paper, we bridge this gap by investigating the detection of an \emph{arbitrary} planted subgraph $\Gamma = \Gamma_n$ in an Erd\H{o}s-R\'{e}nyi random graph $\mathcal{G}(n, q_n)$, where the edge probability within $\Gamma$ is $p_n$. We examine both the statistical and computational aspects of this problem and establish the following results. In the dense regime, where the edge probabilities $p_n$ and $q_n$ are fixed, we tightly characterize the information-theoretic and computational thresholds for detecting $\Gamma$, and provide conditions under which a computational-statistical gap arises. Most notably, these thresholds depend on $\Gamma$ only through its number of edges, maximum degree, and maximum subgraph density. Our lower and upper bounds are general and apply to any value of $p_n$ and $q_n$ as functions of $n$. Accordingly, we also analyze the sparse regime, where $q_n = \Theta(n^{-\alpha})$ and $p_n - q_n = \Theta(q_n)$ with $\alpha\in[0,2]$, as well as the critical regime, where $p_n = 1-o(1)$ and $q_n = \Theta(n^{-\alpha})$, both of which have been widely studied for specific choices of $\Gamma$. For these regimes, we show that our bounds are tight for all planted subgraphs investigated in the literature thus far, and many more. Finally, we identify conditions under which detection undergoes a sharp phase transition, where the boundaries at which algorithms succeed or fail shift abruptly as a function of $q_n$.
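
The simplest member of the family of tests analyzed in settings like this is a global edge-count test. The sketch below is a minimal illustration of that baseline, not the paper's optimal statistic; all function names are hypothetical. It plants an arbitrary edge set $\Gamma$ at density $p$ inside $\mathcal{G}(n, q)$ and thresholds the standardized total edge count.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n, q, planted_edges=(), p=0.0):
    """Upper-triangular adjacency of G(n, q); the edges of the planted
    subgraph Gamma are resampled with probability p instead of q."""
    A = np.triu(rng.random((n, n)) < q, k=1)
    for (i, j) in planted_edges:
        A[min(i, j), max(i, j)] = rng.random() < p
    return A

def edge_count_test(A, n, q, tau):
    """Reject the null (no planted subgraph) when the total edge count
    exceeds its null mean by tau null standard deviations."""
    m = n * (n - 1) / 2
    z = (A.sum() - m * q) / np.sqrt(m * q * (1 - q))
    return z > tau
```

For instance, planting a $k$-clique means passing all pairs among $k$ chosen vertices as `planted_edges`; heuristically, this count test succeeds once the edge surplus $|E(\Gamma)|(p-q)$ dominates the null fluctuation of order $n\sqrt{q(1-q)}$.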


AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

arXiv.org Artificial Intelligence

Training and fine-tuning large language models (LLMs) come with challenges related to memory and computational requirements due to the increasing size of the model weights and the optimizer states. Various techniques have been developed to tackle these challenges, such as low-rank adaptation (LoRA), which involves introducing a parallel trainable low-rank matrix to the fixed pre-trained weights at each layer. However, these methods often fall short compared to the full-rank weight training approach, as they restrict the parameter search to a low-rank subspace. This limitation can disrupt training dynamics and require a full-rank warm start to mitigate the impact. In this paper, we introduce a new method inspired by a phenomenon we formally prove: as training progresses, the rank of the estimated layer gradients gradually decreases and asymptotically approaches rank one. Leveraging this, our approach involves adaptively reducing the rank of the gradients during Adam optimization steps, using an efficient online rule for updating the low-rank projections. We further present a randomized SVD scheme for efficiently finding the projection matrix. Our technique enables full-parameter fine-tuning with adaptive low-rank gradient updates, significantly reducing overall memory requirements during training compared to state-of-the-art methods while improving model performance in both pretraining and fine-tuning. Finally, we provide a convergence analysis of our method and demonstrate its merits for training and fine-tuning language and biological foundation models.
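
The abstract's two ingredients, low-rank gradient projection and a randomized SVD for the projection matrix, can be illustrated in a few lines. The following is a toy NumPy sketch of one optimizer step under our own simplifying assumptions (hypothetical names; it is not the paper's AdaRankGrad implementation): a randomized range finder estimates a rank-$r$ basis for the gradient, Adam moments live in the $r$-dimensional projected space, and the update is mapped back to the full parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_rank_r_basis(G, r, oversample=5):
    """Approximate top-r left singular subspace of the gradient G
    (shape (d, k)) via randomized range finding."""
    omega = rng.standard_normal((G.shape[1], r + oversample))
    Q, _ = np.linalg.qr(G @ omega)            # orthonormal range basis
    U, _, _ = np.linalg.svd(Q.T @ G, full_matrices=False)
    return Q @ U[:, :r]                        # (d, r) projection basis

def projected_adam_step(W, G, P, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step on the rank-r projected gradient P.T @ G; the
    moments are stored in the small (r, k) space and the update is
    mapped back to the full (d, k) parameter space by P."""
    g = P.T @ G
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g**2
    mhat = state["m"] / (1 - b1 ** state["t"])
    vhat = state["v"] / (1 - b2 ** state["t"])
    return W - lr * P @ (mhat / (np.sqrt(vhat) + eps))
```

Here `state` would be initialized as `{"t": 0, "m": np.zeros((r, k)), "v": np.zeros((r, k))}`, with the basis refreshed only every so many steps to keep the overhead small; per the abstract, the method additionally shrinks the rank $r$ itself as training progresses.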


Testing Dependency of Weighted Random Graphs

arXiv.org Artificial Intelligence

Consider the following decision problem. We observe two weighted random graphs that are either generated independently at random or are edge-dependent due to some latent vertex correspondence or permutation. For this basic problem, two natural questions arise: the detection problem, which asks whether the graphs exhibit dependence, and the recovery problem, which asks to identify the latent correspondence between vertices. Here, we address the former question; specifically, we aim to understand under what conditions, in terms of the number of vertices and the generative distributions, one can distinguish between the two hypotheses and detect whether these graphs are dependent, say, with high probability. This fundamental question was first introduced and analyzed in [1], where for Gaussian-weighted and dense Erd\H{o}s-R\'{e}nyi random graphs on $n$ vertices, sharp information-theoretic thresholds were developed, revealing the exact barrier at which the asymptotic optimal detection error probability undergoes a phase transition from zero to one as $n$ approaches infinity. For sparse Erd\H{o}s-R\'{e}nyi random graphs, this threshold was initially determined within a constant factor in the same paper.


Detection of Correlated Random Vectors

arXiv.org Artificial Intelligence

In this paper, we investigate the problem of deciding whether two standard normal random vectors $\mathsf{X}\in\mathbb{R}^{n}$ and $\mathsf{Y}\in\mathbb{R}^{n}$ are correlated or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these vectors are statistically independent, while under the alternative, $\mathsf{X}$ and a randomly and uniformly permuted version of $\mathsf{Y}$ are correlated with correlation $\rho$. We analyze the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$ and $\rho$. To derive our information-theoretic lower bounds, we develop a novel technique for evaluating the second moment of the likelihood ratio using an orthogonal polynomials expansion, which, among other things, reveals a surprising connection to integer partition functions. We also study a multi-dimensional generalization of the above setting, where rather than two vectors we observe two databases/matrices, and furthermore allow for partial correlations between the two.
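
For intuition, the model and one permutation-invariant statistic are easy to simulate. The sketch below is illustrative only (the paper's analysis concerns the likelihood ratio, not this simple statistic): since the coordinate sum of a vector is unaffected by the hidden permutation, $\big(\sum_i \mathsf{X}_i\big)\big(\sum_i \mathsf{Y}_i\big)$ has mean $n\rho$ under the alternative and mean $0$ under the null.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pair(n, rho, dependent):
    """Observed (X, Y): under the alternative, X is rho-correlated,
    entrywise, with a uniformly permuted version of Y."""
    X = rng.standard_normal(n)
    if not dependent:
        return X, rng.standard_normal(n)
    Z = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    return X, Z[rng.permutation(n)]   # hide the correspondence

def sum_statistic(X, Y):
    # invariant to any permutation of Y's coordinates
    return X.sum() * Y.sum()
```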


Deep Learning-Aided Subspace-Based DOA Recovery for Sparse Arrays

arXiv.org Artificial Intelligence

Sparse arrays enable resolving more directions of arrival (DoAs) than antenna elements by using non-uniform geometries. This is typically achieved by reconstructing the covariance of a virtual large uniform linear array (ULA), which is then processed by subspace DoA estimators. However, these methods assume that the signals are non-coherent and that the array is calibrated; the latter is often challenging to achieve in sparse arrays, where one cannot access the virtual array elements. In this work, we propose Sparse-SubspaceNet, which leverages deep learning to enable subspace-based DoA recovery from sparse miscalibrated arrays with coherent sources. Sparse-SubspaceNet utilizes a dedicated deep network to learn from data how to compute a surrogate virtual array covariance that is divisible into distinguishable subspaces. By doing so, we learn to cope with coherent sources and miscalibrated sparse arrays, while preserving the interpretability and suitability of model-based subspace DoA estimators.
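
The classical model-based step that Sparse-SubspaceNet replaces with a learned surrogate is the difference-coarray construction: entries of the sparse-array covariance at lag $d_m - d_n$ are averaged over each lag and rearranged into a Toeplitz covariance of the virtual ULA. A minimal sketch of that standard baseline (helper names are ours, not the paper's):

```python
import numpy as np
from scipy.linalg import toeplitz

def virtual_ula_covariance(R, positions):
    """Average the entries of the sparse-array sample covariance R over
    each lag of the difference coarray, then rebuild the Hermitian
    Toeplitz covariance of the contiguous virtual ULA segment.
    `positions` are integer element locations in half-wavelengths,
    e.g. a nested array [0, 1, 2, 3, 7, 11]."""
    positions = np.asarray(positions)
    lags = positions[:, None] - positions[None, :]
    L = 0
    while (lags == L + 1).any():          # largest contiguous lag
        L += 1
    col = np.array([R[lags == l].mean() for l in range(L + 1)])
    return toeplitz(col)                  # row defaults to conj(col)
```

With coherent sources this matrix becomes rank-deficient and subspace estimators such as MUSIC break down, which, together with miscalibration, is exactly the failure mode the learned covariance surrogate is meant to fix.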


Testing Dependency of Unlabeled Databases

arXiv.org Artificial Intelligence

In this paper, we investigate the problem of deciding whether two random databases $\mathsf{X}\in\mathcal{X}^{n\times d}$ and $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ are statistically dependent or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these two databases are statistically independent, while under the alternative, there exists an unknown row permutation $\sigma$, such that $\mathsf{X}$ and $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$, are statistically dependent with some known joint distribution, but have the same marginal distributions as the null. We characterize the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$, $d$, and some spectral properties of the generative distributions of the datasets. For example, we prove that if a certain function of the eigenvalues of the likelihood function and $d$ is below a certain threshold, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, no matter what the value of $n$ is. This matches the performance of an efficient test that thresholds a centered version of the log-likelihood function of the observed matrices. We also analyze the case where $d$ is fixed, for which we derive strong (vanishing error) and weak detection lower and upper bounds.


Learning k-Level Sparse Neural Networks Using a New Generalized Weighted Group Sparse Envelope Regularization

arXiv.org Machine Learning

We propose an efficient method to learn both unstructured and structured sparse neural networks during training, utilizing a novel generalization of the sparse envelope function (SEF) as a regularizer, termed the \emph{weighted group sparse envelope function} (WGSEF). The WGSEF acts as a neuron-group selector, which is leveraged to induce structured sparsity. The method ensures hardware-friendly structured sparsity of a deep neural network (DNN), efficiently accelerating the DNN's evaluation. Notably, the method is adaptable, letting any hardware specify group definitions, such as filters, channels, filter shapes, layer depths, a single parameter (unstructured), etc. Owing to the WGSEF's properties, the proposed method allows one to pre-define the sparsity level to be achieved at training convergence, while maintaining negligible degradation in network accuracy, or even improvement in the case of redundant parameters. We introduce an efficient technique to calculate the exact value of the WGSEF along with its proximal operator in a worst-case complexity of $O(n)$, where $n$ is the total number of group variables. In addition, we propose a proximal-gradient-based optimization method to train the model, that is, the non-convex minimization of the sum of the neural-network loss and the WGSEF. Finally, we conduct experiments and illustrate the efficiency of our proposed technique in terms of the compression ratio, accuracy, and inference latency.
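
The exact WGSEF proximal operator and its $O(n)$ evaluation are the paper's contribution and are not reproduced here. The sketch below only shows the generic proximal-gradient pattern such a regularizer plugs into, using plain group soft-thresholding (the prox of a weighted group-lasso penalty) as a stand-in group selector.

```python
import numpy as np

def group_soft_threshold(w, groups, lam, weights=None):
    """Prox of a weighted group-lasso penalty: shrink each group's
    l2 norm by lam * weight, zeroing entire groups. A stand-in for
    the WGSEF prox, whose exact form is derived in the paper."""
    out = w.copy()
    weights = np.ones(len(groups)) if weights is None else weights
    for g, wt in zip(groups, weights):     # g: index array of one group
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - lam * wt / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

def proximal_gradient_step(w, grad, groups, lam, lr):
    """One step of: gradient descent on the smooth loss, followed by
    the prox of the nonsmooth sparsity regularizer."""
    return group_soft_threshold(w - lr * grad, groups, lam * lr)
```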


Phase Transitions in the Detection of Correlated Databases

arXiv.org Artificial Intelligence

We study the problem of detecting the correlation between two Gaussian databases $\mathsf{X}\in\mathbb{R}^{n\times d}$ and $\mathsf{Y}\in\mathbb{R}^{n\times d}$, each composed of $n$ users with $d$ features. This problem is relevant in the analysis of social media, computational biology, etc. We formulate this as a hypothesis testing problem: under the null hypothesis, the two databases are statistically independent. Under the alternative, however, there exists an unknown permutation $\sigma$ over the set of $n$ users (or, row permutation), such that $\mathsf{X}$ is $\rho$-correlated with $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$. We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of $n$ and $d$. Specifically, we prove that if $\rho^2d\to0$ as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, irrespective of the value of $n$. This complements the performance of a simple test that thresholds the sum of all entries of $\mathsf{X}^T\mathsf{Y}$. Furthermore, when $d$ is fixed, we prove that strong detection (vanishing error probability) is impossible for any $\rho<\rho^\star$, where $\rho^\star$ is an explicit function of $d$, while weak detection is again impossible as long as $\rho^2d\to0$. These results close significant gaps in recent related studies.
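
The simple test named in the abstract is directly implementable. The sketch below (with hypothetical helper names) samples the two databases under either hypothesis and computes the statistic the test thresholds, the sum of all entries of $\mathsf{X}^T\mathsf{Y}$, which equals the inner product of the two matrices' row-sum vectors.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_databases(n, d, rho, dependent):
    """Observed (X, Y): under the alternative, X is rho-correlated,
    entrywise, with a row-permuted version of Y."""
    X = rng.standard_normal((n, d))
    if not dependent:
        return X, rng.standard_normal((n, d))
    Z = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal((n, d))
    return X, Z[rng.permutation(n)]       # hidden row permutation

def sum_test_statistic(X, Y):
    # sum of all entries of X^T Y == rowsums(X) . rowsums(Y)
    return float(X.sum(axis=1) @ Y.sum(axis=1))
```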


Random Subgraph Detection Using Queries

arXiv.org Artificial Intelligence

The planted densest subgraph detection problem refers to the task of testing whether a given (random) graph contains a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ nodes. Under the null hypothesis, the graph is a realization of an Erd\H{o}s-R\'{e}nyi graph with edge probability (or, density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p>q$. The statistical as well as the computational barriers of this problem are well-understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem, where one can only observe a small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient for detecting the presence of the planted subgraph. Specifically, we show that any (possibly randomized) algorithm must make $\mathsf{Q} = \Omega(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$ adaptive queries (in expectation) to the adjacency matrix of the graph to detect the planted subgraph with probability more than $1/2$, where $\chi^2(p||q)$ is the chi-square distance. On the other hand, we devise a quasi-polynomial-time algorithm that detects the planted subgraph with high probability by making $\mathsf{Q} = O(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$ non-adaptive queries. We then propose a polynomial-time algorithm which is able to detect the planted subgraph using $\mathsf{Q} = O(\frac{n^3}{k^3\chi^2(p||q)}\log^3 n)$ queries. We conjecture that in the leftover regime, where $\frac{n^2}{k^2}\ll\mathsf{Q}\ll \frac{n^3}{k^3}$, no polynomial-time algorithms exist. Our results resolve two questions posed in \cite{racz2020finding}, where the special case of adaptive detection and recovery of a planted clique was considered.
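
A toy version of the query model is easy to set up: the algorithm sees the graph only through an adjacency oracle, spends its budget on uniformly random pairs, and thresholds the standardized edge count among the queried pairs. This is a non-adaptive baseline for illustration (hypothetical names), not the paper's algorithms that achieve the stated query complexities.

```python
import numpy as np

rng = np.random.default_rng(3)

def query_count_test(adj_oracle, n, budget, q, tau):
    """Spend `budget` non-adaptive queries on random vertex pairs and
    reject the null when the observed edge fraction is anomalously
    high relative to the background density q."""
    i = rng.integers(0, n, size=budget)
    j = rng.integers(0, n, size=budget)
    keep = i != j                          # discard self-pairs
    hits = sum(adj_oracle(a, b) for a, b in zip(i[keep], j[keep]))
    m = int(keep.sum())
    z = (hits - m * q) / np.sqrt(m * q * (1 - q))
    return z > tau
```

Note that a uniform query lands inside the planted $k$-vertex subgraph with probability about $k^2/n^2$, which already explains the $n^2/k^2$ factor appearing in the query-complexity bounds above.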


Planted Bipartite Graph Detection

arXiv.org Artificial Intelligence

We consider the task of detecting a hidden bipartite subgraph in a given random graph. Specifically, under the null hypothesis, the graph is a realization of an Erd\H{o}s-R\'{e}nyi random graph over $n$ vertices with edge density $q$. Under the alternative, there exists a planted $k_{\mathsf{R}} \times k_{\mathsf{L}}$ bipartite subgraph with edge density $p>q$. We derive asymptotically tight upper and lower bounds for this detection problem in both the dense regime, where $q,p = \Theta\left(1\right)$, and the sparse regime where $q,p = \Theta\left(n^{-\alpha}\right), \alpha \in \left(0,2\right]$. Moreover, we consider a variant of the above problem, where one can only observe a relatively small part of the graph, by using at most $\mathsf{Q}$ edge queries. For this problem, we derive upper and lower bounds in both the dense and sparse regimes.