
Collaborating Authors

 Ding, Jingqiu


Improved Robust Estimation for Erd\H{o}s-R\'enyi Graphs: The Sparse Regime and Optimal Breakdown Point

arXiv.org Machine Learning

We study the problem of robustly estimating the edge density of Erd\H{o}s-R\'enyi random graphs $G(n, d^\circ/n)$ when an adversary can arbitrarily add or remove edges incident to an $\eta$-fraction of the nodes. We develop the first polynomial-time algorithm for this problem that estimates $d^\circ$ up to an additive error $O([\sqrt{\log(n) / n} + \eta\sqrt{\log(1/\eta)} ] \cdot \sqrt{d^\circ} + \eta \log(1/\eta))$. Our error guarantee matches information-theoretic lower bounds up to factors of $\log(1/\eta)$. Moreover, our estimator works for all $d^\circ \geq \Omega(1)$ and achieves the optimal breakdown point $\eta = 1/2$. Previous algorithms [AJK+22, CDHS24], including inefficient ones, incur significantly suboptimal errors; moreover, even among algorithms with suboptimal error guarantees, only inefficient ones achieve the optimal breakdown point. Our algorithm is based on the sum-of-squares (SoS) hierarchy. A key ingredient is the construction of constant-degree SoS certificates for the concentration of the number of edges incident to small sets in $G(n, d^\circ/n)$. Crucially, we show that these certificates exist even in the sparse regime $d^\circ = o(\log n)$, in which the performance of previous algorithms was significantly suboptimal.
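To make the corruption model concrete, here is a minimal simulation sketch (all parameter values illustrative). The prune-and-count heuristic at the end is a hypothetical stand-in for a robust estimator, not the paper's method; the actual algorithm is based on the SoS certificates described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d0, eta = 2000, 5.0, 0.05            # nodes, true edge density d0 (= d^circ), corruption rate

A = (rng.random((n, n)) < d0 / n).astype(np.int8)
A = np.triu(A, 1)
A = A + A.T                             # sample G(n, d0/n)

bad = rng.choice(n, int(eta * n), replace=False)
A[bad, :] = 1                           # adversary saturates all edges
A[:, bad] = 1                           # incident to the corrupted nodes
np.fill_diagonal(A, 0)

naive = A.sum(axis=1).mean()            # naive average-degree estimate: wildly inflated

keep = np.argsort(A.sum(axis=1))[: int((1 - 2 * eta) * n)]   # prune high-degree nodes
m = len(keep)
pruned = A[np.ix_(keep, keep)].sum() * n / (m * (m - 1))     # rescaled induced edge count

print(f"true d0 = {d0:.2f}, naive = {naive:.2f}, pruned = {pruned:.2f}")
```

On typical runs the naive estimate is off by orders of magnitude while the pruned count lands near $d^\circ$; unlike the SoS approach, though, such a heuristic comes with no error guarantee or breakdown-point analysis.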


Low degree conjecture implies sharp computational thresholds in stochastic block model

arXiv.org Artificial Intelligence

We investigate implications of the (extended) low-degree conjecture (recently formalized in [MW23]) in the context of the symmetric stochastic block model. Assuming the conjecture holds, we establish that no polynomial-time algorithm can weakly recover community labels below the Kesten-Stigum (KS) threshold. In particular, we rule out polynomial-time estimators that, with constant probability, achieve correlation with the true communities significantly better than random. In contrast, above the KS threshold, polynomial-time algorithms are known to achieve constant correlation with the true communities with high probability [Mas14, AS15]. To our knowledge, this provides the first rigorous evidence for a sharp transition in the recovery rate of polynomial-time algorithms at the KS threshold. Notably, under a stronger version of the low-degree conjecture, our lower bound remains valid even when the number of blocks diverges. Furthermore, our results provide evidence of a computational-to-statistical gap in learning the parameters of stochastic block models. In contrast to prior work, which either (i) rules out polynomial-time algorithms for hypothesis testing with $1-o(1)$ success probability [Hopkins18, BBK+21a] under the low-degree conjecture, or (ii) rules out low-degree polynomials for learning the edge connection probability matrix [LG23], our approach yields stronger lower bounds for both the recovery and the learning problem. Our proof combines the low-degree lower bounds of [Hopkins18, BBK+21a] with graph splitting and cross-validation techniques. To rule out general recovery algorithms, we employ the correlation preserving projection method developed in [HS17].
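For context, a common way to write the Kesten-Stigum threshold for the symmetric $k$-community block model, with intra- and inter-community edge probabilities $a/n$ and $b/n$ (notation introduced here for illustration), is

\[
\frac{(a-b)^2}{k^2\, d} > 1, \qquad d = \frac{a + (k-1)\,b}{k},
\]

where $d$ is the average degree. The abstract's results give evidence that weak recovery in polynomial time is possible above this threshold and, conditionally on the low-degree conjecture, impossible below it.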


Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust

arXiv.org Machine Learning

We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erd\H{o}s-R\'enyi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretic lower bounds showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).
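As a point of reference for what node privacy demands, the following crude baseline (not the paper's method) caps degrees and adds Laplace noise to the edge count. The sensitivity comment is heuristic: making naive truncation rigorously node-private takes more care (e.g., Lipschitz extensions), which is part of what motivates the sum-of-squares exponential-mechanism approach above.

```python
import numpy as np

def private_edge_density(A, epsilon, D, rng):
    """Crude node-DP baseline: degree truncation plus Laplace noise."""
    n = A.shape[0]
    keep = A.sum(axis=1) <= D            # drop nodes above the degree cap D
    m = A[np.ix_(keep, keep)].sum() / 2  # edge count of the truncated graph
    # Heuristically, changing one node moves this count by about D, so we
    # add Laplace noise of scale D / epsilon. A rigorous node-sensitivity
    # analysis of truncation is subtler than this comment suggests.
    return (m + rng.laplace(scale=D / epsilon)) * 2 / n

rng = np.random.default_rng(1)
n, d0 = 5000, 8.0
A = (rng.random((n, n)) < d0 / n).astype(np.int8)
A = np.triu(A, 1)
A = A + A.T
print(private_edge_density(A, epsilon=1.0, D=40, rng=rng))  # roughly d0
```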


Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression

arXiv.org Machine Learning

We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask for the minimum sample complexity needed to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate of the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically, this can be achieved using $\Theta(k \log (d/k))$ samples. Yet, despite its prominence in the literature, no polynomial-time algorithm is known to achieve the same guarantees using fewer than $\Theta(d)$ samples without additional restrictions on the model. Similarly, existing hardness results are either restricted to the proper setting, in which the estimate must be sparse as well, or apply only to specific algorithms. We give evidence that efficient algorithms for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes where efficient algorithms are widely believed to require at least $\Omega(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce. Our hardness results apply to the (correlated) random design setting, in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.
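A plausible rendering of the two models related by the reduction, with notation ($\beta^*$, $\theta$, $v$) chosen here for illustration: the $k$-sparse regression model and the negative-spike Wishart form of sparse PCA,

\[
y_i = \langle x_i, \beta^* \rangle + \xi_i, \quad \|\beta^*\|_0 \le k,
\qquad \text{and} \qquad
x_1, \dots, x_n \overset{\text{i.i.d.}}{\sim} N\!\big(0,\; I_d - \theta\, vv^\top\big), \quad \|v\|_0 = k, \; \|v\|_2 = 1,
\]

so that an improper regression algorithm succeeding with $o(k^2)$ samples would detect the negative spike in a regime widely believed to be computationally hard.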


Reaching Kesten-Stigum Threshold in the Stochastic Block Model under Node Corruptions

arXiv.org Artificial Intelligence

We study robust community detection in the node-corrupted stochastic block model, where an adversary can arbitrarily modify all the edges incident to a fraction of the $n$ vertices. We present the first polynomial-time algorithm that achieves weak recovery at the Kesten-Stigum threshold even in the presence of a small constant fraction of corrupted nodes. Prior to this work, even state-of-the-art robust algorithms were known to break under such node corruption adversaries when operating close to the Kesten-Stigum threshold. We further extend our techniques to the $Z_2$ synchronization problem, where our algorithm reaches the optimal recovery threshold in the presence of similarly strong adversarial perturbations. The key ingredient of our algorithm is a novel identifiability proof that leverages the push-out effect of the Grothendieck norm of principal submatrices.
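For reference, one standard way to define the Grothendieck norm appearing in the last sentence (stated here for an $n \times n$ matrix $A$; the paper applies it to principal submatrices) is

\[
\|A\|_{\mathrm{Gr}} \;=\; \max_{\substack{x_1,\dots,x_n,\; y_1,\dots,y_n \\ \|x_i\| = \|y_j\| = 1}} \; \sum_{i,j=1}^{n} A_{ij}\, \langle x_i, y_j \rangle,
\]

which, by Grothendieck's inequality, is within a constant factor of the corresponding maximum over $\pm 1$ sign vectors and can be computed via semidefinite programming.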


SQ Lower Bounds for Random Sparse Planted Vector Problem

arXiv.org Artificial Intelligence

Consider the setting where a $\rho$-sparse Rademacher vector is planted in a random $d$-dimensional subspace of $\mathbb{R}^n$. A classical question is how to recover this planted vector given a random basis of the subspace. A recent result of [ZSWB21] showed that the lattice basis reduction algorithm can recover the planted vector when $n\geq d+1$. This is surprising because it was previously shown that recovery cannot be achieved by low-degree polynomials when $n\ll \rho^2 d^{2}$ [MW21], although the lattice algorithm is not expected to tolerate an inverse-polynomial amount of noise. A natural question is whether we can derive a Statistical Query (SQ) lower bound matching the low-degree lower bound of [MW21]. Such a bound would (i) imply that the SQ lower bound can be surpassed by lattice-based algorithms, and (ii) predict the computational hardness of the problem when the planted vector is perturbed by an inverse-polynomial amount of noise. In this paper, we prove such an SQ lower bound. In particular, we show that a super-polynomial number of VSTAT queries is needed to solve the easier statistical testing problem when $n\ll \rho^2 d^{2}$ and $\rho\gg \frac{1}{\sqrt{d}}$. The most notable technique we use to derive the SQ lower bound is the almost-equivalence between SQ lower bounds and low-degree lower bounds [BBH+20, MW21].
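A short sketch of how an instance of this model can be generated (parameter values are illustrative): plant the sparse vector in the span of Gaussian directions and hand the solver a randomly rotated basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, rho = 400, 20, 0.1

mask = rng.random(n) < rho
v = np.where(mask, rng.choice([-1.0, 1.0], size=n), 0.0)   # rho-sparse Rademacher vector

G = rng.standard_normal((n, d - 1))                        # Gaussian complement directions
B = np.column_stack([v, G])                                # subspace containing v
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))           # random rotation
basis = B @ Q                                              # hides v's coordinates in the basis

# The algorithmic task: recover (a scaling of) v from `basis` alone.
print(basis.shape)
```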


Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks

arXiv.org Machine Learning

We study symmetric spiked matrix models with respect to a general class of noise distributions. Given a rank-one deformation of a random noise matrix whose entries are independently distributed with zero mean and unit variance, the goal is to estimate the rank-one part. For the case of Gaussian noise, the top eigenvector of the given matrix is a widely studied estimator known to achieve optimal statistical guarantees, e.g., in the sense of the celebrated BBP phase transition. However, this estimator can fail completely for heavy-tailed noise. In this work, we exhibit an estimator that works for heavy-tailed noise up to the BBP threshold, which is optimal even for Gaussian noise. We give a non-asymptotic analysis of our estimator that relies only on the variance of each entry remaining constant as the size of the matrix grows: higher moments may grow arbitrarily fast or even fail to exist. Previously, it was only known how to achieve these guarantees when higher-order moments of the noise are bounded by a constant independent of the size of the matrix. Our estimator can be evaluated in polynomial time by counting self-avoiding walks via a color-coding technique. Moreover, we extend our estimator to spiked tensor models and establish analogous results.
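A small numerical illustration of the failure mode described above, with Student-t noise chosen here as an example of a heavy-tailed distribution with unit variance but unbounded fourth moment; the paper's self-avoiding-walk estimator itself is not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 2000, 2.0                                 # matrix size, signal strength (above BBP)

x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)   # unit-norm spike direction
# t-noise with df = 2.1: variance 2.1/0.1 = 21 (rescaled to 1), infinite 4th moment
W = rng.standard_t(df=2.1, size=(n, n)) / np.sqrt(21.0)
W = np.triu(W, 1)
W = W + W.T
Y = lam * np.outer(x, x) + W / np.sqrt(n)          # symmetric spiked matrix model

vals, vecs = np.linalg.eigh(Y)
v = vecs[:, np.argmax(np.abs(vals))]               # naive top-eigenvector estimate
# With such heavy tails, the top eigenvector frequently localizes on a few
# huge noise entries instead of correlating with the spike.
print("overlap |<v, x>| =", abs(v @ x))
```

Under Gaussian noise with the same $\lambda$, the printed overlap would be bounded away from zero; the contrast is what motivates an estimator whose guarantees depend only on second moments.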