Quality control in sublinear time: a case study via random graphs

Marcussen, Cassandra, Rubinfeld, Ronitt, Sudan, Madhu

arXiv.org Artificial Intelligence 

Many algorithms are designed to work well on average over inputs. When running such an algorithm on an arbitrary input, we must ask: Can we trust the algorithm on this input? We identify a new class of algorithmic problems addressing this, which we call "Quality Control Problems." These problems are specified by a (positive, real-valued) "quality function" $ρ$ and a distribution $D$ such that, with high probability, a sample drawn from $D$ is "high quality," meaning its $ρ$-value is near $1$. The goal is to accept inputs $x \sim D$ and reject potentially adversarially generated inputs $x$ with $ρ(x)$ far from $1$. The objective of quality control is thus weaker than either component problem: testing for "$ρ(x) \approx 1$" or testing if $x \sim D$, and offers the possibility of more efficient algorithms. In this work, we consider the sublinear version of the quality control problem, where $D \in Δ(\{0,1\}^N)$ and the goal is to solve the $(D ,ρ)$-quality problem with $o(N)$ queries and time. As a case study, we consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}2$), and the $k$-clique count function $ρ_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to the sample, is of course impossible. Testing if $ρ_k(G)\approx 1$ requires $p^{-Ω(k^2)}$ samples. In contrast, we show that the quality control problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with respect to $ρ_k$ can be tested with $p^{-O(k)}$ queries and time, showing quality control is provably superpolynomially more efficient in this setting. More generally, for a motif $H$ of maximum degree $Δ(H)$, the respective quality control problem can be solved with $p^{-O(Δ(H))}$ queries and running time.