suffice
Appendix
Our results heavily rely on the specific nature of the periodic activation function, so a natural question is to which extent our results can be extended beyond the single periodic neuron class. For lower bounds, a challenging but very interesting generalization would be to establish the cryptographic-hardness of learning certain family of GLMs whose activation function does not need to be periodic. A potentially easier route forward on this direction, would be to consider the Hermite decomposition of the activation function, similar to [A3], and establish lower bounds on the performance of low-degree methods [A23], of SGD [A3], or of local search methods methods [A15], for activation functions whose low-degree Hermite coefficients are exponentially small. For upper bounds, we believe that our proposed LLL-based algorithm may be extended beyond learning even periodic activation functions, such as the cosine activation, by appropriately post-processing the measurements, but leave this for future work. Furthermore, it would be interesting to better understand (empirically or analytically) the noise tolerance of our LLL-based algorithm for "low-frequency" activation functions, such as the absolute value underlying the phase retrieval problem which has "zero" frequency.
Supplementary Material
For a vector x 2 Rd and H [d], we denote vH to denote the vector that is equal to v on i 2 H, and zero otherwise. For a real-valued random variable X and m 2 N, we use kXkLm to denote (E|X|m)1/m. For a set S Rd and a function f, we also define the set function notation f(S) as {f(x)|x 2 S}. A.1 Finding a stable subset from a stable weighted subset For a set S on npoints, we define n, as the set of weights w 2 Rn such that wi 2 [0,1/((1)n]for all i 2 [n]and P i wi =1 . For a fixed vector µ 2 Rd that will be clear from context, a set of npoints S = {x1,...,x n}, and weights w 2 n, over S, we use w to denote P i wi(xi µ)(xi µ)>. The goal of this section is to show Proposition A.1, which states that if we have a weight w over S such that w (with respect to some vector µ) has bounded Xk norm proportional to 2 for some > 0, then there must exists some large subset S0 S that is stable with respect to µ and . Let S be a set of n points in Rd. Suppose that there exists a w 2 n, such that k wkXk B 2 for some vector µ. Then there exists a subset S0 S such that (i)|S0| (1 2)n and (ii) S0 is (,,k)-stable with respect to µ and, where = O( p B +1). Observe that k wkXk B 2 implies k w 2IkXk (B +1) 2 by the triangle inequality. In order to show Proposition A.1, we show Lemma A.2, which is a weakening of Proposition A.1 where we additionally assume that µw = P i wixi is close to µ, where µ is the vector we use to define w as well as the vector that we want to find a large sample subset S0 to be stable with respect to. To use Lemma A.2, we additionally show Proposition A.4, which states that k wkXk B 2 is enough to imply that µw is close to µ.
Detection of local geometry in random graphs: information-theoretic and computational limits
Bok, Jinho, Li, Shuangping, Yu, Sophie H.
We study the problem of detecting local geometry in random graphs. We introduce a model $\mathcal{G}(n, p, d, k)$, where a hidden community of average size $k$ has edges drawn as a random geometric graph on $\mathbb{S}^{d-1}$, while all remaining edges follow the Erdős--Rényi model $\mathcal{G}(n, p)$. The random geometric graph is generated by thresholding inner products of latent vectors on $\mathbb{S}^{d-1}$, with each edge having marginal probability equal to $p$. This implies that $\mathcal{G}(n, p, d, k)$ and $\mathcal{G}(n, p)$ are indistinguishable at the level of the marginals, and the signal lies entirely in the edge dependencies induced by the local geometry. We investigate both the information-theoretic and computational limits of detection. On the information-theoretic side, our upper bounds follow from three tests based on signed triangle counts: a global test, a scan test, and a constrained scan test; our lower bounds follow from two complementary methods: truncated second moment via Wishart--GOE comparison, and tensorization of KL divergence. These results together settle the detection threshold at $d = \widetildeΘ(k^2 \vee k^6/n^3)$ for fixed $p$, and extend the state-of-the-art bounds from the full model (i.e., $k = n$) for vanishing $p$. On the computational side, we identify a computational--statistical gap and provide evidence via the low-degree polynomial framework, as well as the suboptimality of signed cycle counts of length $\ell \geq 4$.
Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning
In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an $\epsilon$-optimal policy from a set of policies $\Pi$ with high probability. Existing work in bandits has shown that it is possible to identify the best policy by estimating only the *difference* between the behaviors of individual policies-which can have substantially lower variance than estimating the behavior of each policy directly--yet the best-known complexities in RL fail to take advantage of this, and instead estimate the behavior of each policy directly. Does it suffice to estimate only the differences in the behaviors of policies in RL? We answer this question positively for contextual bandits, but in the negative for tabular RL, showing a separation between contextual bandits and RL. However, inspired by this, we show that it *almost* suffices to estimate only the differences in RL: if we can estimate the behavior of a *single* reference policy, it suffices to only estimate how any other policy deviates from this reference policy. We develop an algorithm which instantiates this principle and obtains, to the best of our knowledge, the tightest known bound on the sample complexity of tabular RL.