Goto

Collaborating Authors

 suffice


Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Neural Information Processing Systems

We consider a common case of the combinatorial semi-bandit problem, the m-set semi-bandit, where the learner exactly selects m arms from the total d arms. In the adversarial setting, the best regret bound, known to be O( nmd) for time horizon n, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires to explicitly compute the arm-selection probabilities via optimizing problems at each time step and sample according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the m arms that rank among the m smallest (estimated) loss with random perturbation. In this paper, we show that FTPL with a Fréchet perturbation also enjoys the near optimal regret bound O( nm( p dlog(d) + m5/6)) in the adversarial setting and approaches best-of-both-world regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.




Structure-Aware Spectral Sparsification via Uniform Edge Sampling

Neural Information Processing Systems

Spectral clustering is a fundamental method for graph partitioning, but its reliance on eigenvector computation limits scalability to massive graphs. Classical sparsification methods preserve spectral properties by sampling edges proportionally to their effective resistances, but require expensive preprocessing to estimate these resistances. We study whether uniform edge sampling--a simple, structure-agnostic strategy--can suffice for spectral clustering. Our main result shows that for graphs admitting a well-separated $k$-clustering, characterized by a large structure ratio $\Upsilon(k) = \lambda_{k+1} / \rho_G(k)$, uniform sampling preserves the spectral subspace used for clustering. Specifically, we prove that uniformly sampling $O(\gamma^2 n \log n / \varepsilon^2)$ edges, where $\gamma$ is the Laplacian condition number, yields a sparsifier whose top $(n-k)$-dimensional eigenspace is approximately orthogonal to the cluster indicators.



Appendix

Neural Information Processing Systems

Our results heavily rely on the specific nature of the periodic activation function, so a natural question is to which extent our results can be extended beyond the single periodic neuron class. For lower bounds, a challenging but very interesting generalization would be to establish the cryptographic-hardness of learning certain family of GLMs whose activation function does not need to be periodic. A potentially easier route forward on this direction, would be to consider the Hermite decomposition of the activation function, similar to [A3], and establish lower bounds on the performance of low-degree methods [A23], of SGD [A3], or of local search methods methods [A15], for activation functions whose low-degree Hermite coefficients are exponentially small. For upper bounds, we believe that our proposed LLL-based algorithm may be extended beyond learning even periodic activation functions, such as the cosine activation, by appropriately post-processing the measurements, but leave this for future work. Furthermore, it would be interesting to better understand (empirically or analytically) the noise tolerance of our LLL-based algorithm for "low-frequency" activation functions, such as the absolute value underlying the phase retrieval problem which has "zero" frequency.





Supplementary Material

Neural Information Processing Systems

For a vector x 2 Rd and H [d], we denote vH to denote the vector that is equal to v on i 2 H, and zero otherwise. For a real-valued random variable X and m 2 N, we use kXkLm to denote (E|X|m)1/m. For a set S Rd and a function f, we also define the set function notation f(S) as {f(x)|x 2 S}. A.1 Finding a stable subset from a stable weighted subset For a set S on npoints, we define n, as the set of weights w 2 Rn such that wi 2 [0,1/((1)n]for all i 2 [n]and P i wi =1 . For a fixed vector µ 2 Rd that will be clear from context, a set of npoints S = {x1,...,x n}, and weights w 2 n, over S, we use w to denote P i wi(xi µ)(xi µ)>. The goal of this section is to show Proposition A.1, which states that if we have a weight w over S such that w (with respect to some vector µ) has bounded Xk norm proportional to 2 for some > 0, then there must exists some large subset S0 S that is stable with respect to µ and . Let S be a set of n points in Rd. Suppose that there exists a w 2 n, such that k wkXk B 2 for some vector µ. Then there exists a subset S0 S such that (i)|S0| (1 2)n and (ii) S0 is (,,k)-stable with respect to µ and, where = O( p B +1). Observe that k wkXk B 2 implies k w 2IkXk (B +1) 2 by the triangle inequality. In order to show Proposition A.1, we show Lemma A.2, which is a weakening of Proposition A.1 where we additionally assume that µw = P i wixi is close to µ, where µ is the vector we use to define w as well as the vector that we want to find a large sample subset S0 to be stable with respect to. To use Lemma A.2, we additionally show Proposition A.4, which states that k wkXk B 2 is enough to imply that µw is close to µ.