poly
Node-private community estimation in stochastic block models: Tractable algorithms and lower bounds
Marchis, Laurentiu, D'souza, Ethan, Flídr, Tomáš, Loh, Po-Ling
We study the classical problem of community recovery in stochastic block models with a fixed number of communities, with a twist: We seek algorithms that are stable with respect to node-wise changes in the graph structure, formally defined as a differential privacy constraint. The algorithms we develop are based on spectral clustering, where we introduce privacy to the community recovery pipeline in the form of directly privatizing the adjacency matrix; private PCA; private convex optimization; private low-rank matrix estimation; and private approximate subspace estimation. Straightforward applications of existing private algorithms lead to a rapid increase in the privacy parameter $ε$ in order to ensure consistent estimation under node differential privacy, in contrast with the simpler setting of edge privacy. To alleviate these issues, we develop novel algorithms based on (1) sampling from an exponential mechanism with a Lipschitz extension and (2) a general framework for constructing smooth projections from the space of undirected graphs to the space of bounded-degree graphs, which can then be combined with various edge-private algorithms. Importantly, the methods we develop are all computable in polynomial-time as a function of the number of nodes in the graph. We also develop novel lower bounds on the growth rate of $ε$ required in order to achieve consistent community estimation under node privacy. On a technical note, our paper highlights the complications that arise when analyzing private algorithms under the non-standard scaling $ε\rightarrow \infty$ and proposes some solutions. We also provide a novel application of the HGR maximal correlation from information theory in the context of accuracy amplification in PAC learning, which may be of independent interest.
Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy
Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-2 networks can be efficiently learned under the assumptions that the input distribution is Gaussian, and the weight matrix is non-degenerate. In this work, we study whether such assumptions may suffice for learning deeper networks and prove negative results. We show that learning depth-3 ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where a random noise is added to the network's parameters. It implies that learning depth-3 ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-2networks, and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results are under a wellstudied assumption on the existence of local pseudorandom generators.
in Fixed Dimension Training Neural Networks is NP-Hard
Our results settle the complexity status regarding these parameters number of dimensions and number of ReLUs if the network is assumed to compute the ReLU case, we show fixed-parameter tractability for the combined parameter four ReLUs (or two linear threshold neurons) with zero training error. Finally, in We also answer a question by Froese et al. [2022, JAIR] proving W[1]-hardness for dimensions, which excludes any polynomial-time algorithm for constant dimension. Khalife and Basu [2022, IPCO] showing that both problems are NP-hard for two eral questions are still open. We answer questions by Arora et al. [2018, ICLR] and complexity of these problems has been studied numerous times in recent years, sevsidering ReLU and linear threshold activation functions.
Learning to Think from Multiple Thinkers
Joshi, Nirmit, Magen, Roey, Srebro, Nathan, Tsilivis, Nikolaos, Vardi, Gal
We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\varepsilon$, a moderate number of thinkers that scales as $\log \frac{1}{\varepsilon}\log \log \frac{1}{\varepsilon}$, and sufficient passive end-result data that scales as $\frac{1}{\varepsilon}\cdot poly\log\frac{1}{\varepsilon}$.
related work
Deterministic RLDeterministic system is often the starting case in the study of sample-efficient algorithms, where the issue of exploration and exploitation trade-off is more clearly revealed since both the transition kernel and reward function are deterministic. The seminal work [81] proposes a sample-efficient algorithm for Q-learning that works for a family of function classes. Recently, [21] studies the agnostic setting where the optimal Q-function can only be approximated by a function class with approximation error. The algorithm in [21] learns the optimal policy with the number of trajectories linear with the eluder dimension. Consider MDPM where the transition is deterministic. Assume the function class in Definition 3.1 satisfies Assumption 2.1 and Assumption 2.2. For any t (0,1), if d Ω(log(BW/λ))and n d poly(κ,k,λ,BW,Bϕ,H,log(d/t)), then with probability at least 1 tAlgorithm 1 returns the optimal policy π .
Forster Decomposition and Learning Halfspaces with Noise
AForster transform is an operation that turns a distribution into one with good anticoncentration properties. While a Forster transform does not always exist, we show that any distribution can be efficiently decomposed as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently. As the main application of this result, we obtain the first polynomial-time algorithm for distribution-independent PAC learning of halfspaces in the Massart noise model with strongly polynomial sample complexity, i.e., independent of the bit complexity of the examples. Previous algorithms for this learning problem incurred sample complexity scaling polynomially with the bit complexity, even though such a dependence is not information-theoretically necessary.
Supplementary proofs from Section 2
We begin with a simple lemma showing that the values of the levels are monotone: Lemma A.1. First, we note that the second part of the lemma holds by lines 15-16. Let zil and zih be the value of zland zhin Algorithm 2 on line 9 on window i. There are two cases, depending on whether an element e? was added to the solutions or not. Suppose no element e? was added to the solution. Then all the levels remain the same.