

Improved Robust Estimation for Erdős-Rényi Graphs: The Sparse Regime and Optimal Breakdown Point

Neural Information Processing Systems

We study the problem of robustly estimating the edge density of Erdős-Rényi random graphs $\mathbb{G}(n, d^\circ/n)$ when an adversary can arbitrarily add or remove edges incident to an $\eta$-fraction of the nodes.
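The threat model can be made concrete with a small simulation. The sketch below samples $\mathbb{G}(n, d^\circ/n)$, computes the naive average-degree estimate $2|E|/n$, and then lets a hypothetical adversary connect an $\eta$-fraction of nodes to every other node. The naive estimator, the adversary's strategy, and the parameter values are assumptions for illustration; this is not the robust estimator from the paper.

```python
import random

def sample_er_edges(n, d, rng):
    """Sample the edge set of G(n, d/n) as pairs (i, j) with i < j."""
    p = d / n
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def average_degree(n, edges):
    """Naive estimate 2|E|/n; on a clean graph it concentrates near (n - 1) d / n."""
    return 2 * len(edges) / n

def corrupt_nodes(n, edges, eta, rng):
    """Hypothetical adversary: choose eta * n nodes and connect each one to every other node."""
    corrupted = set(edges)
    for u in rng.sample(range(n), int(eta * n)):
        for v in range(n):
            if v != u:
                corrupted.add((min(u, v), max(u, v)))
    return corrupted

rng = random.Random(0)
n, d, eta = 1000, 5.0, 0.05
clean_edges = sample_er_edges(n, d, rng)
attacked_edges = corrupt_nodes(n, clean_edges, eta, rng)
clean_est = average_degree(n, clean_edges)
attacked_est = average_degree(n, attacked_edges)
print(f"clean estimate: {clean_est:.2f}, after corrupting 5% of nodes: {attacked_est:.2f}")
```

Although the adversary only touches edges incident to 50 nodes, the naive estimate jumps from about 5 to roughly 100, which is why a robust estimator must down-weight or discard suspicious nodes rather than average over all edges.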


Variance-Reduced Long-Term Rehearsal Learning with Quadratic Programming Reformulation

Neural Information Processing Systems

In machine learning, a critical class of decision-making problems involves *Avoiding Undesired Future* (AUF): given a predicted undesired outcome, how can one decide which actions to take to prevent it? Recently, the *rehearsal learning* framework has been proposed to address the AUF problem. While existing methods offer reliable decisions for single-round success, this paper considers long-term settings that involve coordinating multiple future outcomes, which is often required in real-world tasks. Specifically, we generalize the AUF objective to characterize a long-term decision target that incorporates cross-temporal relations among variables. As directly optimizing the *AUF probability* $\mathbb{P}_{\operatorname{AUF}}$ over this objective remains challenging, we derive an explicit expression for the objective and further propose a quadratic programming (QP) reformulation that transforms the intractable probabilistic AUF optimization into a tractable one. Under mild assumptions, we show that solutions to the QP reformulation are equivalent to those of the original AUF optimization, based on which we develop two novel rehearsal learning methods for long-term decision-making: (i) a *greedy* method that maximizes the single-round $\mathbb{P}_{\operatorname{AUF}}$ at each step, and (ii) a *far-sighted* method that accounts for future consequences in each decision, yielding a higher overall $\mathbb{P}_{\operatorname{AUF}}$ through an $L/(L+1)$ variance reduction in the AUF objective. We further establish an $\mathcal{O}(1/\sqrt{N})$ excess risk bound for decisions based on estimated parameters, ensuring reliable practical applicability with finite data.
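The single-round quantity that the greedy method maximizes can be illustrated in a toy setting. Assuming a hypothetical scalar linear Gaussian outcome model $Y = \beta a + c + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ and a desired region $[\ell, h]$, the AUF probability has the closed form $\mathbb{P}_{\operatorname{AUF}}(a) = \Phi\big((h - \beta a - c)/\sigma\big) - \Phi\big((\ell - \beta a - c)/\sigma\big)$, and a greedy decision picks the action with the largest value. The model, its parameters, and the action grid are assumptions; this is neither the paper's QP reformulation nor its long-term model.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def auf_probability(a, beta, c, sigma, lo, hi):
    """P(lo <= Y <= hi) under the hypothetical model Y = beta * a + c + N(0, sigma^2)."""
    mean = beta * a + c
    return std_normal_cdf((hi - mean) / sigma) - std_normal_cdf((lo - mean) / sigma)

# Hypothetical parameters: desired region [3, 5], effect size beta = 2, offset c = 1.
beta, c, sigma, lo, hi = 2.0, 1.0, 0.5, 3.0, 5.0
actions = [i / 100 for i in range(-300, 301)]  # candidate actions on a grid
best = max(actions, key=lambda a: auf_probability(a, beta, c, sigma, lo, hi))
p_best = auf_probability(best, beta, c, sigma, lo, hi)
print(f"greedy action: {best:.2f}, P_AUF = {p_best:.4f}")
```

In this symmetric toy case the greedy action centers the predicted outcome in the desired region ($\beta a + c = 4$, so $a = 1.5$), giving $\mathbb{P}_{\operatorname{AUF}} = 2\Phi(2) - 1 \approx 0.9545$.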


A Single-Swap Local Search Algorithm for k-Means of Lines

Neural Information Processing Systems

Clustering is a fundamental problem that has been extensively studied over the past few decades, with most research focusing on point-based clustering such as $k$-means, $k$-median, and $k$-center. However, numerous real-world applications, such as motion analysis, computer vision, and missing data analysis, require clustering over structured data, including lines, time series, and affine subspaces (flats), where traditional point-based clustering algorithms often fall short. In this paper, we study the $k$-means of lines problem, where the input is a set $L$ of lines in $\mathbb{R}^d$, and the goal is to find $k$ centers $C$ in $\mathbb{R}^d$ such that the sum of squared distances from each line in $L$ to its nearest center in $C$ is minimized. Local search is a well-established strategy for point-based $k$-means clustering, known for its efficiency and provable approximation guarantees. However, extending local search to the $k$-means of lines problem is nontrivial, as the capture relation used in point-based clustering does not generalize to the line setting.
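The objective can be written down directly: for a line $\ell = \{p + t u : t \in \mathbb{R}\}$ with $\|u\| = 1$ and a center $c$, the squared distance is $\|(c - p) - \langle c - p, u\rangle u\|^2$. The sketch below evaluates the $k$-means of lines cost and runs a generic single-swap local search over a hypothetical finite candidate set. The candidate set, tolerance, and stopping rule are assumptions; the paper's algorithm and its approximation analysis are not reproduced here.

```python
import numpy as np

def sq_dist_point_to_line(c, p, u):
    """Squared distance from point c to the line {p + t * u : t in R}."""
    u = u / np.linalg.norm(u)
    w = c - p
    residual = w - np.dot(w, u) * u  # part of w orthogonal to the line direction
    return float(np.dot(residual, residual))

def kmeans_lines_cost(lines, centers):
    """k-means of lines objective: each line pays its squared distance to the nearest center."""
    return sum(min(sq_dist_point_to_line(c, p, u) for c in centers) for p, u in lines)

def single_swap_local_search(lines, candidates, k, max_rounds=100):
    """Generic single-swap local search over a hypothetical finite candidate set."""
    candidates = [np.asarray(x, dtype=float) for x in candidates]
    centers = candidates[:k]
    cost = kmeans_lines_cost(lines, centers)
    for _ in range(max_rounds):
        improved = False
        for i in range(k):
            for cand in candidates:
                trial = centers[:i] + [cand] + centers[i + 1:]
                trial_cost = kmeans_lines_cost(lines, trial)
                if trial_cost < cost - 1e-12:  # accept strictly improving swaps only
                    centers, cost, improved = trial, trial_cost, True
        if not improved:
            break
    return centers, cost

# Two horizontal lines in R^2: y = 0 and y = 2, each given as (point, direction).
lines = [(np.array([0.0, 0.0]), np.array([1.0, 0.0])),
         (np.array([0.0, 2.0]), np.array([1.0, 0.0]))]
centers, cost = single_swap_local_search(lines, [(0, 0), (0, 1), (0, 2)], k=1)
print(centers, cost)
```

With one center, the search moves from $(0, 0)$ (cost 4) to $(0, 1)$, which sits at squared distance 1 from each line (cost 2). Each round costs $O(k \cdot |\text{candidates}|)$ objective evaluations.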


Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-Index Models

Neural Information Processing Systems

We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that degree at most $m$ distinguishing moments exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/\epsilon)}$.
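The defining property of a $K$-MIM, $f(\boldsymbol{x}) = g(W\boldsymbol{x})$ for some $W \in \mathbb{R}^{K \times d}$, can be checked numerically: moving the input orthogonally to the hidden subspace leaves the output unchanged. The subspace, the link $g$, and the dimensions below are hypothetical choices for illustration; the paper's learning algorithm is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 10, 2

# Hypothetical K-dimensional subspace: rows of W (K x d) form an orthonormal basis.
W = np.linalg.qr(rng.standard_normal((d, K)))[0].T

def link(z):
    """Hypothetical link g: R^K -> R (any function of the K projections would do)."""
    return np.tanh(z[..., 0]) * z[..., 1]

def f(x):
    """K-MIM: f(x) = g(W x) depends on x only through its projection onto span(W)."""
    return link(x @ W.T)

# Gaussian input, as in the abstract's distributional assumption.
x = rng.standard_normal(d)
v = rng.standard_normal(d)
v_perp = v - W.T @ (W @ v)  # remove the component of v inside the subspace
print(f(x), f(x + v_perp))  # equal up to floating-point error
```

The learner only sees samples $(\boldsymbol{x}, y)$ with $y \approx f(\boldsymbol{x})$ (possibly with adversarially corrupted labels), so the task is to recover a good approximation of $f$ without knowing $W$.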


Learning single index models via harmonic decomposition

Neural Information Processing Systems

We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*, \boldsymbol{x} \rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that *spherical harmonics*, rather than *Hermite polynomials*, provide the natural basis for this problem, as they capture its intrinsic *rotational symmetry*. Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators, based on tensor unfolding and online SGD, that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.
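The Gaussian-input viewpoint that the abstract starts from can be made concrete by computing the Hermite coefficients $c_k = \mathbb{E}[\sigma(z)\,\mathrm{He}_k(z)]/k!$ of a link $\sigma$ with $z \sim \mathcal{N}(0, 1)$; the lowest nonzero degree $k \ge 1$ is often called the information exponent. The sketch below uses NumPy's Gauss-HermiteE quadrature. It illustrates only the prior Hermite perspective, not the paper's spherical-harmonic analysis, and the example links are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H

def hermite_coefficients(link, max_degree, n_nodes=40):
    """c_k = E[link(z) He_k(z)] / k! for z ~ N(0, 1), so that link = sum_k c_k He_k."""
    nodes, weights = H.hermegauss(n_nodes)      # quadrature for the weight exp(-z^2 / 2)
    weights = weights / math.sqrt(2 * math.pi)  # normalize to the standard Gaussian measure
    values = link(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                           # selects the single polynomial He_k
        he_k = H.hermeval(nodes, basis)
        coeffs.append(float(np.sum(weights * values * he_k)) / math.factorial(k))
    return coeffs

def information_exponent(coeffs, tol=1e-8):
    """Lowest degree k >= 1 with a nonzero Hermite coefficient (None if all vanish)."""
    for k in range(1, len(coeffs)):
        if abs(coeffs[k]) > tol:
            return k
    return None

coeffs_sq = hermite_coefficients(lambda z: z**2 - 1, max_degree=4)  # z^2 - 1 = He_2
coeffs_cube = hermite_coefficients(lambda z: z**3, max_degree=4)    # z^3 = He_3 + 3 He_1
print(information_exponent(coeffs_sq), information_exponent(coeffs_cube))
```

Because $\mathbb{E}[\mathrm{He}_j \mathrm{He}_k] = k!\,\delta_{jk}$ under the Gaussian measure, the quadrature recovers these coefficients exactly (up to rounding) for polynomial links of moderate degree.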


Balancing Gradient and Hessian Queries in Non-Convex Optimization

Neural Information Processing Systems

We develop optimization methods which offer new trade-offs between the number of gradient and Hessian computations needed to compute a critical point of a non-convex function.
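One classical way to trade Hessian queries for gradient queries is to reuse a stale Hessian for several Newton-type steps (as in Shamanskii's modification of Newton's method). The sketch below applies this idea to the hypothetical non-convex double-well function $f(\boldsymbol{x}) = \sum_i (x_i^4/4 - x_i^2/2)$ and counts oracle calls. The test function, starting point, and refresh period are assumptions; this is not the method developed in the paper.

```python
import numpy as np

def grad(x):
    """Gradient of the hypothetical double-well f(x) = sum(x^4 / 4 - x^2 / 2)."""
    return x**3 - x

def hess(x):
    """Hessian of the same function (diagonal because f is separable)."""
    return np.diag(3 * x**2 - 1)

def lazy_newton(x0, refresh_every=3, tol=1e-10, max_iter=200):
    """Newton-type steps that recompute the Hessian only every `refresh_every` iterations."""
    x = np.asarray(x0, dtype=float)
    n_grad = n_hess = 0
    H = None
    for t in range(max_iter):
        g = grad(x)
        n_grad += 1
        if np.linalg.norm(g) < tol:  # approximate critical point reached
            break
        if t % refresh_every == 0:
            H = hess(x)
            n_hess += 1
        x = x - np.linalg.solve(H, g)
    return x, n_grad, n_hess

x_star, n_grad, n_hess = lazy_newton([1.5, -1.5])
print(x_star, n_grad, n_hess)
```

Larger `refresh_every` values spend fewer Hessian evaluations at the price of more gradient evaluations, since steps with a stale Hessian converge only linearly.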


Emergence and scaling laws in SGD learning of shallow neural networks

Neural Information Processing Systems

We focus on the challenging extensive-width regime $P\gg 1$ and permit a diverging condition number in the second layer, covering as a special case the power-law scaling $a_p\asymp p^{-\beta}$ where $\beta\in\mathbb{R}_{\ge 0}$. We provide a precise analysis of SGD dynamics for the training of a student two-layer network to minimize the mean squared error (MSE) objective, and explicitly identify sharp transition times to recover each signal direction. In the power-law setting, we characterize scaling law exponents for the MSE loss with respect to the number of training samples and SGD steps, as well as the number of parameters in the student neural network. Our analysis shows that while the learning of individual teacher neurons exhibits abrupt transitions, the juxtaposition of $P\gg 1$ emergent learning curves at different timescales leads to a smooth scaling law in the cumulative objective.
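The last sentence can be illustrated with a toy caricature: if direction $p$ contributes $a_p^2 = p^{-2\beta}$ to the loss until it is learned abruptly at time $T_p$, then summing many step curves gives a smooth power law. Assuming (hypothetically, not from the paper) $T_p = p^{\beta}$ and $P \to \infty$, the loss at time $t$ is approximately $\int_{t^{1/\beta}}^{\infty} p^{-2\beta}\,dp = t^{(1-2\beta)/\beta}/(2\beta - 1)$ for $\beta > 1/2$. The sketch below checks the exponent numerically; the transition times and step-shaped curves are assumptions, not the paper's derived dynamics.

```python
import math

def toy_loss(t, P, beta):
    """Sum of a_p^2 = p^(-2*beta) over directions p not yet learned at time t.

    Assumption (not from the paper): direction p is learned abruptly at time T_p = p**beta.
    """
    return sum(p ** (-2.0 * beta) for p in range(1, P + 1) if p ** beta > t)

P, beta = 100_000, 1.0
t1, t2 = 10.0, 100.0
slope = math.log(toy_loss(t2, P, beta) / toy_loss(t1, P, beta)) / math.log(t2 / t1)
print(f"empirical slope {slope:.3f} vs predicted {(1 - 2 * beta) / beta:.3f}")
```

Each term drops to zero abruptly, yet the aggregate decays like $t^{-1}$ for $\beta = 1$, mirroring the contrast between abrupt individual transitions and a smooth cumulative scaling law.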


A Black-Box Debiasing Framework for Conditional Sampling

Neural Information Processing Systems

Conditional sampling is a fundamental task in Bayesian statistics and generative modeling. Consider the problem of sampling from the posterior distribution $P_{X|Y=y^*}$ for some observation $y^*$, where the likelihood $P_{Y|X}$ is known, and we are given $n$ i.i.d.


The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation

Neural Information Processing Systems

Dimensionality reduction via linear sketching is a powerful and widely used technique, but it is known to be vulnerable to adversarial inputs. We study the *black-box adversarial setting*, where a fixed, hidden sketching matrix $A \in \mathbb{R}^{k \times n}$ maps high-dimensional vectors $\boldsymbol{v} \in \mathbb{R}^n$ to lower-dimensional sketches $A\boldsymbol{v} \in \mathbb{R}^k$, and an adversary can query the system to obtain approximate $\ell_2$-norm estimates that are computed from the sketch. We present a *universal, nonadaptive attack* that, using $\tilde{O}(k^2)$ queries, either causes a failure in norm estimation or constructs an adversarial input on which the optimal estimator for the query distribution (used by the attack) fails. The attack is completely agnostic to the sketching matrix and to the estimator: it applies to *any* linear sketch and *any* query responder, including those that are randomized, adaptive, or tailored to the query distribution. Our lower bound construction tightly matches the known upper bounds of $\tilde{\Omega}(k^2)$, achieved by specialized estimators for Johnson-Lindenstrauss transforms and AMS sketches. Beyond sketching, our results uncover structural parallels to adversarial attacks in image classification, highlighting fundamental vulnerabilities of compressed representations.
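The basic reason a fixed linear sketch with $k < n$ can be fooled at all is that $A$ has a nontrivial null space: any $\boldsymbol{v}$ with $A\boldsymbol{v} = 0$ looks like the zero vector to every estimator that sees only the sketch. The white-box sketch below shows this with a hypothetical Gaussian JL-style matrix and the plain estimate $\|A\boldsymbol{v}\|$. The paper's attack is black-box and finds such failures with $\tilde{O}(k^2)$ queries without knowing $A$; the matrix, estimator, and dimensions here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 200, 1000
A = rng.standard_normal((k, n)) / np.sqrt(k)  # Gaussian JL-style sketch: E||Av||^2 = ||v||^2

def estimate_norm(v):
    """Simple l2-norm estimate computed only from the sketch A v."""
    return float(np.linalg.norm(A @ v))

v = rng.standard_normal(n)
print(estimate_norm(v) / np.linalg.norm(v))  # close to 1 for a typical input

# White-box failure: project v onto the null space of A (nontrivial because k < n).
v_null = v - A.T @ np.linalg.solve(A @ A.T, A @ v)
print(estimate_norm(v_null), np.linalg.norm(v_null))  # near 0 versus a large true norm
```

For a typical input the estimate is accurate, while the projected vector keeps most of its norm (roughly $\sqrt{n - k}$ here) yet has a sketch that is zero up to rounding.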