McRae, Andrew D.
Synchronization on circles and spheres with nonlinear interactions
Criscitiello, Christopher, Rebjock, Quentin, McRae, Andrew D., Boumal, Nicolas
We consider the dynamics of $n$ points on a sphere in $\mathbb{R}^d$ ($d \geq 2$) which attract each other according to a function $\varphi$ of their inner products. When $\varphi$ is linear ($\varphi(t) = t$), the points converge to a common value (i.e., synchronize) in various connectivity scenarios: this is part of classical work on Kuramoto oscillator networks. When $\varphi$ is exponential ($\varphi(t) = e^{\beta t}$), these dynamics correspond to a limit of how idealized transformers process data, as described by Geshkovski et al. (2024). Accordingly, they ask whether synchronization occurs for exponential $\varphi$. In the context of consensus for multi-agent control, Markdahl et al. (2018) show that for $d \geq 3$ (spheres), if the interaction graph is connected and $\varphi$ is increasing and convex, then the system synchronizes. What is the situation on circles ($d=2$)? First, we show that $\varphi$ being increasing and convex is no longer sufficient. Then we identify a new condition (that the Taylor coefficients of $\varphi'$ are decreasing) under which we do have synchronization on the circle. In so doing, we provide some answers to the open problems posed by Geshkovski et al. (2024).
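A minimal numerical sketch of these dynamics (an illustrative reading, not necessarily the paper's exact model): each point moves along the tangential component of an attraction field whose pairwise weights are φ of the inner products, and synchronization can be monitored through the norm of the mean point. The helper `simulate` and all parameter choices below are hypothetical.

```python
# Illustrative simulation: n points on the unit sphere in R^d, each attracted to
# the others with weight phi(<x_i, x_j>), evolved by explicit Euler steps and
# renormalized back to the sphere.  Step size, horizon, and the choice of phi
# are ad hoc choices for this sketch.
import numpy as np

def simulate(phi, n=20, d=2, steps=5000, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # random points on the sphere
    for _ in range(steps):
        W = phi(X @ X.T)                               # pairwise interaction weights
        np.fill_diagonal(W, 0.0)                       # drop self-interaction
        F = (W @ X) / n                                # attraction field at each point
        F -= np.sum(F * X, axis=1, keepdims=True) * X  # keep only the tangential component
        X += dt * F
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # retract back onto the sphere
    return X

# ||mean of the points|| equals 1 exactly when all points coincide (synchrony).
for name, phi in [("linear", lambda t: t), ("exponential", np.exp)]:
    X = simulate(phi)                                  # d=2: the circle case
    print(name, np.linalg.norm(X.mean(axis=0)))
```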
Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
Xu, Austin, McRae, Andrew D., Wang, Jingyan, Davenport, Mark A., Pananjady, Ashwin
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). The PAQ adopts an inverted measurement scheme and combines advantages of both cardinal and ordinal queries, making it informative yet cognitively lightweight. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs and provide sample complexity guarantees for this estimator. We present numerical simulations demonstrating the performance of the estimator and its notable properties.
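The abstract does not spell out the two-stage estimator, so the sketch below only illustrates the objects involved: a low-rank PSD matrix defining a Mahalanobis distance, and a generic "project onto rank-r PSD matrices" step of the kind a second stage might use. Function names and parameter choices are illustrative.

```python
# Illustrative only (not the paper's estimator): a low-rank PSD matrix M defines
# the squared Mahalanobis distance (x - y)^T M (x - y); a generic second stage
# might project a rough symmetric estimate onto the rank-r PSD cone.
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance induced by a PSD matrix M."""
    diff = x - y
    return float(diff @ M @ diff)

def project_rank_r_psd(M_hat, r):
    """Closest (in Frobenius norm) PSD matrix of rank at most r to a symmetric matrix."""
    S = (M_hat + M_hat.T) / 2                      # symmetrize
    evals, evecs = np.linalg.eigh(S)
    evals = np.clip(evals, 0.0, None)              # discard negative eigenvalues
    top = np.argsort(evals)[::-1][:r]              # keep the r largest
    return (evecs[:, top] * evals[top]) @ evecs[:, top].T

rng = np.random.default_rng(0)
d, r = 10, 2
L = rng.standard_normal((d, r))
M_true = L @ L.T                                       # unknown low-rank metric
M_rough = M_true + 0.1 * rng.standard_normal((d, d))   # stand-in for a first-stage estimate
M_hat = project_rank_r_psd(M_rough, r)
print("error before projection:", np.linalg.norm(M_rough - M_true))
print("error after projection: ", np.linalg.norm(M_hat - M_true))
print("example distance:", mahalanobis_sq(rng.standard_normal(d), rng.standard_normal(d), M_hat))
```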
New Equivalences Between Interpolation and SVMs: Kernels and Structured Features
Kaushik, Chiraag, McRae, Andrew D., Davenport, Mark A., Muthukumar, Vidya
Recent empirical and theoretical efforts in supervised machine learning have discovered a wide range of surprising phenomena that arise in the modern overparameterized regime (i.e., where the number of free parameters in the model is much larger than the number of training examples [13, 6]). For example, after it was observed that deep neural networks can perfectly fit noisy training data and still generalize well to new data (see, e.g., [35, 43]), several theoretical efforts have demonstrated that this "harmless interpolation" phenomenon can in fact occur even in the simpler settings of linear and kernel regression [8, 7, 5]. A separate, but equally surprising, observation in this overparameterized regime is that training procedures that optimize different loss functions can still yield similar test performance. For example, the empirical studies of [36, 22, 26, 16] demonstrate that kernel machines and deep neural networks trained using the squared loss, which is traditionally reserved for regression problems with continuous labels, can achieve classification performance comparable to models trained with the more popular cross-entropy loss. Motivated by these observations, recent work has sought to deepen theoretical understanding of the impact of the loss function in overparameterized classification tasks, starting with linear models.
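A toy check of the interpolation/SVM connection suggested by the title, under assumed conditions (near-orthogonal inputs, many more ambient dimensions than samples) in which support vectors are known to proliferate; the kernel, bandwidth, and problem sizes below are arbitrary illustrative choices.

```python
# Toy sketch, under assumed conditions: with near-orthogonal inputs and far more
# dimensions than samples, every training point tends to become a support vector,
# so the (near-)hard-margin SVM interpolates the +/-1 labels much like the
# minimum-norm squared-loss fit does.
import numpy as np
from sklearn.svm import SVC
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n, d = 30, 500                                        # many more dimensions than samples
X = rng.standard_normal((n, d)) / np.sqrt(d)          # near-orthogonal inputs
y = rng.choice([-1.0, 1.0], size=n)                   # labels need not be structured

svm = SVC(kernel="rbf", gamma=1.0, C=1e8).fit(X, y)               # ~hard-margin SVM
ls = KernelRidge(kernel="rbf", gamma=1.0, alpha=1e-10).fit(X, y)  # ~min-norm squared-loss fit

print("support vectors:", len(svm.support_), "of", n)
print("SVM margin residual:", np.abs(svm.decision_function(X) - y).max())
print("squared-loss residual:", np.abs(ls.predict(X) - y).max())
```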
Harmless interpolation in regression and classification with structured features
McRae, Andrew D., Karnik, Santhosh, Davenport, Mark A., Muthukumar, Vidya
Overparameterized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they also show that harmless interpolation can occur in more general settings, such as features that form a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.
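A toy sketch of the central object: the minimum-norm interpolator of noisy labels through a structured (here, weighted cosine, i.e. bounded orthonormal) feature map. The spiked-plus-flat weighting below is only an illustrative guess at the kind of Gram-matrix structure under which interpolation is harmless, not the paper's exact conditions.

```python
# Minimum-l2-norm interpolation through a weighted cosine feature map: a few
# strong "head" features carry the signal, while a flat tail of many weak
# features absorbs the noise.  All sizes and weights are ad hoc for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 2000, 5                                  # samples, total features, head features

def features(t):
    k = np.arange(1, p + 1)
    scale = np.where(k <= s, 1.0, np.sqrt(1.0 / p))    # spiked head + flat tail
    return np.cos(np.outer(t, k) * np.pi) * scale

t_train, t_test = rng.uniform(0, 1, n), rng.uniform(0, 1, 2000)
f = lambda t: np.cos(np.pi * t)                        # ground truth lies in the head
y = f(t_train) + 0.3 * rng.standard_normal(n)          # noisy labels

theta = np.linalg.pinv(features(t_train)) @ y          # minimum-l2-norm interpolator
print("max train residual:", np.abs(features(t_train) @ theta - y).max())  # ~0: noise is fit exactly
print("test MSE:", np.mean((features(t_test) @ theta - f(t_test)) ** 2))   # compare to noise var 0.09
```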
Low-rank matrix completion and denoising under Poisson noise
McRae, Andrew D., Davenport, Mark A.
This paper considers the problem of estimating a low-rank matrix from observations of all, or a subset, of its entries in the presence of Poisson noise. When we observe all the entries, this is a problem of matrix denoising; when we observe only a subset of the entries, this is a problem of matrix completion. In both cases, we exploit an assumption that the underlying matrix is low-rank. Specifically, we analyze several estimators, including a constrained nuclear-norm minimization program, nuclear-norm regularized least squares, and a nonconvex constrained low-rank optimization problem. We show that for all three estimators, with high probability, we have an upper error bound (in the Frobenius norm metric) that depends on the matrix rank, the fraction of entries observed, and the maximal row and column sums of the true matrix. We furthermore show that these results are minimax optimal (within a universal constant) over classes of matrices with low rank and bounded row and column sums. We also extend these results to handle the case of matrix multinomial denoising and completion.
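In the fully observed (denoising) case, nuclear-norm regularized least squares has a closed form: soft-thresholding of the singular values of the observation matrix. The sketch below uses that fact; the threshold choice is an ad hoc heuristic, not the paper's prescription.

```python
# Sketch of the denoising case: nuclear-norm regularized least squares reduces to
# singular-value soft-thresholding of the Poisson observation matrix Y.
import numpy as np

def svt(Y, lam):
    """Singular-value thresholding: argmin_X 0.5*||Y - X||_F^2 + lam*||X||_*."""
    U, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.clip(sing - lam, 0.0, None)) @ Vt

rng = np.random.default_rng(0)
m, n, r = 200, 150, 3
M = rng.uniform(1, 5, (m, r)) @ rng.uniform(1, 5, (r, n)) / r   # true low-rank rate matrix
Y = rng.poisson(M).astype(float)                                # Poisson observations

lam = 2.0 * np.sqrt(max(m, n) * Y.mean())                       # heuristic threshold, for illustration
M_hat = svt(Y, lam)
print("relative error of SVT estimate:", np.linalg.norm(M_hat - M) / np.linalg.norm(M))
print("relative error of Y itself:    ", np.linalg.norm(Y - M) / np.linalg.norm(M))
```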