Power-Law Spectrum of the Random Feature Model

Paquette, Elliot, Xiao, Ke Liang, Zhu, Yizhe

arXiv.org Machine Learning

Scaling laws for neural networks, in which the loss decays as a power law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H)\in \mathbb{R}^v$ where $H$ has $α$-power-law spectrum ($λ_j(H) \asymp j^{-α}$, $α > 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v\times d}$, and an entrywise monomial $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x}[\frac{1}{d}f(W^\top x)^{\otimes 2}]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\left(\log^{p-1}(j+1)/j\right)^α$. For $c_1 d \log^{-(p+1)}(d)\leq j\leq d$, the $j$-th eigenvalue is of order $j^{-α}$ up to a polylog factor. That is, the power-law exponent $α$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.
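The setup in this abstract can be explored numerically. The following is a minimal sketch, not the authors' code, under several assumptions that the abstract does not state: $H$ is taken to be diagonal with $λ_j(H) = j^{-α}$ exactly, $W$ has i.i.d. standard normal entries, and the function names are hypothetical. For $p = 1$ the population covariance has the closed form $\frac{1}{d} W^\top H W$; for higher $p$ the sketch falls back on a plain Monte Carlo estimate rather than the Wick expansion used in the proof.

```python
import numpy as np


def power_law_spectrum(v, alpha):
    """Eigenvalues j^(-alpha), j = 1..v (assumed exact power law, diagonal H)."""
    return np.arange(1, v + 1, dtype=float) ** (-alpha)


def linear_feature_covariance(W, h):
    """Exact population covariance for p = 1: (1/d) W^T diag(h) W."""
    d = W.shape[1]
    # W.T * h scales column j of W.T by h[j], i.e. forms W^T diag(h).
    return (W.T * h) @ W / d


def monte_carlo_feature_covariance(W, h, p, n_samples, rng):
    """Monte Carlo estimate of E_x[(1/d) f(W^T x) f(W^T x)^T] with f(y) = y^p."""
    v, d = W.shape
    # Rows of x are independent draws from N(0, diag(h)).
    x = rng.standard_normal((n_samples, v)) * np.sqrt(h)
    feats = (x @ W) ** p  # entrywise monomial applied to W^T x
    return feats.T @ feats / (n_samples * d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v, d, alpha = 2000, 100, 2.0
    h = power_law_spectrum(v, alpha)
    W = rng.standard_normal((v, d))  # assumed N(0, 1) entries

    K = linear_feature_covariance(W, h)
    eigs = np.linalg.eigvalsh(K)[::-1]  # sort in descending order
    j = np.arange(1, d + 1)
    # Fit a slope on a log-log scale over the leading eigenvalues only.
    head = slice(0, d // 4)
    slope = np.polyfit(np.log(j[head]), np.log(eigs[head]), 1)[0]
    print("fitted log-log slope for p = 1:", slope)
```

Comparing the fitted slope with $-α$ gives a rough check of the claimed exponent inheritance for the linear case; swapping in `monte_carlo_feature_covariance` with `p > 1` is one way to look for the logarithmic correction, though the sample size needed for a stable tail estimate grows quickly with `p`.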



Neural Information Processing Systems

A common approach to creating more expressive GNNs is to change the message passing function of MPNNs. When a GNN becomes more expressive than MPNNs by adapting the message passing function, we call this non-standard message passing. Examples of this are message passing variants that operate on subgraphs [Frasca et al., 2022, Bevilacqua


To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty

Neural Information Processing Systems

We explore uncertainty quantification in large language models (LLMs), with the goal to identify when uncertainty in responses given a query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from the lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple possible answers).



Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

Zhao Song, David Woodruff, Peilin Zhong

Neural Information Processing Systems

Nevertheless, we show that under certain minimal and realistic distributional settings, it is possible to obtain a $(1+\epsilon)$-approximation with a nearly linear running time and $\mathrm{poly}(k/\epsilon) + O(k \log n)$ columns. Namely, we show that if the input matrix $A$ has the form $A = B + E$, where $B$ is an arbitrary rank-$k$ matrix, and $E$ is a matrix with i.i.d.