trace estimation
- North America > United States > Minnesota (0.05)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Optimal Sketching for Trace Estimation
Matrix trace estimation is ubiquitous in machine learning applications and has traditionally relied on Hutchinson's method, which requires $O(\log(1/\delta)/\epsilon^2)$ matrix-vector product queries to achieve a $(1 \pm \epsilon)$-multiplicative approximation to $\text{trace}(A)$ with failure probability $\delta$ on positive-semidefinite input matrices $A$. Recently, the Hutch++ algorithm was proposed, which reduces the number of matrix-vector queries from $O(1/\epsilon^2)$ to the optimal $O(1/\epsilon)$, and the algorithm succeeds with constant probability. However, in the high probability setting, the non-adaptive Hutch++ algorithm suffers an extra $O(\sqrt{\log(1/\delta)})$ multiplicative factor in its query complexity. Non-adaptive methods are important, as they correspond to sketching algorithms, which are mergeable, highly parallelizable, and provide low-memory streaming algorithms as well as low-communication distributed protocols. In this work, we close the gap between non-adaptive and adaptive algorithms, showing that even non-adaptive algorithms can achieve $O(\sqrt{\log(1/\delta)}/\epsilon + \log(1/\delta))$ matrix-vector products. In addition, we prove matching lower bounds demonstrating that, up to a $\log \log(1/\delta)$ factor, no further improvement in the dependence on $\delta$ or $\epsilon$ is possible by any non-adaptive algorithm. Finally, our experiments demonstrate the superior performance of our sketch over the adaptive Hutch++ algorithm, which is less parallelizable, as well as over the non-adaptive Hutchinson's method.
Fast Neural Tangent Kernel Alignment, Norm and Effective Rank via Trace Estimation
The Neural Tangent Kernel (NTK) characterizes how a model's state evolves over Gradient Descent. Computing the full NTK matrix is often infeasible, especially for recurrent architectures. Here, we introduce a matrix-free perspective, using trace estimation to rapidly analyze the empirical, finite-width NTK. This enables fast computation of the NTK's trace, Frobenius norm, effective rank, and alignment. We provide numerical recipes based on the Hutch++ trace estimator with provably fast convergence guarantees. In addition, we show that, due to the structure of the NTK, one can compute the trace using only forward- or reverse-mode automatic differentiation, not requiring both modes. We show these so-called one-sided estimators can outperform Hutch++ in the low-sample regime, especially when the gap between the model state and parameter count is large. In total, our results demonstrate that matrix-free randomized approaches can yield speedups of many orders of magnitude, leading to faster analysis and applications of the NTK.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Minnesota (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.52)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)