A Canonicalization Perspective on Invariant and Equivariant Learning
George Ma
In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods have emerged as a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames.
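The frame-averaging recipe can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the group here is the symmetric group acting by permuting coordinates, the sorting frame is (for generic inputs) a single-element frame, i.e., a canonicalization, and the toy network `mlp` and all sizes are invented for the example.

```python
import numpy as np

def mlp(x, W1, W2):
    # toy network with no built-in symmetry
    return np.tanh(x @ W1) @ W2

def frame(x):
    # frame for the permutation group S_n acting on R^n:
    # the permutation(s) that sort x. For generic x this is a
    # single element, so frame averaging reduces to canonicalization.
    return [np.argsort(x)]

def frame_average(f, x):
    # average f over the frame; the result is permutation-invariant,
    # because every permutation of x yields the same sorted vector
    return np.mean([f(x[p]) for p in frame(x)], axis=0)

rng = np.random.default_rng(0)
n, h = 5, 8
W1, W2 = rng.normal(size=(n, h)), rng.normal(size=(h, 1))
x = rng.normal(size=n)
perm = rng.permutation(n)

f = lambda z: mlp(z, W1, W2)
out1 = frame_average(f, x)
out2 = frame_average(f, x[perm])
assert np.allclose(out1, out2)  # invariance under permutation
```

The design question the paper raises is visible even here: many frames yield invariance, but they differ in size and in how they treat non-generic inputs (e.g., ties under sorting).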
Spectral Superposition: A Theory of Feature Geometry
Georgi Ivanov, Narmeen Oozeer, Shivam Raval, Tasana Pejovic, Shriyash Upadhyay, Amir Abdullah
Neural networks represent more features than they have dimensions via superposition, forcing features to share representational space. Current methods decompose activations into sparse linear features but discard geometric structure. We develop a theory for studying the geometric structure of features by analyzing the spectra (eigenvalues, eigenspaces, etc.) of weight-derived matrices. In particular, we introduce the frame operator $F = WW^\top$, which gives us a spectral measure describing how each feature allocates norm across eigenspaces. While previous tools could describe pairwise interactions between features, spectral methods capture the global geometry ("how do all features interact?"). In toy models of superposition, we use this theory to prove that capacity saturation forces spectral localization: features collapse onto single eigenspaces, organize into tight frames, and admit discrete classification via association schemes, classifying all geometries from prior work (simplices, polygons, antiprisms). The spectral measure formalism applies to arbitrary weight matrices, enabling diagnosis of feature localization beyond toy settings. These results point toward a broader program: applying operator theory to interpretability.
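As a concrete illustration of the spectral-measure idea, consider six unit-norm features arranged as a regular hexagon in two dimensions, a tight frame of the kind classified above. This is a hedged sketch under assumed conventions (rows of `W` as feature directions; per-feature measure defined via squared overlaps with eigenvectors of $F$), not necessarily the paper's exact formalism:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
# regular hexagon in R^2: six unit features in two dimensions, a tight frame
theta = 2 * np.pi * np.arange(n) / n
W = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # one feature per row

F = W @ W.T                      # frame operator F = W W^T
evals, evecs = np.linalg.eigh(F)

def spectral_measure(i, decimals=8):
    # mass feature i places on each eigenspace of F:
    # mu_i(lambda) = sum of |<v_k, e_i>|^2 over eigenvectors with eigenvalue lambda
    mass = {}
    for lam, v in zip(np.round(evals, decimals), evecs.T):
        mass[lam] = mass.get(lam, 0.0) + v[i] ** 2
    return mass

# each feature spreads its unit norm over the eigenspaces of F;
# for this tight frame, 1/3 of the mass sits on the eigenvalue-3 eigenspace
print(spectral_measure(0))
```

Spectral localization in the sense above would correspond to each feature's measure concentrating on a single eigenvalue, which this diagnostic can check for arbitrary weight matrices.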
Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators
Applying Physics-Informed Gaussian Process Regression to the eigenvalue problem $(\mathcal{L}-\lambda)u = 0$ poses a fundamental challenge: the null source term results in a trivial predictive mean and a degenerate marginal likelihood. Drawing inspiration from system identification, we construct a transfer-function-type indicator for the unknown eigenvalue/eigenfunction using the physics-informed Gaussian Process posterior. We show that the posterior covariance is non-trivial only when $\lambda$ corresponds to an eigenvalue of the partial differential operator $\mathcal{L}$, reflecting the existence of a non-trivial eigenspace, and that any sample from the posterior lies in the eigenspace of the linear operator. We demonstrate the effectiveness of the proposed approach through several numerical examples with both linear and non-linear eigenvalue problems.
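The core phenomenon, that $(\mathcal{L}-\lambda)u = 0$ admits a non-trivial solution only when $\lambda$ is an eigenvalue, can be illustrated with a plain linear-algebra analogue (not the paper's GP posterior construction): discretize $\mathcal{L} = -d^2/dx^2$ on $(0,\pi)$ with Dirichlet boundary conditions, whose eigenvalues are $k^2$, and use the smallest singular value of $(L - \lambda I)$ as an indicator of a non-trivial null space. All sizes below are arbitrary choices for the sketch.

```python
import numpy as np

# second-order finite-difference discretization of L = -d^2/dx^2
# on (0, pi) with Dirichlet boundary conditions; eigenvalues ~ k^2
N = 200
h = np.pi / (N + 1)
main = 2.0 / h**2 * np.ones(N)
off = -1.0 / h**2 * np.ones(N - 1)
L = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def indicator(lam):
    # smallest singular value of (L - lam I): near zero iff lam is
    # (approximately) an eigenvalue, i.e. (L - lam) u = 0 has a
    # non-trivial solution u
    return np.linalg.svd(L - lam * np.eye(N), compute_uv=False)[-1]

# the indicator dips near lam = k^2 (here k = 1) and stays O(1) away from it
print(indicator(1.0), indicator(1.5))
```

In the paper's setting the analogous role is played by the physics-informed GP posterior covariance, which degenerates away from the spectrum of $\mathcal{L}$.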
Summary
We would like to thank the entire review team for their efforts and insightful comments. The convergence rates in [DZPS18] (arXiv:1810.02054) approach zero as the training set grows (e.g., the ImageNet dataset has 14 million images); for such applications, a non-diminishing convergence rate is more desirable.

Response to the concern on the fixed second layer. The same assumption is made in [ADH+19] and [ZCZG18] (arXiv:1811.08888).