A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights
Ricci, Fabiola, Merger, Claudia, Goldt, Sebastian
Neural networks trained with gradient-based methods exhibit a strong simplicity bias: they learn simpler statistical features of their data before moving to more complex features. Previous analyses of this phenomenon have largely focused on settings with (quasi-)isotropic inputs. In this work, we study the simplicity bias from a Fourier perspective, which allows us to include two key features of natural images in the analysis: approximate translation-invariance and power-law spectra. We first show experimentally that simple neural networks trained on image classification tasks first rely on amplitude information -- related to pair-wise correlations between pixels -- before exploiting phase information, which encodes edges and higher-order correlations. In view of this, we introduce a synthetic data model for translation-invariant inputs that allows precise control over amplitudes and phases while remaining tractable. We rigorously establish that for isotropic and high-dimensional inputs, classification based on phase information alone is a genuinely hard task: online stochastic gradient descent (SGD) cannot distinguish the structured inputs from noise within $n \ll N^3$ steps, but needs at least $n \gg N^3 \log^2{N}$ steps. In contrast, we show both experimentally and theoretically that power-law spectra can dramatically accelerate the speed of learning phase information, even if the spectra do not help with classification. Simulations with two-layer networks trained on textures and with deep convolutional networks on ImageNet and CIFAR100 confirm this non-trivial interaction between amplitudes and phases, providing mechanistic insights into how deep neural networks can learn natural image distributions efficiently.
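The distinction between amplitude and phase information can be illustrated with a short NumPy sketch. It is not the authors' synthetic data model; the toy image, the helper names (`split_amplitude_phase`, `phase_scramble`, `circular_autocorrelation`), and the choice of white-noise phases are assumptions made for illustration. The sketch replaces the phases of an image while keeping its amplitudes, and checks that the circular pair-wise pixel correlations are unchanged even though the edges are destroyed.

```python
import numpy as np

def split_amplitude_phase(x):
    """Return the Fourier amplitudes and phases of a real 2-D array."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)

def phase_scramble(x, rng):
    """Keep the amplitude spectrum of x, but use the phases of a white-noise image."""
    amplitude, _ = split_amplitude_phase(x)
    # Phases of a real noise image are Hermitian-symmetric, so the output is real
    # up to floating-point error.
    _, noise_phase = split_amplitude_phase(rng.standard_normal(x.shape))
    return np.fft.ifft2(amplitude * np.exp(1j * noise_phase)).real

def circular_autocorrelation(x):
    """Circular pair-wise pixel correlations, obtained from the power spectrum."""
    return np.fft.ifft2(np.abs(np.fft.fft2(x)) ** 2).real / x.size

rng = np.random.default_rng(0)
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0  # hypothetical toy "image" with sharp edges
scrambled = phase_scramble(image, rng)

# Same second-order statistics, different image.
same_correlations = np.allclose(
    circular_autocorrelation(image), circular_autocorrelation(scrambled)
)
same_image = np.allclose(image, scrambled)
```

A classifier that only uses pair-wise correlations cannot tell `image` from `scrambled`; separating them requires phase information.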
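A.1 Mach-Zehnder Interferometers (MZIs)
A basic coherent optical component used in this work is an MZI. One of the most general MZI structures is shown in Figure 15, consisting of two 50-by-50 optical directional couplers and four phase shifters $\theta_T$, $\theta_L$, $\omega_P$, and $\omega_W$. An MZI can achieve arbitrary $2 \times 2$ unitary matrices SU(2):
$$
\mathrm{SU}(2) = R(\theta_g, \Delta\theta, \Delta\omega)
= \begin{pmatrix} t & kj \\ kj & t \end{pmatrix}
\begin{pmatrix} e^{j\omega_P} & 0 \\ 0 & e^{j\omega_W} \end{pmatrix}
\begin{pmatrix} t & kj \\ kj & t \end{pmatrix}
\begin{pmatrix} e^{j\theta_T} & 0 \\ 0 & e^{j\theta_L} \end{pmatrix}
= e^{j\theta_g}
\begin{pmatrix} \sin\frac{\Delta\omega}{2} & \cos\frac{\Delta\omega}{2} \\ \cos\frac{\Delta\omega}{2} & -\sin\frac{\Delta\omega}{2} \end{pmatrix}
\begin{pmatrix} e^{j\frac{\Delta\theta}{2}} & 0 \\ 0 & e^{-j\frac{\Delta\theta}{2}} \end{pmatrix}
$$
Figure 15: 2-by-2 MZI with top ($\theta_T$), left ($\theta_L$), upper ($\omega_P$), and lower ($\omega_W$) phase shifters.
A.2 MZI-based Photonic Tensor Core Architecture
By cascading $N(N-1)/2$ MZIs into a triangular mesh (Reck-style) or rectangular mesh (Clements-style), we can construct arbitrary $N \times N$ unitary $U(N)$.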
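The MZI factorization in A.1 can be checked numerically. The sketch below is an illustration rather than the authors' code: it assumes ideal 50:50 couplers with $t = k = 1/\sqrt{2}$, and it assumes the definitions $\Delta\theta = \theta_T - \theta_L$, $\Delta\omega = \omega_P - \omega_W$, and $\theta_g = (\theta_T + \theta_L + \omega_P + \omega_W)/2 + \pi/2$, which are chosen here so that the two sides of the product agree. The function names are hypothetical.

```python
import numpy as np

def mzi_transfer(theta_t, theta_l, omega_p, omega_w):
    """Product coupler * inner phases * coupler * outer phases of one MZI."""
    t = k = 1 / np.sqrt(2)  # assumption: ideal 50:50 directional couplers
    coupler = np.array([[t, 1j * k], [1j * k, t]])
    inner = np.diag(np.exp(1j * np.array([omega_p, omega_w])))
    outer = np.diag(np.exp(1j * np.array([theta_t, theta_l])))
    return coupler @ inner @ coupler @ outer

def mzi_closed_form(theta_t, theta_l, omega_p, omega_w):
    """Global phase times a real 2x2 mixing matrix times a differential phase."""
    # Assumed definitions (not stated in the text above).
    d_theta = theta_t - theta_l
    d_omega = omega_p - omega_w
    theta_g = (theta_t + theta_l + omega_p + omega_w) / 2 + np.pi / 2
    mixing = np.array([
        [np.sin(d_omega / 2), np.cos(d_omega / 2)],
        [np.cos(d_omega / 2), -np.sin(d_omega / 2)],
    ])
    phase = np.diag(np.exp(1j * np.array([d_theta / 2, -d_theta / 2])))
    return np.exp(1j * theta_g) * (mixing @ phase)

def mzi_count(n):
    """Number of MZIs in a triangular or rectangular mesh for an n x n unitary."""
    return n * (n - 1) // 2

phases = (0.3, -1.1, 2.0, 0.7)  # arbitrary test values in radians
u = mzi_transfer(*phases)
forms_agree = np.allclose(u, mzi_closed_form(*phases))
is_unitary = np.allclose(u.conj().T @ u, np.eye(2))
```

For example, `mzi_count(4)` gives 6, the number of MZIs needed for a $4 \times 4$ mesh under the $N(N-1)/2$ count in A.2.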