Generalization Through Growth: Hidden Dynamics Controls Depth Dependence

Sonoda, Sho, Hashimoto, Yuka, Ishikawa, Isao, Ikeda, Masahiro

arXiv.org Machine Learning 

Recent theory has reduced the depth dependence of generalization bounds from exponential to polynomial and even depth-independent rates, yet these results remain tied to specific architectures and Euclidean inputs. We present a unified framework for arbitrary \blue{pseudo-metric} spaces in which a depth-\(k\) network is the composition of continuous hidden maps \(f:\mathcal{X}\to \mathcal{X}\) and an output map \(h:\mathcal{X}\to \mathbb{R}\). The resulting bound $O(\sqrt{(α+ \log β(k))/n})$ isolates the sole depth contribution in \(β(k)\), the word-ball growth of the semigroup generated by the hidden layers. By Gromov's theorem polynomial (resp. exponential) growth corresponds to virtually nilpotent (resp. expanding) dynamics, revealing a geometric dichotomy behind existing $O(\sqrt{k})$ (sublinear depth) and $\tilde{O}(1)$ (depth-independent) rates. We further provide covering-number estimates showing that expanding dynamics yield an exponential parameter saving via compositional expressivity. Our results decouple specification from implementation, offering architecture-agnostic and dynamical-systems-aware guarantees applicable to modern deep-learning paradigms such as test-time inference and diffusion models.